Quantcast

Increasing performance of query

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Increasing performance of query

Gabriele Lanaro
Dear SQLite communiy,

I’m trying to increase the performance of a query to its maximum possible
speed.

The schema is roughly constituted of two tables, one is a full text index,
fts5_table, while the other is called datatable.

The query in question is a full text query on the full text index joined
with the other table.

*SELECT* fts5.column, datatable.column2 *FROM* fts5_table, datatable *WHERE*
fts5_table *MATCH* ‘term’ *AND* datatable.id = fts5_column.rowid *ORDER BY*
rank *ASC* *LIMIT* 10;

Another query looks for counts of the same kind:

*SELECT* count(datatable.id) *FROM* fts5_table, datatable *WHERE*
fts5_table *MATCH* ‘term’ *AND* datatable.id = fts5_column.rowid

On a database of about the size of 1 GB of text (evenly distributed amount
100 000 rows). To give an idea of the speeds obtained I achieve a speed of
about 40 ms for the first and 80 ms for the second query.

Things I tried:

1) ANALYZE

Decreased the speed by orders of magnitude.

2) pragma cache_size=-256000;

Improved the performance considerably, increasing further than this value
has no effect

3) pragma mmap_size

I didn’t see an improvement

4) putting the database in a ramdisk

No improvement

I was wondering which other strategies could be effecting in increasing the
query speed.

I also noticed that the *ORDER BY* statement in the first query slows down
the query quite a bit (and this is probably due to calculating the bm25
score on all the results to get a ranking) I was wondering if there’s other
ways to prevent that.

Thanks,

Gabriele
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Increasing performance of query

Simon Slavin-3

On 19 Apr 2017, at 7:50pm, Gabriele Lanaro <[hidden email]> wrote:

> *SELECT* count(datatable.id) *FROM* fts5_table, datatable *WHERE*
> fts5_table *MATCH* ‘term’ *AND* datatable.id = fts5_column.rowid

If datatable.id is never NULL, then "count(*)" should do the same thing and be faster.

> 1) ANALYZE
>
> Decreased the speed by orders of magnitude.


That’s never supposed to happen.  Just to make sure I understand this … Are you saying that you have a situation where ANALYZE made something slower ?  If so, have you kept a copy of the database before you executed ANALYZE ?  There are some commands which will reveal what happened.

Given the efforts you’ve been to with other commands, you might want to try VACUUM.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Increasing performance of query

R Smith
In reply to this post by Gabriele Lanaro

On 2017/04/19 8:50 PM, Gabriele Lanaro wrote:

> Dear SQLite communiy,
>
> I’m trying to increase the performance of a query to its maximum possible
> speed.
>
> The schema is roughly constituted of two tables, one is a full text index,
> fts5_table, while the other is called datatable.
>
> The query in question is a full text query on the full text index joined
> with the other table.
>
> *SELECT* fts5.column, datatable.column2 *FROM* fts5_table, datatable *WHERE*
> fts5_table *MATCH* ‘term’ *AND* datatable.id = fts5_column.rowid *ORDER BY*
> rank *ASC* *LIMIT* 10;
>
> Another query looks for counts of the same kind:
>
> *SELECT* count(datatable.id) *FROM* fts5_table, datatable *WHERE*
> fts5_table *MATCH* ‘term’ *AND* datatable.id = fts5_column.rowid
>
> On a database of about the size of 1 GB of text (evenly distributed amount
> 100 000 rows). To give an idea of the speeds obtained I achieve a speed of
> about 40 ms for the first and 80 ms for the second query.

Do you really mean "ms" as in milli-seconds? getting a result on an
aggregated or sorted query in under a 100 milliseconds from a Gigabyte+
table with 100K+ rows is what we would call: Very Fast.

>
> Things I tried:
>
> 1) ANALYZE
>
> Decreased the speed by orders of magnitude.

Do you really mean "Decreased the speed" as in made it much much slower?

I'm going to assume for the sake of brevity that you meant "seconds" and
"Increased" so that there is real opportunity for improvement.

One strategy I would employ, if you always do those two queries
together, is to query to a temp table like this:

CREATE TEMP TABLE IF NOT EXISTS tmp_aq(
   id INTEGER PRIMARY KEY,
   col1 TEXT,
   iRank INT
);

-- In case there's already values:
DELETE * FROM tmp_aq;

INSERT INTO tmp_aq(id, col1, iRank)
   SELECT rowid, column, rank
     FROM fts5_table
    WHERE fts5_table MATCH 'term';

-- The first query then becomes:
SELECT col1, datatable.column2
   FROM tmp_aq
   JOIN datatable ON datatable.id = tmp_aq.id
  ORDER BY tmp_aq.iRank ASC
  LIMIT 10;


-- If datatable has a 1-to-1 relationship with the fts table,
-- then this is enough for the second query:
SELECT count(*) FROM tmp_aq;

-- else you will still need the join:
SELECT count(*) FROM tmp_aq JOIN datatable ON datatable.id = tmp_aq.id;


An Index on iRank may or may not improve things - you need to test it.


Cheers,
Ryan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Increasing performance of query

Gabriele Lanaro
In reply to this post by Gabriele Lanaro
Thanks for your answers!

First of all, my apologies with ANALYZE I meant that it increased
performance (decreased query time) by orders of magnitude.

Before ANALYZE the query was in the order of seconds (about 2 seconds).
After ANALYZE the query was in the order of milliseconds.

I tried VACUUM too, but didn't improve query time.

I'm now trying saving the result to a temporary table (I set
temp_store=memory to avoid writing on disk), it looks like, by saving only
the rank, the query takes about 50 ms, which is a very good time..

And then extracting the first 15 records ordered by rank becomes very fast
thanks to an index. This is already very useful because it may happen that
we query the same term over and over to scroll down the result by rank.

Any other tips are always appreciated.

Thanks again!

Gabriele
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...