50% faster than 3.7.17

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

50% faster than 3.7.17

Richard Hipp-3
The latest SQLite 3.8.7 alpha version (available on the download page
http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
from 16 months ago.  That is to say, it does 50% more work using the same
number of CPU cycles.

This performance gain is over and above the query planner improvements that
have also been made.  We are constantly looking for new ways to run queries
and adding those ways into the query planner.  For example, in the previous
release, we added a new way to evaluate IN operators with non-constant
right-hand-sides that was reported on this mailing list to make some
queries run 5 times faster.

The 50% faster number above is not about better query plans.  This is 50%
faster at the low-level grunt work of moving bits on and off disk and
search b-trees.  We have achieved this by incorporating hundreds of
micro-optimizations.  Each micro-optimization might improve the performance
by as little as 0.05%.  If we get one that improves performance by 0.25%,
that is considered a huge win.  Each of these optimizations is unmeasurable
on a real-world system (we have to use cachegrind to get repeatable
run-times) but if you do enough of them, they add up.

A full 10% of the performance gain has come since the previous release.
There have been a lot of changes.  All our tests pass, and we still have
100% branch test coverage, so we are confident that we didn't break too
much.  But your testing is an important part of our quality process.
Please download a source archive or a DLL and give the latest alpha a
whirl, and let us know if you encounter any problems.

P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
cycles, as measured by cachegrind.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

jose isaias cabrera
"Richard Hipp" wrote...


> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.
>
> This performance gain is over and above the query planner improvements
> that
> have also been made.  We are constantly looking for new ways to run
> queries
> and adding those ways into the query planner.  For example, in the
> previous
> release, we added a new way to evaluate IN operators with non-constant
> right-hand-sides that was reported on this mailing list to make some
> queries run 5 times faster.
>
> The 50% faster number above is not about better query plans.  This is 50%
> faster at the low-level grunt work of moving bits on and off disk and
> search b-trees.  We have achieved this by incorporating hundreds of
> micro-optimizations.  Each micro-optimization might improve the
> performance
> by as little as 0.05%.  If we get one that improves performance by 0.25%,
> that is considered a huge win.  Each of these optimizations is
> unmeasurable
> on a real-world system (we have to use cachegrind to get repeatable
> run-times) but if you do enough of them, they add up.
>
> A full 10% of the performance gain has come since the previous release.
> There have been a lot of changes.  All our tests pass, and we still have
> 100% branch test coverage, so we are confident that we didn't break too
> much.  But your testing is an important part of our quality process.
> Please download a source archive or a DLL and give the latest alpha a
> whirl, and let us know if you encounter any problems.
>
> P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
> Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
> 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
> cycles, as measured by cachegrind.

I don't know if folks have ever thank you, Dr. Hipp, for this wonderful gift
to the world called SQLite.  I have become a legend in my own world with
this tool. :-)  I do have to say that I have used it since 2006 and it has
increased in speed every year.  Thank you!  Thank you!  And in my own native
language, muchas gracias!

josé

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Stephen Chrzanowski
I, as well, wish to thank you for this tool Dr. Hipp.  I've never published
a public application using this engine, but, at my place of employment,
where my primary responsibility is to just monitor servers world wide, I've
coded a few tidbit web and desktop applications that have made my job SO
much easier.  Without this, my desktop apps would have to rely on MySQL,
which is massive overkill (resource wise) for some of the things I've
needed to use it for.

Thanks again!

On Fri, Sep 19, 2014 at 9:22 PM, jose isaias cabrera <
[hidden email]> wrote:

> "Richard Hipp" wrote...
>
>
>
>  The latest SQLite 3.8.7 alpha version (available on the download page
>> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17
>> release
>> from 16 months ago.  That is to say, it does 50% more work using the same
>> number of CPU cycles.
>>
>> This performance gain is over and above the query planner improvements
>> that
>> have also been made.  We are constantly looking for new ways to run
>> queries
>> and adding those ways into the query planner.  For example, in the
>> previous
>> release, we added a new way to evaluate IN operators with non-constant
>> right-hand-sides that was reported on this mailing list to make some
>> queries run 5 times faster.
>>
>> The 50% faster number above is not about better query plans.  This is 50%
>> faster at the low-level grunt work of moving bits on and off disk and
>> search b-trees.  We have achieved this by incorporating hundreds of
>> micro-optimizations.  Each micro-optimization might improve the
>> performance
>> by as little as 0.05%.  If we get one that improves performance by 0.25%,
>> that is considered a huge win.  Each of these optimizations is
>> unmeasurable
>> on a real-world system (we have to use cachegrind to get repeatable
>> run-times) but if you do enough of them, they add up.
>>
>> A full 10% of the performance gain has come since the previous release.
>> There have been a lot of changes.  All our tests pass, and we still have
>> 100% branch test coverage, so we are confident that we didn't break too
>> much.  But your testing is an important part of our quality process.
>> Please download a source archive or a DLL and give the latest alpha a
>> whirl, and let us know if you encounter any problems.
>>
>> P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
>> Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
>> 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
>> cycles, as measured by cachegrind.
>>
>
> I don't know if folks have ever thank you, Dr. Hipp, for this wonderful
> gift to the world called SQLite.  I have become a legend in my own world
> with this tool. :-)  I do have to say that I have used it since 2006 and it
> has increased in speed every year.  Thank you!  Thank you!  And in my own
> native language, muchas gracias!
>
> josé
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Tony Papadimitriou
In reply to this post by Richard Hipp-3
In trying to see if the new version breaks any of my queries, I ran several
of them repeatedly, and they all appear to have produced the expected
output.

The only thing I noticed which maybe of interest in relation to speed
performance was (with .timer on) that although the first two run time
numbers (real & user) were consistently smaller in 3.8.7 (when compared to
3.8.6), the third number (sys) was consistently higher (or same in one
occasion).  I guess the first number is the actual time (in seconds) it took
to run the query.  I don't even know what the 2nd and 3rd numbers represent,
and how or if they maybe related to the first one.  Is that increase in sys
to be expected?

A few examples from many more I tried that all follow the same pattern (same
query & database in each case):

3.8.6: Run Time: real 2.434 user 2.386815 sys 0.000000
3.8.7: Run Time: real 1.856 user 1.778411 sys 0.062400
---
3.8.6: Run Time: real 584.465 user 560.293192 sys 1.638011
3.8.7: Run Time: real 518.227 user 430.469159 sys 53.617544
---
3.8.6: Run Time: real 2.449 user 2.340015 sys 0.046800
3.8.7: Run Time: real 1.935 user 1.794012 sys 0.046800

(Thank you for two great solutions I use daily -- SQLite3 and Fossil)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Richard Hipp-3
On Sat, Sep 20, 2014 at 1:34 PM, <[hidden email]> wrote:

> In trying to see if the new version breaks any of my queries, I ran
> several of them repeatedly, and they all appear to have produced the
> expected output.
>
> The only thing I noticed which maybe of interest in relation to speed
> performance was (with .timer on) that although the first two run time
> numbers (real & user) were consistently smaller in 3.8.7 (when compared to
> 3.8.6), the third number (sys) was consistently higher (or same in one
> occasion).  I guess the first number is the actual time (in seconds) it
> took to run the query.  I don't even know what the 2nd and 3rd numbers
> represent, and how or if they maybe related to the first one.  Is that
> increase in sys to be expected?
>

Thanks for the report.  I think the increased system time is harmless, but
I want to investigate further to be sure.  Except right now I'm preoccupied
with Mr. Xu's new bug.  So please remind me next week if I don't bring this
up again.  :-)

What OS are you using?

Can you share your database and test script with us?



>
> A few examples from many more I tried that all follow the same pattern
> (same query & database in each case):
>
> 3.8.6: Run Time: real 2.434 user 2.386815 sys 0.000000
> 3.8.7: Run Time: real 1.856 user 1.778411 sys 0.062400
> ---
> 3.8.6: Run Time: real 584.465 user 560.293192 sys 1.638011
> 3.8.7: Run Time: real 518.227 user 430.469159 sys 53.617544
> ---
> 3.8.6: Run Time: real 2.449 user 2.340015 sys 0.046800
> 3.8.7: Run Time: real 1.935 user 1.794012 sys 0.046800
>
> (Thank you for two great solutions I use daily -- SQLite3 and Fossil)
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Valentin Davydov-2
In reply to this post by Richard Hipp-3
On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote:

> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.

Is there any similar benchmarks with regard to disk i/o operations rather
than CPU? Especially random read of cache misses, I mean.

Valentin Davydov.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

big stone
In reply to this post by Richard Hipp-3
Hi,

This 3.8.7alpha release seems to bring about 5%  win from 3.8.6 , on my
particular SQL test case.

Question : "PRAGMA threads=2" didn't bring any speed-up on my "2 true"
cores machine.

Did I miss a compilation option ?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Richard Hipp-3
On Mon, Sep 22, 2014 at 1:43 PM, big stone <[hidden email]> wrote:

> Hi,
>
> This 3.8.7alpha release seems to bring about 5%  win from 3.8.6 , on my
> particular SQL test case.
>
> Question : "PRAGMA threads=2" didn't bring any speed-up on my "2 true"
> cores machine.
>
> Did I miss a compilation option ?
>

The multi-thread sort should be on by default.  Probably you are just not
doing a big enough sort to make it worthwhile to start any new threads.
The "threads=2" pragma sets an upper limit on the number of threads.  There
is no guarantee that SQLite will use that many.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Richard Hipp-3
In reply to this post by Valentin Davydov-2
On Mon, Sep 22, 2014 at 8:29 AM, Valentin Davydov <[hidden email]>
wrote:

> On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote:
>
> > The latest SQLite 3.8.7 alpha version (available on the download page
> > http://www.sqlite.org/download.html) is 50% faster than the 3.7.17
> release
> > from 16 months ago.  That is to say, it does 50% more work using the same
> > number of CPU cycles.
>
> Is there any similar benchmarks with regard to disk i/o operations rather
> than CPU? Especially random read of cache misses, I mean.
>

No, there are no such benchmarks.  On the other hand, we've been focused on
minimizing I/O in SQLite for the past decade.  I think we have extracted
about as much as we can fro m that vein.  But if you have any new ideas on
how we can further reduce the I/O, we'd love to hear from you.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

big stone
In reply to this post by Richard Hipp-3
ok,

Nearly all the time is spent in a big 'CTE' select.

So maybe this sort of ugly CTE query  is not threadable.

with
f0_k as
  (SELECT f.rowid as nof2, f.*, p.Dn, p.g1, p.g2, p.w, p.other FROM F0 AS
f, Pt AS p WHERE  f.Io =p.Io)
,F0_mcalc as
  (SELECT f.*, p.*, (case when Priority=999 then Cg else Source end) AS
Sourcef
   FROM F0_k AS f, Sor AS p WHERE  p.omg  in (f.Cg, '') And
   f.Slic Between Be And En
   And p.PtGroup In (f.Io,f.g1,f.g2,f.w,f.other,'PL','SO','ST','Not_found')
)
,F0_mcalcmin as
  (SELECT nof2, min(priority) AS minp FROM F0_mcalc GROUP BY nof2)
,F0_mcalc_final as
 (SELECT f.*, round(qty*coefficient,3) AS qtyf FROM  F0_mcalcmin
  AS fm inner join  F0_mcalc AS f on f.nof2=fm.nof2 and
f.priority=fm.minp)
 select *   from F0_mcalc_final ;
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Donald Shepherd
In reply to this post by Richard Hipp-3
Are any of these improvements specifically in the area of the online backup
API, or are they more in the general running of SQLite?

On 20 September 2014 11:14, Richard Hipp <[hidden email]> wrote:

> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.
>
> This performance gain is over and above the query planner improvements that
> have also been made.  We are constantly looking for new ways to run queries
> and adding those ways into the query planner.  For example, in the previous
> release, we added a new way to evaluate IN operators with non-constant
> right-hand-sides that was reported on this mailing list to make some
> queries run 5 times faster.
>
> The 50% faster number above is not about better query plans.  This is 50%
> faster at the low-level grunt work of moving bits on and off disk and
> search b-trees.  We have achieved this by incorporating hundreds of
> micro-optimizations.  Each micro-optimization might improve the performance
> by as little as 0.05%.  If we get one that improves performance by 0.25%,
> that is considered a huge win.  Each of these optimizations is unmeasurable
> on a real-world system (we have to use cachegrind to get repeatable
> run-times) but if you do enough of them, they add up.
>
> A full 10% of the performance gain has come since the previous release.
> There have been a lot of changes.  All our tests pass, and we still have
> 100% branch test coverage, so we are confident that we didn't break too
> much.  But your testing is an important part of our quality process.
> Please download a source archive or a DLL and give the latest alpha a
> whirl, and let us know if you encounter any problems.
>
> P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
> Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
> 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
> cycles, as measured by cachegrind.
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Richard Hipp-3
On Tue, Sep 23, 2014 at 2:33 AM, Donald Shepherd <[hidden email]>
wrote:

> Are any of these improvements specifically in the area of the online backup
> API, or are they more in the general running of SQLite?
>

The "speedtest1" benchmark does not use the backup API.  However, many of
the performance improvements are broadly applicable, so you would expect
that you would see some improvement in that API too.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
In reply to this post by Richard Hipp-3
On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote:

> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.
>
> This performance gain is over and above the query planner improvements that
> have also been made.  We are constantly looking for new ways to run queries
> and adding those ways into the query planner.  For example, in the previous
> release, we added a new way to evaluate IN operators with non-constant
> right-hand-sides that was reported on this mailing list to make some
> queries run 5 times faster.
>
> The 50% faster number above is not about better query plans.  This is 50%
> faster at the low-level grunt work of moving bits on and off disk and
> search b-trees.  We have achieved this by incorporating hundreds of
> micro-optimizations.  Each micro-optimization might improve the performance
> by as little as 0.05%.  If we get one that improves performance by 0.25%,
> that is considered a huge win.  Each of these optimizations is unmeasurable
> on a real-world system (we have to use cachegrind to get repeatable
> run-times) but if you do enough of them, they add up.
>
> A full 10% of the performance gain has come since the previous release.
> There have been a lot of changes.  All our tests pass, and we still have
> 100% branch test coverage, so we are confident that we didn't break too
> much.  But your testing is an important part of our quality process.
> Please download a source archive or a DLL and give the latest alpha a
> whirl, and let us know if you encounter any problems.
That looks really promising; thanks for all this work.

Tristan, you have a comprehensive set of benchmarks for Evolution's
addressbook; is it possible for someone else to run those or would it
take more of your time to babysit than it would to run them yourself?

--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Roger Binns
In reply to this post by Richard Hipp-3
On 22/09/14 10:48, Richard Hipp wrote:
> But if you have any new ideas on how we can further reduce the I/O, we'd love to hear from you.

The single biggest problem for me is defaults.  SQLite supports memory
mapped i/o which has many advantages.  The stat4 analyze does a really good
job.  WAL reduces writing contention.  These are all off by default for
various good reasons.  But it also means that by default the majority of
developers are not getting the best performance SQLite already has to offer,
unless they happen to stumble on these.  Some like stat4 are especially
problematic since it requires recompilation to address.

It would be nice if by default things were best.  One weak suggestion is to
have a few profiles, and have them selectable by pragma.  The profile would
then go ahead and set various options.  Future versions of SQLite would also
then update the profiles.

For example:

  pragma profile='max_performance' -- turns on all above, ups caches etc
  pragma profile='min_memory'    -- tunes down everything related to memory

Various members of the SQLite consortium can then do things like:

  pragma profile='firefox'

Roger
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
In reply to this post by Richard Hipp-3
On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote:
> The 50% faster number above is not about better query plans.

Speaking of better query plans, though... here's a query which takes
about 1700ms on my data set, followed by a couple of optimisations which
seem like they might be generically useful, and would help it go 300
times faster:

/* This is an address book. The 'main' table holds the primary
   information while the 'email_list' table holds email addresses
   because each contact can have more than one email address */
PRAGMA case_sensitive_like=on;
CREATE TABLE main (uid text primary key, first_name text, last_name text);
INSERT INTO main VALUES('aaaa','alice', 'X');
INSERT INTO main VALUES('bbbb','bert', 'Y');
INSERT INTO main VALUES('cccc','alison', 'Z');
CREATE INDEX main_first_name_index on main (first_name);
CREATE INDEX main_last_name_index on main (last_name);
CREATE TABLE email_list (uid text, email text);
INSERT INTO email_list VALUES('aaaa','[hidden email]');
INSERT INTO email_list VALUES('aaaa','[hidden email]');
INSERT INTO email_list VALUES('bbbb','[hidden email]');
CREATE INDEX email_uid_index on email_list (uid);
CREATE INDEX email_email_index on email_list (email);

SELECT DISTINCT main.uid
       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
       OR main.first_name like 'al%'
       OR main.last_name like 'al%';
/*
0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1
0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
*/

The first optimisation is to realise that sometimes, operations on the
result of a JOIN can be more efficiently resolved by using on the
original tables. Specifically, in this example, there's no point in
looking for a given first_name or last_name in the *joined* table, when
we have indices on those fields in the original 'main' table and can do
it much faster by querying those and then combining the results.

So the original query becomes:

SELECT DISTINCT main.uid
       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
UNION SELECT main.uid
       FROM main WHERE main.first_name LIKE 'al%'
       OR main.last_name LIKE 'al%';
/*
1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1
1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND first_name<?)
2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND last_name<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
*/

With that optimisation, the time taken on my data set is reduced to
about 1000ms. Which is quite a nice improvement from 1700ms but that's
just the first step...

The second optimisation is to realise that there's no point in that
being a LEFT JOIN any more, given that it's now only ever going to
return results where the right-hand-side actually does exist. Perhaps
normally we'd blame the user for such a naïve query, but when coupled
with the first optimisation, spotting this optimisation does actually
start to make sense. So now the query becomes:

SELECT DISTINCT main.uid
       FROM main JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
UNION SELECT main.uid
       FROM main WHERE main.first_name LIKE 'al%'
       OR main.last_name LIKE 'al%';
/*
1|0|1|SEARCH TABLE email_list USING INDEX email_email_index (email>? AND email<?)
1|1|0|SEARCH TABLE main USING COVERING INDEX sqlite_autoindex_main_1 (uid=?)
2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND first_name<?)
2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND last_name<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
*/

Now the query takes 6ms on my data set.

If this were a hard-coded query, of course it would make sense for me to
just do it this way in the client. But in my case — and I suspect it's
this way in a number of other real-world examples — we're actually just
representing a set of user-input search criteria in the SQL query. To
make this optimisation in the client would be distinctly non-trivial,
and it would be *really* nice if the sqlite query planner could do it.

Is that feasible?

--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Keith Medcalf

Would it not be more efficient to skip the join altogether since all you want is the list of uid's, and assuming that you have maintained the referential integrity of your database mail_list(list_uid) references main(uid)?

SELECT list_uid
  FROM mail_list
 WHERE email LIKE 'al%'
UNION
SELECT uid
  FROM main
 WHERE first_name LIKE 'al%'
    OR last_name LIKE 'al%';

>-----Original Message-----
>Speaking of better query plans, though... here's a query which takes
>about 1700ms on my data set, followed by a couple of optimisations which
>seem like they might be generically useful, and would help it go 300
>times faster:
>
>/* This is an address book. The 'main' table holds the primary
>   information while the 'email_list' table holds email addresses
>   because each contact can have more than one email address */
>PRAGMA case_sensitive_like=on;
>CREATE TABLE main (uid text primary key, first_name text, last_name
>text);
>INSERT INTO main VALUES('aaaa','alice', 'X');
>INSERT INTO main VALUES('bbbb','bert', 'Y');
>INSERT INTO main VALUES('cccc','alison', 'Z');
>CREATE INDEX main_first_name_index on main (first_name);
>CREATE INDEX main_last_name_index on main (last_name);
>CREATE TABLE email_list (uid text, email text);
>INSERT INTO email_list VALUES('aaaa','[hidden email]');
>INSERT INTO email_list VALUES('aaaa','[hidden email]');
>INSERT INTO email_list VALUES('bbbb','[hidden email]');
>CREATE INDEX email_uid_index on email_list (uid);
>CREATE INDEX email_email_index on email_list (email);
>
>SELECT DISTINCT main.uid
>       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>       WHERE email_list.email LIKE 'al%'
>       OR main.first_name like 'al%'
>       OR main.last_name like 'al%';
>/*
>0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1
>0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
>*/
>
>The first optimisation is to realise that sometimes, operations on the
>result of a JOIN can be more efficiently resolved by using on the
>original tables. Specifically, in this example, there's no point in
>looking for a given first_name or last_name in the *joined* table, when
>we have indices on those fields in the original 'main' table and can do
>it much faster by querying those and then combining the results.
>
>So the original query becomes:
>
>SELECT DISTINCT main.uid
>       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>       WHERE email_list.email LIKE 'al%'
>UNION SELECT main.uid
>       FROM main WHERE main.first_name LIKE 'al%'
>       OR main.last_name LIKE 'al%';
>/*
>1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1
>1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
>2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>?
>AND first_name<?)
>2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND
>last_name<?)
>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
>*/
>
>With that optimisation, the time taken on my data set is reduced to
>about 1000ms. Which is quite a nice improvement from 1700ms but that's
>just the first step...
>
>The second optimisation is to realise that there's no point in that
>being a LEFT JOIN any more, given that it's now only ever going to
>return results where the right-hand-side actually does exist. Perhaps
>normally we'd blame the user for such a naïve query, but when coupled
>with the first optimisation, spotting this optimisation does actually
>start to make sense. So now the query becomes:
>
>SELECT DISTINCT main.uid
>       FROM main JOIN email_list ON main.uid = email_list.uid
>       WHERE email_list.email LIKE 'al%'
>UNION SELECT main.uid
>       FROM main WHERE main.first_name LIKE 'al%'
>       OR main.last_name LIKE 'al%';
>/*
>1|0|1|SEARCH TABLE email_list USING INDEX email_email_index (email>? AND
>email<?)
>1|1|0|SEARCH TABLE main USING COVERING INDEX sqlite_autoindex_main_1
>(uid=?)
>2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>?
>AND first_name<?)
>2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND
>last_name<?)
>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
>*/
>
>Now the query takes 6ms on my data set.
>
>If this were a hard-coded query, of course it would make sense for me to
>just do it this way in the client. But in my case — and I suspect it's
>this way in a number of other real-world examples — we're actually just
>representing a set of user-input search criteria in the SQL query. To
>make this optimisation in the client would be distinctly non-trivial,
>and it would be *really* nice if the sqlite query planner could do it.
>
>Is that feasible?
>
>--
>dwmw2



_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

jose isaias cabrera
In reply to this post by Roger Binns

"Roger Binns" wrote...

> On 22/09/14 10:48, Richard Hipp wrote:
>> But if you have any new ideas on how we can further reduce the I/O, we'd
>> love to hear from you.
>
> The single biggest problem for me is defaults.  SQLite supports memory
> mapped i/o which has many advantages.  The stat4 analyze does a really
> good
> job.  WAL reduces writing contention.  These are all off by default for
> various good reasons.  But it also means that by default the majority of
> developers are not getting the best performance SQLite already has to
> offer,
> unless they happen to stumble on these.  Some like stat4 are especially
> problematic since it requires recompilation to address.
>
> It would be nice if by default things were best.  One weak suggestion is
> to
> have a few profiles, and have them selectable by pragma.  The profile
> would
> then go ahead and set various options.  Future versions of SQLite would
> also
> then update the profiles.
>
> For example:
>
>  pragma profile='max_performance' -- turns on all above, ups caches etc
>  pragma profile='min_memory'    -- tunes down everything related to memory
>
> Various members of the SQLite consortium can then do things like:
>
>  pragma profile='firefox'

This would be a nice set of options. On my case, I would set all connections
to our project to be" max_performance", as it is what we need.  Just
thinking out loud.

josé

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Simon Slavin-3

On 24 Sep 2014, at 2:13pm, jose isaias cabrera <[hidden email]> wrote:

> This would be a nice set of options. On my case, I would set all connections to our project to be" max_performance", as it is what we need.  Just thinking out loud.

How much max is max ?  Are you willing to give up ACID ?  Are you willing to have your database corrupted if there's power loss ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
In reply to this post by Keith Medcalf
On Wed, 2014-09-24 at 06:13 -0600, Keith Medcalf wrote:

>
> Would it not be more efficient to skip the join altogether since all
> you want is the list of uid's, and assuming that you have maintained
> the referential integrity of your database mail_list(list_uid)
> references main(uid)?
>
> SELECT list_uid
>   FROM mail_list
>  WHERE email LIKE 'al%'
> UNION
> SELECT uid
>   FROM main
>  WHERE first_name LIKE 'al%'
>     OR last_name LIKE 'al%';
Yes, but only because I oversimplified. In fact my queries are actually
after *other* fields from the main table. It's just that I'd elided
those fields from the example.

For the sake of the example, please assume my queries were actually
 'SELECT DISTINCT main.uid, main.first_name, main.last_name FROM …'

The real-life query that this is simplified *from* is discussed at
https://git.gnome.org/browse/evolution-data-server/commit/?id=5f9f5b52807

--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Roger Binns
In reply to this post by Simon Slavin-3
On 24/09/14 06:19, Simon Slavin wrote:
> How much max is max ?  

Not giving up ACID.  But for example stat4 is better than the default stat1.
 Memory mapping (especially on 64 bit) is great.  So is WAL.  All are off by
default.

If you want to give up ACID then you should really be on your own to look at
the various tradeoffs.

Roger

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
12