50% faster than 3.7.17

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Keith Medcalf

Interesting.  From that code you might want to try something like this:

SELECT uid, vcard, bdata
  FROM folder_id
 WHERE uid in ( select uid FROM email_list where value like 'p%'
               union
                select uid from folder_id where nickname LIKE 'p%'
               union
                select uid from folder_id where full_name LIKE 'p%'
               union
                select uid from folder_id where family_name LIKE 'p%'
               union
                select uid from folder_id where given_name LIKE 'p%'
               union
                select uid from folder_id where nickname LIKE 'p%'
               union
                select uid from folder_id where file_as LIKE 'p%'
              );

Then having nocase indexes on the various search fields will all work as expected.

>-----Original Message-----
>From: David Woodhouse [mailto:[hidden email]]
>Sent: Wednesday, 24 September, 2014 07:25
>To: Keith Medcalf
>Cc: General Discussion of SQLite Database
>Subject: Re: [sqlite] 50% faster than 3.7.17
>
>On Wed, 2014-09-24 at 06:13 -0600, Keith Medcalf wrote:
>>
>> Would it not be more efficient to skip the join altogether since all
>> you want is the list of uid's, and assuming that you have maintained
>> the referential integrity of your database mail_list(list_uid)
>> references main(uid)?
>>
>> SELECT list_uid
>>   FROM mail_list
>>  WHERE email LIKE 'al%'
>> UNION
>> SELECT uid
>>   FROM main
>>  WHERE first_name LIKE 'al%'
>>     OR last_name LIKE 'al%';
>
>Yes, but only because I oversimplified. In fact my queries are actually
>after *other* fields from the main table. It's just that I'd elided
>those fields from the example.
>
>For the sake of the example, please assume my queries were actually
> 'SELECT DISTINCT main.uid, main.first_name, main.last_name FROM …'
>
>The real-life query that this is simplified *from* is discussed at
>https://git.gnome.org/browse/evolution-data-server/commit/?id=5f9f5b52807
>
>--
>dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Stephen Chrzanowski
In reply to this post by Roger Binns
In the vein of configurability, and in a day dream I just had, it would be
nice (But probably not possible as there could be compiler directives you
can't use at the same time) that we could have a single output
DLL/SO/whatever dumped from the compiler that had everything available,
then, via defaults in the source code, IF/THEN/ELSE (or whatever the C
equiv is) operations are considered to have certain functionality run
during the query process.  Via a configuration file, or via pragma which
would be generated based on a configuration, feature sets could be set at
run time.

One fault with this would be the issue of COMPILED size.  I realize we're
in an age where we generally talk megabytes instead of kilobytes in regards
to main thread compiled code, but I'm sure there are platforms out there
that are using SQLite that have to fit in the kilobyte range, and need
special compiled sources.

The advantage to this is that I wouldn't have to compile X number of DLLs
for Y number of programs to get the results I need, just have one
configuration file kicking around that I can adjust as needed.  It'd also
help with fine tuning to validate whether I need certain features turned on
or off.

On Wed, Sep 24, 2014 at 1:15 PM, Roger Binns <[hidden email]> wrote:

> On 24/09/14 06:19, Simon Slavin wrote:
> > How much max is max ?
>
> Not giving up ACID.  But for example stat4 is better than the default
> stat1.
>  Memory mapping (especially on 64 bit) is great.  So is WAL.  All are off
> by
> default.
>
> If you want to give up ACID then you should really be on your own to look
> at
> the various tradeoffs.
>
> Roger
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
In reply to this post by Keith Medcalf
On Wed, 2014-09-24 at 19:36 -0600, Keith Medcalf wrote:

>
> Interesting.  From that code you might want to try something like this:
>
> SELECT uid, vcard, bdata
>   FROM folder_id
>  WHERE uid in ( select uid FROM email_list where value like 'p%'
>                union
>                 select uid from folder_id where nickname LIKE 'p%'
>                union
>                 select uid from folder_id where full_name LIKE 'p%'
>                union
>                 select uid from folder_id where family_name LIKE 'p%'
>                union
>                 select uid from folder_id where given_name LIKE 'p%'
>                union
>                 select uid from folder_id where nickname LIKE 'p%'
>                union
>                 select uid from folder_id where file_as LIKE 'p%'
>               );
>
> Then having nocase indexes on the various search fields will all work as expected.
Yeah, that achieves the same speed.

I'm not sure it addresses the real problem though. It still only really
applies when the user's query (which we're translating to SQL) contains
only 'OR' and no 'AND' clauses. It doesn't help me translate a query in
the general case.

Now, I'm not *entirely* averse to having a special case for the 'all OR'
query — this particular query is the address autocompletion so it's
common and the user is actually waiting for it as they type. In fact, as
a proof of concept I've already *implemented* a hackish special case to
spot this case and basically submit a hand-crafted query instead of the
normal translation to SQL:
    https://bugzilla.gnome.org/show_bug.cgi?id=699597#c19

The problem is that I don't *want* to have to have that special case.
This is just a query optimisation, which is something the query planner
is supposed to do. I don't *want* to implement this and other
optimisations in the client, just to trick *today's* sqlite query
planner into spotting the best way to do it. That's the Wrong Way™ to do
things.

There are two alternative approaches which *don't* seem as wrong.

Firstly, if there's a sane way to rewrite our translator so that it
naturally uses UNION for OR clauses, that might make sense. But to cope
with AND clauses, AFAICT the natural extension of that approach would be
to use 'SELECT FROM ... SELECT FROM' and then we lose the use of indices
for *those* cases, right¹? Tristan started a thread about this 'nested
select' last year, which I picked up a couple of weeks ago. It didn't
seem like it was a viable strategy for the general case.

The second approach, and this is why I started this thread, is to 'fix'
the query planner so that that it can see for *itself* the best way to
implement a given query given the constraints.

I suggested a couple of specific optimisations which the query planner
might be able to make, which should hopefully have benefits wider than
just my own use case. Are those not viable?

--
dwmw2

¹ I do realise that in the special case of a single top-level AND such
  that all its sub-clauses are necessary conditions, I can do something
  nicer. But again, it's a special case and doesn't handle the general
  case of nested AND and OR clauses. And it's still the *client* code
  doing the job of the optimiser, spotting necessary vs. sufficient
  conditions and pulling them out to the top level for more efficient
  implementation.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
In reply to this post by David Woodhouse
On Tue, 2014-09-23 at 17:48 +0100, David Woodhouse wrote:
> That looks really promising; thanks for all this work.
>
> Tristan, you have a comprehensive set of benchmarks for Evolution's
> addressbook; is it possible for someone else to run those or would it
> take more of your time to babysit than it would to run them yourself?

I was able to get these benchmarks running, and compared the 3.8.7
snapshot against the version of 3.8.6 which is shipped in Fedora 20.

Results temporarily at http://westmere.infradead.org/charts/sqlite387/

This isn't a perfect comparison since the Fedora package is built from
sqlite-src-3080600.zip and the equivalent source for the 3.8.7 alpha
didn't seem to be available. But it was built as a shared library using
the same configuration and compiler flags, and the benchmark compares
identical code running against both the 3.8.6 and 3.8.7 libraries.

I haven't done any real analysis other than looking at the pretty
pictures, but certainly nothing seems to go slower (or break), and there
are some noticeable improvements on some of the benchmarks
(filter-by-long-full-name-suffix.png,
filter-by-short-full-name-suffix.png, filter-containing-full-name.png,
filter-containing-given-name.png, filter-containing-phone-number.png).

--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

jose isaias cabrera
In reply to this post by Simon Slavin-3
"Simon Slavin" wrote...

>
> On 24 Sep 2014, at 2:13pm, jose isaias cabrera <[hidden email]>
> wrote:
>
>> This would be a nice set of options. On my case, I would set all
>> connections to our project to be" max_performance", as it is what we
>> need.  Just thinking out loud.
>
> How much max is max ?  Are you willing to give up ACID ?  Are you willing
> to have your database corrupted if there's power loss ?

Ok, back to reality... ;-), no, I will not.  Good questions, Simon.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
In reply to this post by David Woodhouse
On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote:
>
> I suggested a couple of specific optimisations which the query planner
> might be able to make, which should hopefully have benefits wider than
> just my own use case. Are those not viable?

I'm preparing to commit a workaround to Evolution to avoid this issue,
and then move on with my life and forget about it.

Before I do, is it worth me rephrasing this as a 'suboptimal query plan'
bug report so it gets tracked and might get attention later? Or should I
just forget the idea of getting it fixed in sqlite?

--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

Dan Kennedy-4
On 10/09/2014 04:38 PM, David Woodhouse wrote:

> On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote:
>> I suggested a couple of specific optimisations which the query planner
>> might be able to make, which should hopefully have benefits wider than
>> just my own use case. Are those not viable?
> I'm preparing to commit a workaround to Evolution to avoid this issue,
> and then move on with my life and forget about it.
>
> Before I do, is it worth me rephrasing this as a 'suboptimal query plan'
> bug report so it gets tracked and might get attention later? Or should I
> just forget the idea of getting it fixed in sqlite?


Well, you could always create a patch...

I think I understand the second optimization. You're saying that given this:

   SELECT DISTINCT main.uid
          FROM main LEFT JOIN email_list ON main.uid = email_list.uid
          WHERE email_list.email LIKE 'al%'

SQLite should deduce that since the LIKE implies "email_list.email IS NOT NULL" the LEFT JOIN is equivalent to a regular JOIN. Which would allow SQLite to reorder the tables and perhaps come up with a more efficient query plan. Correct?

Seems like a reasonable idea.

The first optimization might be trickier. With queries that feature a single table:

   SELECT cols FROM tbl WHERE a=? OR b=? OR c=?

the planner may elect to run something very close to this:

   SELECT cols FROM tbl WHERE a=?
     UNION ALL
   SELECT cols FROM tbl WHERE b=?
     UNION ALL
   SELECT cols FROM tbl WHERE c=?

However, after returning each row, we remember its PRIMARY KEY (either the rowid or real PK for WITHOUT ROWID tables). Similar transformations for individual loops within join queries are also possible.

However, with a JOIN query, we don't currently attempt this kind of transform. I think because we would have to create some kind of composite key to use in place of the PRIMARY KEY to avoid returning duplicates. I guess queries that have a DISTINCT clause don't have this problem. So it could in theory transform your query to:

   SELECT DISTINCT
        main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
        WHERE email_list.email LIKE 'al%'
     UNION ALL
   SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
        WHERE main.first_name like 'al%'
     UNION ALL
   SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
        WHERE main.last_name like 'al%';

The the hypothetical optimization above could figure out that the first LEFT JOIN could just as easily be a JOIN.

And that the other two LEFT JOINs are not required at all due to the DISTINCT and the fact that the WHERE clauses on the sub-selects do not reference the joined table at all.

It seems like there are a few moving parts here. And none of these are trivial changes. So, good ideas that might show up in an SQLite release at some point, but it's not realistic to expect these optimizations to be implemented quickly. Unless, of course, you can propose a patch and they turn out to be simpler than they look.

Regards,
Dan.






_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: 50% faster than 3.7.17

David Woodhouse
On Thu, 2014-10-09 at 17:41 +0700, Dan Kennedy wrote:
> I think I understand the second optimization. You're saying that given this:
>
>    SELECT DISTINCT main.uid
>           FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>           WHERE email_list.email LIKE 'al%'
>
> SQLite should deduce that since the LIKE implies "email_list.email IS
> NOT NULL"

(
    Digression: SQLite *should* deduce the LIKE implies the IS NOT NULL
    condition. My *actual* queries all have, for some bizarre
    historical reason I haven't yet explored,

        email_list.value IS NOT NULL AND email_list.value LIKE 'foo%'

    You'd think that the former condition would end up optimised away,
    but timings show that it clearly isn't. I wasn't going to bring that
    up here since I think it is so clearly a stupid thing to be putting
    in our query in the first place. But you just made me do it anyway.
)

> the LEFT JOIN is equivalent to a regular JOIN. Which would allow
> SQLite to reorder the tables and perhaps come up with a more efficient
> query plan. Correct?

Indeed.

As I said, a glib response to this suggested optimisation in *isolation*
might be that the user shouldn't have asked for a LEFT JOIN at all if
they didn't need it. But of course when coupled with the other suggested
transforms, that glib response is a lot less valid.

>
> The first optimization might be trickier.
 ...
>  I guess queries that have a DISTINCT clause don't have this problem.

Right. I think it only works with DISTINCT, but that's OK.

>  So it could in theory transform your query to:
>
>    SELECT DISTINCT
>         main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>         WHERE email_list.email LIKE 'al%'
>      UNION ALL
>    SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>         WHERE main.first_name like 'al%'
>      UNION ALL
>    SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>         WHERE main.last_name like 'al%';
>
> The the hypothetical optimization above could figure out that the
> first LEFT JOIN could just as easily be a JOIN.
Right.

> And that the other two LEFT JOINs are not required at all due to the
> DISTINCT and the fact that the WHERE clauses on the sub-selects do not
> reference the joined table at all.

Yeah, that's probably true. I'd assumed that we'd do it as part of the
"first optimisation" introducing the UNION but it's probably better
handled as a separate transform as you have it here.

> It seems like there are a few moving parts here. And none of these are
> trivial changes. So, good ideas that might show up in an SQLite
> release at some point, but it's not realistic to expect these
> optimizations to be implemented quickly.

TBH that's all I was really looking for; thanks. I *hate* it when people
silently work around limitations in my software without bringing them to
my attention, and I didn't want to be guilty of doing the same.

I certainly don't expect an instant fix; just knowing that it's on the
radar is perfectly sufficient. Now I can commit my hackish workaround to
rewrite certain special-case queries to optimise the output of the
current query planner for them, and still sleep at night :)

Thanks.

>  Unless, of course, you can propose a patch and they turn out to be
> simpler than they look.

That would be interesting but realistically it's outside my capacity at
the moment. I am already *so* far into shaving this yak that I can
barely remember what I was actually intending to do in the first
place :)

--
David Woodhouse                            Open Source Technology Centre
[hidden email]                              Intel Corporation

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
12