Lookup join

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Lookup join

Fredrik Larsen
Consider query below;

SELECT key
FROM t1
LEFT JOIN (
  SELECT key,max(rev),data
  FROM t2
  WHERE rev < ?
  GROUP BY key
) USING (key)
ORDER BY key ?
LIMIT ?

In above query sqlite will materialize the t2-sub-query and then start
working on the outer query. I have a lot of data in t2 so this will consume
a lot of time.

To overcome this I perform above query manually i two stages; I fetch the
t1 data, then for each row I do a lookup and manually join. This is very
fast by but makes it hard to reuse this base-query in other queries.

So my question is; Can sqlite do lookup-type joins, like to do manually in
code, to avoid the overhead of materializing the full t2-query on all keys,
and using just a fraction of this work?

I suspect the answer is no, if so, maybe this is solvable through a custom
virtual table? I have looked at ext/misc/eval.c, and this custom function
could be used if a function where alloed to return multiple columns..

Fredrik
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Keith Medcalf

You mean something like this:

 select key,
        maxrev,
        data
   from (
          select key,
                 null as maxrev,
                 null as data
            from t1
           where key not in (select key
                               from t2
                              where rev < :rev)
       union all
          select t1.key,
                 max(rev) as maxrev,
                 data
            from t1, t2
           where t1.key == t2.key
             and rev < :rev
        group by t2.key
         )
order by key;

t1 should have an index on key
t2 should have an index on key, rev

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Fredrik Larsen
>Sent: Monday, 30 September, 2019 04:12
>To: SQLite mailing list <[hidden email]>
>Subject: [sqlite] Lookup join
>
>Consider query below;
>
>SELECT key
>FROM t1
>LEFT JOIN (
>  SELECT key,max(rev),data
>  FROM t2
>  WHERE rev < ?
>  GROUP BY key
>) USING (key)
>ORDER BY key ?
>LIMIT ?
>
>In above query sqlite will materialize the t2-sub-query and then start
>working on the outer query. I have a lot of data in t2 so this will
>consume
>a lot of time.
>
>To overcome this I perform above query manually i two stages; I fetch the
>t1 data, then for each row I do a lookup and manually join. This is very
>fast by but makes it hard to reuse this base-query in other queries.
>
>So my question is; Can sqlite do lookup-type joins, like to do manually
>in
>code, to avoid the overhead of materializing the full t2-query on all
>keys,
>and using just a fraction of this work?
>
>I suspect the answer is no, if so, maybe this is solvable through a
>custom
>virtual table? I have looked at ext/misc/eval.c, and this custom function
>could be used if a function where alloed to return multiple columns..
>
>Fredrik
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Keith Medcalf
Or this, which is even simpler:

   select t1.key,
          max(t2.rev) as maxrev,
          t2.data
     from t1
left join t2
       on t1.key == t2.key
      and rev < :rev
    group by t1.key
    order by t1.key;

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Keith Medcalf
>Sent: Monday, 30 September, 2019 19:31
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] Lookup join
>
>
>You mean something like this:
>
> select key,
>        maxrev,
>        data
>   from (
>          select key,
>                 null as maxrev,
>                 null as data
>            from t1
>           where key not in (select key
>                               from t2
>                              where rev < :rev)
>       union all
>          select t1.key,
>                 max(rev) as maxrev,
>                 data
>            from t1, t2
>           where t1.key == t2.key
>             and rev < :rev
>        group by t2.key
>         )
>order by key;
>
>t1 should have an index on key
>t2 should have an index on key, rev
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-----Original Message-----
>>From: sqlite-users <[hidden email]> On
>>Behalf Of Fredrik Larsen
>>Sent: Monday, 30 September, 2019 04:12
>>To: SQLite mailing list <[hidden email]>
>>Subject: [sqlite] Lookup join
>>
>>Consider query below;
>>
>>SELECT key
>>FROM t1
>>LEFT JOIN (
>>  SELECT key,max(rev),data
>>  FROM t2
>>  WHERE rev < ?
>>  GROUP BY key
>>) USING (key)
>>ORDER BY key ?
>>LIMIT ?
>>
>>In above query sqlite will materialize the t2-sub-query and then start
>>working on the outer query. I have a lot of data in t2 so this will
>>consume
>>a lot of time.
>>
>>To overcome this I perform above query manually i two stages; I fetch
>the
>>t1 data, then for each row I do a lookup and manually join. This is very
>>fast by but makes it hard to reuse this base-query in other queries.
>>
>>So my question is; Can sqlite do lookup-type joins, like to do manually
>>in
>>code, to avoid the overhead of materializing the full t2-query on all
>>keys,
>>and using just a fraction of this work?
>>
>>I suspect the answer is no, if so, maybe this is solvable through a
>>custom
>>virtual table? I have looked at ext/misc/eval.c, and this custom
>function
>>could be used if a function where alloed to return multiple columns..
>>
>>Fredrik
>>_______________________________________________
>>sqlite-users mailing list
>>[hidden email]
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Fredrik Larsen
In reply to this post by Fredrik Larsen
Thanks Keith! I have spent several days trying to tune my query towards
expected performance, without luck. I somehow missed your fairly straight
forward solution. I still have some problems making sqlite use the correct
indexes, but this can at least be fixed by well-placed INDEXED-BY-hints.

The declarative model of SQL is nice, but when you care about performance,
it quickly gets frustrating and time consuming..I guess experience will
help with this eventually.

Fredrik
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Richard Hipp-3
On 10/1/19, Fredrik Larsen <[hidden email]> wrote:
>
> The declarative model of SQL is nice, but when you care about performance,
> it quickly gets frustrating and time consuming.

In a perfect world, the query planner would recognize your intent and
do the right thing.  You would be able to write the query any way you
want, and the result would always be fast.

Alas, SQLite's query planner is not perfect.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Jose Isaias Cabrera-4

Richard Hipp, on Tuesday, October 1, 2019 02:05 PM, wrote...

>
> On 10/1/19, Fredrik Larsen, on
> >
> > The declarative model of SQL is nice, but when you care about performance,
> > it quickly gets frustrating and time consuming.
>
> In a perfect world, the query planner would recognize your intent and
> do the right thing.  You would be able to write the query any way you
> want, and the result would always be fast.
>
> Alas, SQLite's query planner is not perfect.

WHAT!!??  I quit! :-)  Just kidding.

josé
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Jay Kreibich
In reply to this post by Richard Hipp-3

> On Oct 1, 2019, at 1:05 PM, Richard Hipp <[hidden email]> wrote:


> Alas, SQLite's query planner is not perfect.

...files bug report...  “lacking perfection.”

;-)

  -j
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Fredrik Larsen
In reply to this post by Richard Hipp-3
It may not be perfect, but it is impressive. Impressive and frustrating are
not mutually exclusive :)

Anyway, I'm probably not a typical sqlite user. I use sqlite as a simple
indexing system, where all reads are expected to hit an index and return
instantly. For this project, direct control over the indexing system would
be better. But when the SQL-translation works as expected, it is nice to
have sql available as a tool for general reporting.

Fredrik

On Tue, Oct 1, 2019 at 8:05 PM Richard Hipp <[hidden email]> wrote:

> On 10/1/19, Fredrik Larsen <[hidden email]> wrote:
> >
> > The declarative model of SQL is nice, but when you care about
> performance,
> > it quickly gets frustrating and time consuming.
>
> In a perfect world, the query planner would recognize your intent and
> do the right thing.  You would be able to write the query any way you
> want, and the result would always be fast.
>
> Alas, SQLite's query planner is not perfect.
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Keith Medcalf
In reply to this post by Fredrik Larsen

On Tuesday, 1 October, 2019 11:58, Fredrik Larsen <[hidden email]> wrote:

>Thanks Keith! I have spent several days trying to tune my query towards
>expected performance, without luck. I somehow missed your fairly straight
>forward solution. I still have some problems making sqlite use the
>correct indexes, but this can at least be fixed by well-placed INDEXED-BY-hints.

   select t1.key,
          max(t2.rev) as maxrev,
          t2.data
     from t1
left join t2
       on t1.key == t2.key
      and t2.rev < ?
 group by t1.key
 order by t1.key (asc|desc)
    limit ?
;

I should think that this version would give the query planner the widest latitude to use the available indexes (again, on t1 (key) and t2 (key, rev)).  With the upcoming 3.30.0 version of SQLite3 you should get the same performance when using "order by t1.key desc" as when using "order by t1.key asc" as it will push the order down into the (group by) rather than using an extra sorter ... and it should only require to traverse t1 in index order and do one index lookup per t1 row into t2 -- so theoretically it should be the absolute minimum work required to answer the query.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Simon Slavin-3
When trying to make SQLite pick the right index, please use this sequence.

1) Create your indexes.
2) Put typical example data in your table.
3) Use the ANALYZE command
4) Delete the test data and put in real data (optional)

You can do (1) and (2) in any order.  But having data in the table makes a difference, and leads to better results than doing ANALYZE with empty tables.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Fredrik Larsen
In reply to this post by Keith Medcalf
Yes, your note about DESC ordering is the only thing missing now, and then
everything will be perfect :) I have a build of sqlite with the
GROUP-BY-DESC patch applied, but not able to test now. But previous testing
using very similar queries did work fine so don't expect any problems.

Now all my queries are below 10ms on fairly big data-sets, from 1-2
seconds. That is an improvement of several orders of magnitude :)

Fredrik

On Tue, Oct 1, 2019 at 8:49 PM Keith Medcalf <[hidden email]> wrote:

>
> On Tuesday, 1 October, 2019 11:58, Fredrik Larsen <[hidden email]>
> wrote:
>
> >Thanks Keith! I have spent several days trying to tune my query towards
> >expected performance, without luck. I somehow missed your fairly straight
> >forward solution. I still have some problems making sqlite use the
> >correct indexes, but this can at least be fixed by well-placed
> INDEXED-BY-hints.
>
>    select t1.key,
>           max(t2.rev) as maxrev,
>           t2.data
>      from t1
> left join t2
>        on t1.key == t2.key
>       and t2.rev < ?
>  group by t1.key
>  order by t1.key (asc|desc)
>     limit ?
> ;
>
> I should think that this version would give the query planner the widest
> latitude to use the available indexes (again, on t1 (key) and t2 (key,
> rev)).  With the upcoming 3.30.0 version of SQLite3 you should get the same
> performance when using "order by t1.key desc" as when using "order by
> t1.key asc" as it will push the order down into the (group by) rather than
> using an extra sorter ... and it should only require to traverse t1 in
> index order and do one index lookup per t1 row into t2 -- so theoretically
> it should be the absolute minimum work required to answer the query.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Fredrik Larsen
In reply to this post by Simon Slavin-3
I have run analyze on production data, that should work better, right? Or
do you propose to "trick" the query-planner using some kind of staging-data?

Anyway, I use INDEXED BY hints now, and this solves my problem.

Fredrik

On Tue, Oct 1, 2019 at 8:57 PM Simon Slavin <[hidden email]> wrote:

> When trying to make SQLite pick the right index, please use this sequence.
>
> 1) Create your indexes.
> 2) Put typical example data in your table.
> 3) Use the ANALYZE command
> 4) Delete the test data and put in real data (optional)
>
> You can do (1) and (2) in any order.  But having data in the table makes a
> difference, and leads to better results than doing ANALYZE with empty
> tables.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Lookup join

Simon Slavin-3
On 1 Oct 2019, at 8:15pm, Fredrik Larsen <[hidden email]> wrote:

> I have run analyze on production data, that should work better, right?

Yes, that is the best way to do it.  I've been given some databases where the data saved by ANALYZE shows it was run with empty tables.  So I sometimes warn people that this does not give the right results.

Glad to see that you got the behaviour you want using INDEXED BY.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users