Performance issue with left join in 3.24 compared with 3.22

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Performance issue with left join in 3.24 compared with 3.22

Eric Grange-3
Hi,

I am experiencing a massive performance issue on a query with a left join
in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
milliseconds.
The problematic query looks like

      select d.key_field, count(*) nb
      from low_volume_table b
      join mid_volume_table c on c.key_b = b.id
      left join high_volume_table d on d.key_c = c.id
      where b.id >= $1
      and b.filter1 = 0
      and c.filter2 > 0
      and d.filter3 > 0
      and d.filter4 = 1
      group by d.key_field
      order by nb desc

The filter fields on it are relatively non-discriminant (and
non_indexed), however the key_field is indexed.

The most discriminating conditions in this query are those on the
low_volume and mid_volume tables, but the optimizer
selects as first action:

    SCAN TABLE high_volume_table USING INDEX key_field_idx

which leads to a huge number of iterations.

If on the other hand, just one of the d filter conditions is removed, then
the optimizer goes (like 3.22) first for

   SEARCH TABLE low_volume_table AS b USING COVERING INDEX
low_volume_table_id_idx
(b.filter1=? AND rowid>?)

This happens after running ANALYZE, the sqlite1_stat for the high_volume
table and key_field_idx is

     5855234 6

while for the  low_volume_table_filter1_idx it is

     1976628 988314

While the  low_volume_table_filter1_idx does not look very selective, as it
is coupled with rowid filtering, it is actually very effective
as it combines rowid & filter1, so there are just thousandths of rows being
considered in the "group by", while when starting from
a key_field_idx, there are millions of rows being considered, the
overwhelming majority not fulfilling the conditions.

The above table names and fields have been anonymized, if someone from the
SQLite team want to have a look at the actual data,
I can provide a database download (it's about 1.7 GB)

Thanks!
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Performance issue with left join in 3.24 compared with 3.22

Eric Grange-3
Also ran a few index to "force" the query plan, but with limited success:

- the "indexed by" clause does not result in the optimizer using the index
first, it just uses the indexes in the later steps of the query plan.
- using "not indexed" still results in the same table scan of
high_volume_table first, just without the index.
- using the unary "+" on the d table filters has no effect on the query
plan (as these are not indexed in the first place I guess)

Using unlikely() on the d table filters seems to be the only option that
works.


On Tue, Jun 26, 2018 at 10:02 AM, Eric Grange <[hidden email]> wrote:

> Hi,
>
> I am experiencing a massive performance issue on a query with a left join
> in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>       select d.key_field, count(*) nb
>       from low_volume_table b
>       join mid_volume_table c on c.key_b = b.id
>       left join high_volume_table d on d.key_c = c.id
>       where b.id >= $1
>       and b.filter1 = 0
>       and c.filter2 > 0
>       and d.filter3 > 0
>       and d.filter4 = 1
>       group by d.key_field
>       order by nb desc
>
> The filter fields on it are relatively non-discriminant (and
> non_indexed), however the key_field is indexed.
>
> The most discriminating conditions in this query are those on the
> low_volume and mid_volume tables, but the optimizer
> selects as first action:
>
>     SCAN TABLE high_volume_table USING INDEX key_field_idx
>
> which leads to a huge number of iterations.
>
> If on the other hand, just one of the d filter conditions is removed, then
> the optimizer goes (like 3.22) first for
>
>    SEARCH TABLE low_volume_table AS b USING COVERING INDEX
> low_volume_table_id_idx (b.filter1=? AND rowid>?)
>
> This happens after running ANALYZE, the sqlite1_stat for the high_volume
> table and key_field_idx is
>
>      5855234 6
>
> while for the  low_volume_table_filter1_idx it is
>
>      1976628 988314
>
> While the  low_volume_table_filter1_idx does not look very selective, as
> it is coupled with rowid filtering, it is actually very effective
> as it combines rowid & filter1, so there are just thousandths of rows
> being considered in the "group by", while when starting from
> a key_field_idx, there are millions of rows being considered, the
> overwhelming majority not fulfilling the conditions.
>
> The above table names and fields have been anonymized, if someone from the
> SQLite team want to have a look at the actual data,
> I can provide a database download (it's about 1.7 GB)
>
> Thanks!
>
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Performance issue with left join in 3.24 compared with 3.22

Hick Gunter
Add "cross" before the first "join" to force the first table into the outermost loop

-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Eric Grange
Gesendet: Dienstag, 26. Juni 2018 10:13
An: General Discussion of SQLite Database <[hidden email]>
Betreff: [EXTERNAL] Re: [sqlite] Performance issue with left join in 3.24 compared with 3.22

Also ran a few index to "force" the query plan, but with limited success:

- the "indexed by" clause does not result in the optimizer using the index first, it just uses the indexes in the later steps of the query plan.
- using "not indexed" still results in the same table scan of high_volume_table first, just without the index.
- using the unary "+" on the d table filters has no effect on the query plan (as these are not indexed in the first place I guess)

Using unlikely() on the d table filters seems to be the only option that works.


On Tue, Jun 26, 2018 at 10:02 AM, Eric Grange <[hidden email]> wrote:

> Hi,
>
> I am experiencing a massive performance issue on a query with a left
> join in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>       select d.key_field, count(*) nb
>       from low_volume_table b
>       join mid_volume_table c on c.key_b = b.id
>       left join high_volume_table d on d.key_c = c.id
>       where b.id >= $1
>       and b.filter1 = 0
>       and c.filter2 > 0
>       and d.filter3 > 0
>       and d.filter4 = 1
>       group by d.key_field
>       order by nb desc
>
> The filter fields on it are relatively non-discriminant (and
> non_indexed), however the key_field is indexed.
>
> The most discriminating conditions in this query are those on the
> low_volume and mid_volume tables, but the optimizer selects as first
> action:
>
>     SCAN TABLE high_volume_table USING INDEX key_field_idx
>
> which leads to a huge number of iterations.
>
> If on the other hand, just one of the d filter conditions is removed,
> then the optimizer goes (like 3.22) first for
>
>    SEARCH TABLE low_volume_table AS b USING COVERING INDEX
> low_volume_table_id_idx (b.filter1=? AND rowid>?)
>
> This happens after running ANALYZE, the sqlite1_stat for the
> high_volume table and key_field_idx is
>
>      5855234 6
>
> while for the  low_volume_table_filter1_idx it is
>
>      1976628 988314
>
> While the  low_volume_table_filter1_idx does not look very selective,
> as it is coupled with rowid filtering, it is actually very effective
> as it combines rowid & filter1, so there are just thousandths of rows
> being considered in the "group by", while when starting from a
> key_field_idx, there are millions of rows being considered, the
> overwhelming majority not fulfilling the conditions.
>
> The above table names and fields have been anonymized, if someone from
> the SQLite team want to have a look at the actual data, I can provide
> a database download (it's about 1.7 GB)
>
> Thanks!
>
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Performance issue with left join in 3.24 compared with 3.22

Richard Hipp-3
In reply to this post by Eric Grange-3
On 6/26/18, Eric Grange <[hidden email]> wrote:

> I am experiencing a massive performance issue on a query with a left join
> in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>       select d.key_field, count(*) nb
>       from low_volume_table b
>       join mid_volume_table c on c.key_b = b.id
>       left join high_volume_table d on d.key_c = c.id
>       where b.id >= $1
>       and b.filter1 = 0
>       and c.filter2 > 0
>       and d.filter3 > 0
>       and d.filter4 = 1
>       group by d.key_field
>       order by nb desc

Dan bisected and found the speed reduction coincides with the
introduction of the LEFT JOIN strength reduction optimization
(https://sqlite.org/optoverview.html#leftjoinreduction) in version
3.23.0.  (Dan bisected to the specific check-in
https://sqlite.org/src/info/dd568c27).

Your work-around is to change the LEFT JOIN into CROSS JOIN, thus
forcing the query planner to preserve the same join order as it did
before the string reduction optimization.

We (the SQLite devs) will take an action to try to improve the query
planner so that it picks a better plan for your case.


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Performance issue with left join in 3.24 compared with 3.22

Eric Grange
Thanks Richard!

Changing the inner join to a cross join works as well in that case, though
is it enough to always disable the left join optimization ?

I have other variants of the query with different/more left joined
tables/subqueries, and varying filtering conditions, as the query
is dynamically generated from user options and filters (which can indeed
lead to SQL that is not really "optimal").

Is having a cross join somewhere among the joins enough to "disable" the
left join strength reduction for other joins?

On Tue, Jun 26, 2018 at 5:58 PM, Richard Hipp <[hidden email]> wrote:

> On 6/26/18, Eric Grange <[hidden email]> wrote:
> > I am experiencing a massive performance issue on a query with a left join
> > in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> > milliseconds.
> > The problematic query looks like
> >
> >       select d.key_field, count(*) nb
> >       from low_volume_table b
> >       join mid_volume_table c on c.key_b = b.id
> >       left join high_volume_table d on d.key_c = c.id
> >       where b.id >= $1
> >       and b.filter1 = 0
> >       and c.filter2 > 0
> >       and d.filter3 > 0
> >       and d.filter4 = 1
> >       group by d.key_field
> >       order by nb desc
>
> Dan bisected and found the speed reduction coincides with the
> introduction of the LEFT JOIN strength reduction optimization
> (https://sqlite.org/optoverview.html#leftjoinreduction) in version
> 3.23.0.  (Dan bisected to the specific check-in
> https://sqlite.org/src/info/dd568c27).
>
> Your work-around is to change the LEFT JOIN into CROSS JOIN, thus
> forcing the query planner to preserve the same join order as it did
> before the string reduction optimization.
>
> We (the SQLite devs) will take an action to try to improve the query
> planner so that it picks a better plan for your case.
>
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users