Join, Union, Subquery or what?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

R Smith


On 2017/09/09 9:20 PM, Nico Williams wrote:

> On Sat, Sep 09, 2017 at 05:56:03PM +0200, R Smith wrote:
>> *Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries on
>> tables *WITHOUT Row_ids*:
>> (This is the full test posted below because it is the one that matters most)
>> INTERSECT AND WHERE IN (...) queries posted similar times here with WHERE IN
>> (...) being the slightly faster (similar to the above findings), but BOTH
>> were a good 10 - 15% faster than on the ROW-ID tables, so WITHOUT ROWID
>> tables seem to have a definite advantage here (it is currently unclear to me
>> why this is).
>>
>> A troubling test is the JOIN on WITHOUT ROWID tables - it took several
>> orders of magnitude longer than any other test in the entire experiment.
> In your first test you were ordering by PK, now you're not, and you
> don't have an [covering] index on the columns you're ordering by, so,
> yeah, "orders of magnitude" slower is to be expected.  You're comparing
> apples and oranges. /// etc ....

I think you are missing something or my explanation was not clear.
When I say "first test" I mean of THIS test suite, not the previous set
from 3 days ago.

I am in no way ordering by PK in this set of tests, anywhere. I am not
using 1 column, I am using 151 columns for both tests. The only thing
that changes between the two scripts are the words "WITHOUT ROWID" being
added or removed, NOTHING else. The problem also happens on the
un-ordered tests. The ordering is of no consequence.

To be clear, there are two tests, both do not order by PK, both do not
have covering indexes, both have 151 columns and lots of data... the
only difference between them is the words "WITHOUT ROWID" at the end of
the test tables. One query runs in ~50 milliseconds, the other runs for
minutes. The results are exactly the same.

You can test this by taking the script I provided, run it, it will run
for longer than 2 minutes (depending on your machine speed) because of
the JOIN queries taking over a minute each, then remove the words
"WITHOUT ROWID" from the two test tables and run it again. The entire
script will finish in under  3 seconds with the JOIN queries weighing in
at circa 50ms each.

That cannot possibly be an expected circumstance and it isn't an
apples-oranges thing.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

Nico Williams
On Sat, Sep 09, 2017 at 11:26:35PM +0200, R Smith wrote:
> I think you are missing something or my explanation was not clear.
> When I say "first test" I mean of THIS test suite, not the previous set from
> 3 days ago.

I meant the opposite.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

Clemens Ladisch
In reply to this post by R Smith
R Smith wrote:
> I am using 151 columns for both tests. The only thing that changes
> between the two scripts are the words "WITHOUT ROWID" being added

<http://www.sqlite.org/withoutrowid.html> says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID tables.

If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

R Smith
Well yes but the documentation suggests that one could expect a slight
degradation. The words "works best with" does not seem to imbue an idea
of "give WITHOUT ROWID tables a wide berth when your tables are more
than few columns wide", and I don't think the Devs intended that either.

I can still roll with the idea that the WITHOUT ROWID tables with 151
columns and lots of data actually performed FASTER than the same ROWID
tables for the other queries but slower with the JOIN... but that much
slower?

I mean we are not talking a "little" slower, we are talking 50ms vs.
~70,000ms, that's a difference of a whopping 3 orders of magnitude and
change. And we are not talking huge tables, another query that simply
walks the tables can do so several million times in the same time the
JOIN query does.

Put another way, I can create 100 new tables and populate them each with
the rows from one test table, then delete the rows from each new table
that doesn't satisfy an EXISTS() check in the other test table
(essentially mimicking the JOIN query) and then output each full new
table, 100 of them in turn, and then DROP them all. SQlite can do ALL of
that in a fraction of the time that the normal JOIN query (between those
same two test tables) takes to complete.

We are talking a formula 1 car suddenly going at max speed of 1
mile-per-weekend, barely keeping up with a semi-athletic snail, and
people suggest checking the fuel octane rating. I'm saying there is
something wrong under the hood.

Cheers,
Ryan


On 2017/09/10 11:28 AM, Clemens Ladisch wrote:

> R Smith wrote:
>> I am using 151 columns for both tests. The only thing that changes
>> between the two scripts are the words "WITHOUT ROWID" being added
> <http://www.sqlite.org/withoutrowid.html> says:
> | WITHOUT ROWID tables will work correctly ... for tables with a single
> | INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
> | that case.
> and:
> | WITHOUT ROWID tables work best when individual rows are not too large.
>
> So this is pretty much the documented worst case for WITHOUT ROWID tables.
>
> If the query is executed by making a copy of all the table data into
> a temporary B-tree (index), which is what INTERSECT and probably IN(...)
> are doing, then the WITHOUT-ROWID-ness of the table does not matter.
>
>
> Regards,
> Clemens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

Dan Kennedy-4
On 09/10/2017 08:30 PM, R Smith wrote:

> Well yes but the documentation suggests that one could expect a slight
> degradation. The words "works best with" does not seem to imbue an
> idea of "give WITHOUT ROWID tables a wide berth when your tables are
> more than few columns wide", and I don't think the Devs intended that
> either.
>
> I can still roll with the idea that the WITHOUT ROWID tables with 151
> columns and lots of data actually performed FASTER than the same ROWID
> tables for the other queries but slower with the JOIN... but that much
> slower?



The difference seems to be that, currently, SQLite never creates
automatic indexes on WITHOUT ROWID tables.

   http://sqlite.org/optoverview.html#autoindex

I don't think there is a good reason that it cannot do so. It just
doesn't yet.

Dan.





>
> I mean we are not talking a "little" slower, we are talking 50ms vs.
> ~70,000ms, that's a difference of a whopping 3 orders of magnitude and
> change. And we are not talking huge tables, another query that simply
> walks the tables can do so several million times in the same time the
> JOIN query does.
>
> Put another way, I can create 100 new tables and populate them each
> with the rows from one test table, then delete the rows from each new
> table that doesn't satisfy an EXISTS() check in the other test table
> (essentially mimicking the JOIN query) and then output each full new
> table, 100 of them in turn, and then DROP them all. SQlite can do ALL
> of that in a fraction of the time that the normal JOIN query (between
> those same two test tables) takes to complete.
>
> We are talking a formula 1 car suddenly going at max speed of 1
> mile-per-weekend, barely keeping up with a semi-athletic snail, and
> people suggest checking the fuel octane rating. I'm saying there is
> something wrong under the hood.
>
> Cheers,
> Ryan
>
>
> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
>> R Smith wrote:
>>> I am using 151 columns for both tests. The only thing that changes
>>> between the two scripts are the words "WITHOUT ROWID" being added
>> <http://www.sqlite.org/withoutrowid.html> says:
>> | WITHOUT ROWID tables will work correctly ... for tables with a single
>> | INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
>> | that case.
>> and:
>> | WITHOUT ROWID tables work best when individual rows are not too large.
>>
>> So this is pretty much the documented worst case for WITHOUT ROWID
>> tables.
>>
>> If the query is executed by making a copy of all the table data into
>> a temporary B-tree (index), which is what INTERSECT and probably IN(...)
>> are doing, then the WITHOUT-ROWID-ness of the table does not matter.
>>
>>
>> Regards,
>> Clemens
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

J. King-3
There's an extra word in the first paragraph of Section 4 of that document, by the way:

" The error logger callback has also proven useful in catching errors occasional errors that the application misses..."

On September 11, 2017 11:22:50 AM EDT, Dan Kennedy <[hidden email]> wrote:

>On 09/10/2017 08:30 PM, R Smith wrote:
>> Well yes but the documentation suggests that one could expect a
>slight
>> degradation. The words "works best with" does not seem to imbue an
>> idea of "give WITHOUT ROWID tables a wide berth when your tables are
>> more than few columns wide", and I don't think the Devs intended that
>
>> either.
>>
>> I can still roll with the idea that the WITHOUT ROWID tables with 151
>
>> columns and lots of data actually performed FASTER than the same
>ROWID
>> tables for the other queries but slower with the JOIN... but that
>much
>> slower?
>
>
>
>The difference seems to be that, currently, SQLite never creates
>automatic indexes on WITHOUT ROWID tables.
>
>   http://sqlite.org/optoverview.html#autoindex
>
>I don't think there is a good reason that it cannot do so. It just
>doesn't yet.
>
>Dan.
>
>
>
>
>
>>
>> I mean we are not talking a "little" slower, we are talking 50ms vs.
>> ~70,000ms, that's a difference of a whopping 3 orders of magnitude
>and
>> change. And we are not talking huge tables, another query that simply
>
>> walks the tables can do so several million times in the same time the
>
>> JOIN query does.
>>
>> Put another way, I can create 100 new tables and populate them each
>> with the rows from one test table, then delete the rows from each new
>
>> table that doesn't satisfy an EXISTS() check in the other test table
>> (essentially mimicking the JOIN query) and then output each full new
>> table, 100 of them in turn, and then DROP them all. SQlite can do ALL
>
>> of that in a fraction of the time that the normal JOIN query (between
>
>> those same two test tables) takes to complete.
>>
>> We are talking a formula 1 car suddenly going at max speed of 1
>> mile-per-weekend, barely keeping up with a semi-athletic snail, and
>> people suggest checking the fuel octane rating. I'm saying there is
>> something wrong under the hood.
>>
>> Cheers,
>> Ryan
>>
>>
>> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
>>> R Smith wrote:
>>>> I am using 151 columns for both tests. The only thing that changes
>>>> between the two scripts are the words "WITHOUT ROWID" being added
>>> <http://www.sqlite.org/withoutrowid.html> says:
>>> | WITHOUT ROWID tables will work correctly ... for tables with a
>single
>>> | INTEGER PRIMARY KEY. However, ordinary rowid tables will run
>faster in
>>> | that case.
>>> and:
>>> | WITHOUT ROWID tables work best when individual rows are not too
>large.
>>>
>>> So this is pretty much the documented worst case for WITHOUT ROWID
>>> tables.
>>>
>>> If the query is executed by making a copy of all the table data into
>>> a temporary B-tree (index), which is what INTERSECT and probably
>IN(...)
>>> are doing, then the WITHOUT-ROWID-ness of the table does not matter.
>>>
>>>
>>> Regards,
>>> Clemens
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

J. King-3
Doh. I meant the linked document on the error log. Silly me.

<http://sqlite.org/errlog.html>

On September 11, 2017 9:41:39 PM EDT, "J. King" <[hidden email]> wrote:

>There's an extra word in the first paragraph of Section 4 of that
>document, by the way:
>
>" The error logger callback has also proven useful in catching errors
>occasional errors that the application misses..."
>
>On September 11, 2017 11:22:50 AM EDT, Dan Kennedy
><[hidden email]> wrote:
>>On 09/10/2017 08:30 PM, R Smith wrote:
>>> Well yes but the documentation suggests that one could expect a
>>slight
>>> degradation. The words "works best with" does not seem to imbue an
>>> idea of "give WITHOUT ROWID tables a wide berth when your tables are
>
>>> more than few columns wide", and I don't think the Devs intended
>that
>>
>>> either.
>>>
>>> I can still roll with the idea that the WITHOUT ROWID tables with
>151
>>
>>> columns and lots of data actually performed FASTER than the same
>>ROWID
>>> tables for the other queries but slower with the JOIN... but that
>>much
>>> slower?
>>
>>
>>
>>The difference seems to be that, currently, SQLite never creates
>>automatic indexes on WITHOUT ROWID tables.
>>
>>   http://sqlite.org/optoverview.html#autoindex
>>
>>I don't think there is a good reason that it cannot do so. It just
>>doesn't yet.
>>
>>Dan.
>>
>>
>>
>>
>>
>>>
>>> I mean we are not talking a "little" slower, we are talking 50ms vs.
>
>>> ~70,000ms, that's a difference of a whopping 3 orders of magnitude
>>and
>>> change. And we are not talking huge tables, another query that
>simply
>>
>>> walks the tables can do so several million times in the same time
>the
>>
>>> JOIN query does.
>>>
>>> Put another way, I can create 100 new tables and populate them each
>>> with the rows from one test table, then delete the rows from each
>new
>>
>>> table that doesn't satisfy an EXISTS() check in the other test table
>
>>> (essentially mimicking the JOIN query) and then output each full new
>
>>> table, 100 of them in turn, and then DROP them all. SQlite can do
>ALL
>>
>>> of that in a fraction of the time that the normal JOIN query
>(between
>>
>>> those same two test tables) takes to complete.
>>>
>>> We are talking a formula 1 car suddenly going at max speed of 1
>>> mile-per-weekend, barely keeping up with a semi-athletic snail, and
>>> people suggest checking the fuel octane rating. I'm saying there is
>>> something wrong under the hood.
>>>
>>> Cheers,
>>> Ryan
>>>
>>>
>>> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
>>>> R Smith wrote:
>>>>> I am using 151 columns for both tests. The only thing that changes
>>>>> between the two scripts are the words "WITHOUT ROWID" being added
>>>> <http://www.sqlite.org/withoutrowid.html> says:
>>>> | WITHOUT ROWID tables will work correctly ... for tables with a
>>single
>>>> | INTEGER PRIMARY KEY. However, ordinary rowid tables will run
>>faster in
>>>> | that case.
>>>> and:
>>>> | WITHOUT ROWID tables work best when individual rows are not too
>>large.
>>>>
>>>> So this is pretty much the documented worst case for WITHOUT ROWID
>>>> tables.
>>>>
>>>> If the query is executed by making a copy of all the table data
>into
>>>> a temporary B-tree (index), which is what INTERSECT and probably
>>IN(...)
>>>> are doing, then the WITHOUT-ROWID-ness of the table does not
>matter.
>>>>
>>>>
>>>> Regards,
>>>> Clemens
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> [hidden email]
>>>>
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>>_______________________________________________
>>sqlite-users mailing list
>>[hidden email]
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>--
>Sent from my Android device with K-9 Mail. Please excuse my brevity.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12