Why the high cost of a double sort

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Why the high cost of a double sort

Cecil Westerhof-5
I have the following query:
    SELECT used
    FROM   usedProverbs
    LIMIT  1

The view useProverbs is defined as:
    CREATE VIEW usedProverbs AS
    SELECT   *
    FROM     proverbs
    WHERE    CAST(used AS INT) <> 0
    ORDER BY used ASC

But I am told this is not clear and that I should use:
    SELECT   used
    FROM     usedProverbs
    ORDER BY used
    LIMIT    1

But when I use those in DBBrowser, the first takes almost always 0 ms,
while the second takes between 13 and 16 ms. Why does the second one take
so much more time?

The definition of proverbs:
    CREATE TABLE proverbs (
        proverbID   INTEGER PRIMARY KEY AUTOINCREMENT,
        proverb     TEXT    NOT NULL UNIQUE,
        used        TEXT    DEFAULT 'notUsed'
    );

​To be sure I also executed it in sqlite3 after .timer on. Here the first
one almost always gives​:
    Run Time: real 0.000 user 0.000000 sys 0.000000
and sometimes:
    Run Time: real 0.001 user 0.000000 sys 0.000000

The second one gives mostly:
    Run Time: real 0.006 user 0.004000 sys 0.000000
but sometimes it get as low as:
    Run Time: real 0.003 user 0.004000 sys 0.000000

​Both are using 3.16.2​. Would 3.20.0 make a difference? Then I could
install that one and see the results from that. (The JDBC uses 3.20.2.)

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why the high cost of a double sort

Clemens Ladisch
Cecil Westerhof wrote:

> I have the following query:
>     SELECT used
>     FROM   usedProverbs
>     LIMIT  1
>
> The view useProverbs is defined as:
>     CREATE VIEW usedProverbs AS
>     SELECT   *
>     FROM     proverbs
>     WHERE    CAST(used AS INT) <> 0
>     ORDER BY used ASC
>
> But I am told this is not clear and that I should use:
>     SELECT   used
>     FROM     usedProverbs
>     ORDER BY used
>     LIMIT    1
>
> But when I use those in DBBrowser, the first takes almost always 0 ms,
> while the second takes between 13 and 16 ms. Why does the second one take
> so much more time?

Because only the first sort can be optimized away with an index
(which you did not mention).


If you believe what you're told (that ORDER BY must be in the
outermost query), then remove the ORDER BY from the view.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why the high cost of a double sort

Keith Medcalf
In reply to this post by Cecil Westerhof-5

When you run the second query, the optimizer does not know that the data returned from the view is already sorted into the "correct" order and the inclusion of the "order by" in the view prevents the query flattener from optimizing out (re-writing) the query into a single query (eliminating the view).  Without flattening, the view is simply a "black box" that returns some rows in an arbitrary (completely unsorted) order.

If you remove the "order by" from the view, then the query flattener will be able to flatten the query and re-write and execute as if you had entered:

  SELECT used
    FROM proverbs
   WHERE CAST(used AS INT) <> 0
ORDER BY used
   LIMIT 1

This of course depends on the version of SQLite because the capabilities of flattening views varied a bit from version to version.

See section 10 of https://sqlite.org/optoverview.html
for a long list of things that affect the query flattening/re-write capability.

Your query with a view is equivalent to

SELECT used FROM (SELECT * FROM proverbs WHERE CAST(used AS INT) <> 0 ORDER BY used ASC) ORDER BY used LIMIT 1

As you can see from the list, the subquery ORDER BY precludes flattening the query.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Cecil Westerhof
>Sent: Saturday, 19 August, 2017 05:37
>To: SQLite mailing list
>Subject: [sqlite] Why the high cost of a double sort
>
>I have the following query:
>    SELECT used
>    FROM   usedProverbs
>    LIMIT  1
>
>The view useProverbs is defined as:
>    CREATE VIEW usedProverbs AS
>    SELECT   *
>    FROM     proverbs
>    WHERE    CAST(used AS INT) <> 0
>    ORDER BY used ASC
>
>But I am told this is not clear and that I should use:
>    SELECT   used
>    FROM     usedProverbs
>    ORDER BY used
>    LIMIT    1
>
>But when I use those in DBBrowser, the first takes almost always 0
>ms,
>while the second takes between 13 and 16 ms. Why does the second one
>take
>so much more time?
>
>The definition of proverbs:
>    CREATE TABLE proverbs (
>        proverbID   INTEGER PRIMARY KEY AUTOINCREMENT,
>        proverb     TEXT    NOT NULL UNIQUE,
>        used        TEXT    DEFAULT 'notUsed'
>    );
>
>​To be sure I also executed it in sqlite3 after .timer on. Here the
>first
>one almost always gives​:
>    Run Time: real 0.000 user 0.000000 sys 0.000000
>and sometimes:
>    Run Time: real 0.001 user 0.000000 sys 0.000000
>
>The second one gives mostly:
>    Run Time: real 0.006 user 0.004000 sys 0.000000
>but sometimes it get as low as:
>    Run Time: real 0.003 user 0.004000 sys 0.000000
>
>​Both are using 3.16.2​. Would 3.20.0 make a difference? Then I could
>install that one and see the results from that. (The JDBC uses
>3.20.2.)
>
>--
>Cecil Westerhof
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why the high cost of a double sort

Cecil Westerhof-5
In reply to this post by Clemens Ladisch
2017-08-19 20:37 GMT+02:00 Clemens Ladisch <[hidden email]>:

> Cecil Westerhof wrote:
> > I have the following query:
> >     SELECT used
> >     FROM   usedProverbs
> >     LIMIT  1
> >
> > The view useProverbs is defined as:
> >     CREATE VIEW usedProverbs AS
> >     SELECT   *
> >     FROM     proverbs
> >     WHERE    CAST(used AS INT) <> 0
> >     ORDER BY used ASC
> >
> > But I am told this is not clear and that I should use:
> >     SELECT   used
> >     FROM     usedProverbs
> >     ORDER BY used
> >     LIMIT    1
> >
> > But when I use those in DBBrowser, the first takes almost always 0 ms,
> > while the second takes between 13 and 16 ms. Why does the second one take
> > so much more time?
>
> Because only the first sort can be optimized away with an index
> (which you did not mention).
>

​I see that there is an index on used yes:
    CREATE INDEX `proverbs_used_idx` ON `proverbs` (`used` )

That explains it. It has been some time I defined the table and forgot it.




> If you believe what you're told (that ORDER BY must be in the
> outermost query), then remove the ORDER BY from the view.
>

​I did and that worked.

I was also told that you never should put a sort on a view. Is that true,
or a bit to strong?


--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why the high cost of a double sort

Cecil Westerhof-5
In reply to this post by Keith Medcalf
2017-08-19 22:03 GMT+02:00 Keith Medcalf <[hidden email]>:

>
> When you run the second query, the optimizer does not know that the data
> returned from the view is already sorted into the "correct" order and the
> inclusion of the "order by" in the view prevents the query flattener from
> optimizing out (re-writing) the query into a single query (eliminating the
> view).  Without flattening, the view is simply a "black box" that returns
> some rows in an arbitrary (completely unsorted) order.
>
> If you remove the "order by" from the view, then the query flattener will
> be able to flatten the query and re-write and execute as if you had entered:
>
>   SELECT used
>     FROM proverbs
>    WHERE CAST(used AS INT) <> 0
> ORDER BY used
>    LIMIT 1
>

​Thanks. I learned something.

By the way, I should have used:
    SELECT MIN(used)
    FROM   usedProverbs

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why the high cost of a double sort

Simon Slavin-3
In reply to this post by Cecil Westerhof-5


On 19 Aug 2017, at 10:48pm, Cecil Westerhof <[hidden email]> wrote:

> I was also told that you never should put a sort on a view. Is that true,
> or a bit to strong?

Generally, you put the ORDER BY on the SELECT you’re using the consult the VIEW.  Technically speaking a VIEW is just a set of records and the ORDER BY should be a last-minute thing just before presentation of the results.

But it can be useful for someone submitting commands manually to create a VIEW with ORDER BY.  For instance, using the command-line shell to investigate weird results, or for those cases where you’re using SQLite like a spreadsheet to arrive at a one-off result.  So I think an outright ban is a little strong.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why the high cost of a double sort

Clemens Ladisch
Simon Slavin wrote:
> On 19 Aug 2017, at 10:48pm, Cecil Westerhof wrote:
>> I was also told that you never should put a sort on a view. Is that true,
>> or a bit to strong?
>
> Generally, you put the ORDER BY on the SELECT you’re using the consult
> the VIEW.  Technically speaking a VIEW is just a set of records and the
> ORDER BY should be a last-minute thing just before presentation of the
> results.

The SQL standard and <http://www.sqlite.org/lang_select.html#orderby> say:
| If a SELECT statement that returns more than one row does not have an
| ORDER BY clause, the order in which the rows are returned is undefined.

ORDER BY clauses in any subqueries/views/CTEs are not guaranteed to keep
the order in the outermost query.  It's possible for the database to
optimize the query by using some index or cache, and that can result in
a different order.

> But it can be useful for someone submitting commands manually to create
> a VIEW with ORDER BY.  For instance, using the command-line shell to
> investigate weird results, or for those cases where you’re using SQLite
> like a spreadsheet to arrive at a one-off result.  So I think an outright
> ban is a little strong.

In practice, when the outer query is implemented by simply scanning over
the inner query's rows, the order will not change.  But there is no
guarantee that the implementation will always stay the same.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users