Query planner: Scanning subqueries vs using automatic covering index


Query planner: Scanning subqueries vs using automatic covering index

niklas
I recently noticed some very slow queries in my SQLite book database.
It turns out the query planner decided to scan subqueries instead of
using an automatic covering index to search them.

The database contains about 3000 entries. With two subqueries the query
took a noticeable time to run; with three subqueries I had to give up and
abort to save the CPU from overheating.

Using indexes, all cases take only milliseconds.

Dropping the ANALYZE (sqlite_stat1) information from the database made
the problem go away, i.e. made SQLite use indexes instead of scanning.
So did adding a few more columns to the ORDER BY clause.

These SQL statements should be enough to demonstrate the problem.
I am using SQLite 3.25.1.

**********
create.txt
**********

BEGIN;

CREATE TABLE "Authors" (
    "AuthorID" INTEGER PRIMARY KEY,
    "Last Name" TEXT NOT NULL,
    "First Name" TEXT NOT NULL);

CREATE TABLE "Books" (
    "BookID" INTEGER PRIMARY KEY,
    "Title" TEXT NOT NULL);
 
CREATE TABLE "Genres"(
    "GenreID" INTEGER PRIMARY KEY,
    "Genre" TEXT UNIQUE NOT NULL);

CREATE TABLE "DatesRead"(
    "BookID" INTEGER,
    "Date Read" TEXT,
    PRIMARY KEY("BookID", "Date Read"),
    FOREIGN KEY(BookID) REFERENCES Books(BookID)) WITHOUT ROWID;

CREATE TABLE "AuthorBooks"(
    "BookID" INTEGER,
    "AuthorID" INTEGER,
    PRIMARY KEY("BookID", "AuthorID" ),
    FOREIGN KEY(BookID) REFERENCES Books(BookID),
    FOREIGN KEY(AuthorID) REFERENCES Authors(AuthorID)) WITHOUT ROWID;

CREATE TABLE "BookGenres"(
    "BookID" INTEGER,
    "GenreID" INTEGER,  
    PRIMARY KEY("BookID", "GenreID" ),
    FOREIGN KEY(BookID) REFERENCES Books(BookID),
    FOREIGN KEY(GenreID) REFERENCES Genres(GenreID)) WITHOUT ROWID;

ANALYZE;
INSERT INTO sqlite_stat1 VALUES('BookGenres','BookGenres','3190 2 1');
INSERT INTO sqlite_stat1 VALUES('DatesRead','DatesRead','3047 2 1');
INSERT INTO sqlite_stat1 VALUES('AuthorBooks','AuthorBooks','3549 2 1');
INSERT INTO sqlite_stat1 VALUES('Authors',NULL,'1329');
INSERT INTO sqlite_stat1 VALUES('Books',NULL,'2978');
INSERT INTO sqlite_stat1 VALUES('Genres','sqlite_autoindex_Genres_1','112 1');

COMMIT;

************
queries.txt
************

select "--------------------";
select "order by bookid only";
select "--------------------";

EXPLAIN QUERY PLAN
SELECT BookID, Title, "Author(s)", "Date(s)", "Genre(s)"
FROM Books
JOIN (SELECT BookID, group_concat(ltrim("First Name"||' '||"Last Name"), ', ') AS 'Author(s)'
      FROM Books JOIN AuthorBooks USING(BookID) JOIN Authors USING(AuthorID)
      GROUP BY BookID) USING(BookID)
JOIN (SELECT BookID, group_concat("Date Read", ', ') AS 'Date(s)'
      FROM Books JOIN DatesRead USING(BookID)
      GROUP BY BookID) USING(BookID)
JOIN (SELECT BookID, group_concat(Genre, ', ') AS 'Genre(s)'
      FROM Books JOIN BookGenres USING(BookID) JOIN Genres USING(GenreID)
      GROUP BY BookID) USING(BookID)
ORDER BY BookID;

select "-------------------------------------------";
select "order by bookid and some other fields too";
select "------------------------------------------";

EXPLAIN QUERY PLAN
SELECT BookID, Title, "Author(s)", "Date(s)", "Genre(s)"
FROM Books
JOIN (SELECT BookID, group_concat(ltrim("First Name"||' '||"Last Name"), ', ') AS 'Author(s)'
      FROM Books JOIN AuthorBooks USING(BookID) JOIN Authors USING(AuthorID)
      GROUP BY BookID) USING(BookID)
JOIN (SELECT BookID, group_concat("Date Read", ', ') AS 'Date(s)'
      FROM Books JOIN DatesRead USING(BookID)
      GROUP BY BookID) USING(BookID)
JOIN (SELECT BookID, group_concat(Genre, ', ') AS 'Genre(s)'
      FROM Books JOIN BookGenres USING(BookID) JOIN Genres USING(GenreID)
      GROUP BY BookID) USING(BookID)
ORDER BY BookID, Title, "Author(s)", "Date(s)";


************
Test run:
************

del test.db
type create.txt | sqlite3 test.db
type queries.txt | sqlite3 test.db


************
Output:
************

--------------------
order by bookid only
--------------------
QUERY PLAN
|--MATERIALIZE 1
|  |--SCAN TABLE Books
|  |--SEARCH TABLE AuthorBooks USING PRIMARY KEY (BookID=?)
|  `--SEARCH TABLE Authors USING INTEGER PRIMARY KEY (rowid=?)
|--MATERIALIZE 2
|  |--SCAN TABLE Books
|  `--SEARCH TABLE DatesRead USING PRIMARY KEY (BookID=?)
|--MATERIALIZE 3
|  |--SCAN TABLE Books
|  |--SEARCH TABLE BookGenres USING PRIMARY KEY (BookID=?)
|  `--SEARCH TABLE Genres USING INTEGER PRIMARY KEY (rowid=?)
|--SCAN SUBQUERY 1
|--SCAN SUBQUERY 2
|--SCAN SUBQUERY 3
|--SEARCH TABLE Books USING INTEGER PRIMARY KEY (rowid=?)
`--USE TEMP B-TREE FOR ORDER BY
-------------------------------------------
order by bookid and some other fields too
------------------------------------------
QUERY PLAN
|--MATERIALIZE 1
|  |--SCAN TABLE Books
|  |--SEARCH TABLE AuthorBooks USING PRIMARY KEY (BookID=?)
|  `--SEARCH TABLE Authors USING INTEGER PRIMARY KEY (rowid=?)
|--MATERIALIZE 2
|  |--SCAN TABLE Books
|  `--SEARCH TABLE DatesRead USING PRIMARY KEY (BookID=?)
|--MATERIALIZE 3
|  |--SCAN TABLE Books
|  |--SEARCH TABLE BookGenres USING PRIMARY KEY (BookID=?)
|  `--SEARCH TABLE Genres USING INTEGER PRIMARY KEY (rowid=?)
|--SCAN SUBQUERY 1
|--SEARCH TABLE Books USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX (BookID=?)
|--SEARCH SUBQUERY 3 USING AUTOMATIC COVERING INDEX (BookID=?)
`--USE TEMP B-TREE FOR ORDER BY


As can be seen, with the ANALYZE information in place the first query
uses three nested subquery scans, while the second query, with more
ORDER BY columns, instead uses automatic indexes to search them.

Now, might this be a weakness in the query planner, or are there
precautions I could/should take to avoid the scanning?






--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Query planner: Scanning subqueries vs using automatic covering index

Simon Slavin-3
On 15 Mar 2019, at 7:35am, niklas <[hidden email]> wrote:

> Dropping analyze (sqlite_stat1) information from the database made
> the problem go away, i.e. made sqlite use index instead of scanning.

Instead of dropping the ANALYZE information, did you try updating it by running ANALYZE again?

Ah ...

> ANALYZE;
> INSERT INTO sqlite_stat1 VALUES('BookGenres','BookGenres','3190 2 1');

How do we know your values for sqlite_stat1 are right?  Why aren't you just letting ANALYZE do its job?

Also, you are missing out on part of what ANALYZE does.  As well as looking at your schema, ANALYZE also considers the 'chunkiness' of the data in each indexed column.  Run it with data in the tables, with the data as convincing as possible in terms of the number of rows and the values in each column.

In your create.txt I would insert data into the tables, then put ANALYZE just before COMMIT.
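Simon's advice (populate first, then ANALYZE) can be sketched with Python's built-in sqlite3 module; the table and index names here are hypothetical stand-ins, not from the original database:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Books("BookID" INTEGER PRIMARY KEY, "Title" TEXT NOT NULL);
CREATE INDEX BookTitles ON Books("Title");
INSERT INTO Books("Title") VALUES ('A'), ('B'), ('C');
ANALYZE;  -- run only after representative data is in place
""")
# ANALYZE has now recorded per-index statistics derived from the real rows
stats = db.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

With the three rows inserted first, the stat entry for the index reflects the actual row count and selectivity instead of an empty table.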

Curious to know if this improves matters for you.  I'm not certain.

Simon.

Re: Query planner: Scanning subqueries vs using automatic covering index

niklas
The data used for sqlite_stat1 in create.txt is taken from the real data;
it's copied from the SQL dump generated just after running ANALYZE.

I only wanted to include the minimum amount of data needed to demonstrate
the issue, so I omitted all other tables, views and data. As I understand
it, SQLite only checks the statN table data when planning queries, not the
actual data in the real tables.

I have dropped the sqlite_stat1 table for now, which disables the ANALYZE
functionality. This way I avoid the slow queries. Running ANALYZE on the
database again immediately brings them back, though.
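The point that the statistics are ordinary, editable rows (which is how the repro script fabricates them) can be sketched like this; the table and index names are hypothetical, and the reload step is hedged in the comment:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Books("BookID" INTEGER PRIMARY KEY, "Title" TEXT NOT NULL);
CREATE INDEX BookTitles ON Books("Title");
ANALYZE;                   -- creates sqlite_stat1
DELETE FROM sqlite_stat1;  -- replace the real stats with fabricated ones
INSERT INTO sqlite_stat1 VALUES('Books','BookTitles','2978 2');
-- In many SQLite versions this causes the edited statistics to be
-- reloaded by the planner; closing and reopening the connection is the
-- portable fallback.
ANALYZE sqlite_master;
""")
row = db.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'BookTitles'").fetchone()
print(row)
```

This is exactly what create.txt does: the planner then believes the table has ~2978 rows regardless of how many rows actually exist.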




Re: Query planner: Scanning subqueries vs using automatic covering index

Keith Medcalf
In reply to this post by niklas

For me, the current tip of trunk produces the same results that Niklas demonstrated in his original post for his original test case.

I would have written the query as a simple query with correlated subqueries to get the concatenated data, as that is a "more natural" declaration of what is wanted (in my opinion anyway), rather than rewriting it into joins.

  select BookID,
         Title,
         (
           SELECT group_concat(ltrim("First Name" || ' ' || "Last Name"),',')
             FROM AuthorBooks JOIN Authors USING (AuthorID)
            WHERE BookID == Books.BookID
         ) as "Author(s)",
         (
           SELECT group_concat("Date read",', ')
             FROM DatesRead
            WHERE BookID == Books.BookID
         ) as "Date(s)",
         (
           SELECT group_concat(Genre,', ')
             FROM BookGenres JOIN Genres USING (GenreID)
            WHERE BookID == Books.BookID
         ) AS "Genre(s)"
    FROM Books
ORDER BY BookID;

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


Re: Query planner: Scanning subqueries vs using automatic covering index

Keith Medcalf

Of course, the correlated version returns all books, not just the ones that have at least one author, were read at least once, and have at least one genre.  


Re: Query planner: Scanning subqueries vs using automatic covering index

Simon Slavin-3
In reply to this post by niklas
On 15 Mar 2019, at 7:02pm, niklas <[hidden email]> wrote:

> The data used for sqlite_stat1 in create.txt is taken from the real data, it's copied from the sql-dump generated just after running ANALYZE.

Okay.  I should have guessed that.  Sorry for doubting you.

You seem to have figured out a work-around for now.  Good luck with it.

Re: Query planner: Scanning subqueries vs using automatic covering index

niklas
In reply to this post by Keith Medcalf
I agree that correlated subqueries in general seem more natural and are
probably also less likely to hit the performance pessimizations noticed
with joins.

But I might also want to use the column, or in the case of a correlated
subquery the column alias, in the WHERE clause, and as I recall that has
not been supported previously. Maybe it is also not allowed by the SQL
specification itself?

I modified the suggested query to use the column alias in the WHERE
clause, though, and it seemed to work!

Is this a recent change in SQLite, or have I misunderstood something? The
SQLite documentation still does not seem to say that column aliases can be
used in the WHERE clause, at least.
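The observation is easy to reproduce with a minimal sketch (hypothetical table) using Python's sqlite3 module — SQLite does accept a result-column alias in the WHERE clause:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(x INTEGER)")
db.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# "tens" is a result-column alias, yet it is usable in WHERE and ORDER BY
rows = db.execute(
    "SELECT x * 10 AS tens FROM t WHERE tens > 15 ORDER BY tens").fetchall()
print(rows)  # -> [(20,), (30,)]
```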




Re: Query planner: Scanning subqueries vs using automatic covering index

Simon Slavin-3
On 17 Mar 2019, at 5:19pm, niklas <[hidden email]> wrote:

> Is this a recent change in Sqlite or have I misunderstood something? The Sqlite documentation still does not seem to say that column aliases can be used in the WHERE clause at least.

You are correct on two counts.  Column aliases cannot be relied on inside the WHERE clause or any other clause; you should imagine that they are applied only when the SELECT statement is ready to return values.

Also, some implementations of SQL allow the 'AS' names to be used for other purposes, but this is not in the SQL specification.

Simon.

Re: Query planner: Scanning subqueries vs using automatic covering index

Keith Medcalf
In reply to this post by niklas

On Sunday, 17 March, 2019 11:19, niklas <[hidden email]> wrote:

>I agree that correlated subqueries in general seem more natural and
>are probably also less likely to have the performance pessimizations
>noticed with joins.

>But I might also want to use the column, or in case of a correlated
>subquery, the column alias, in the WHERE clause and previously that
>has not been supported as I recall. Maybe also not allowed by SQL
>specification itself?

>I modified the suggested query and used the column alias in WHERE now
>though, and it seemed to work!

Yes and no; "seemed to work" is an apt description.  You have to realize that the list of things to return (the SELECT expressions) is evaluated only after the WHERE clause is evaluated.  That is to say, the semantics of the SELECT statement are:

SELECT some stuff
  FROM <a bunch of tables>
 WHERE <a bunch of conditions to find the intersection of the bunch of tables>

so that, except when the alias in the select list is a simple alias for a column name, the computation will be made twice.  For example, if you did something like:

  select BookID,
         Title,
         (
           SELECT group_concat(ltrim("First Name" || ' ' || "Last Name"),',')
             FROM AuthorBooks JOIN Authors USING (AuthorID)
            WHERE BookID == Books.BookID
         ) as "Author(s)",
         (
           SELECT group_concat("Date read",', ')
             FROM DatesRead
            WHERE BookID == Books.BookID
         ) as "Date(s)",
         (
           SELECT group_concat(Genre,', ')
             FROM BookGenres JOIN Genres USING (GenreID)
            WHERE BookID == Books.BookID
         ) AS "Genre(s)"
    FROM Books
   WHERE "Author(s)" IS NOT NULL
     AND "Date(s)" IS NOT NULL
     AND "Genre(s)" IS NOT NULL
ORDER BY BookID;

then you are executing each correlated subquery TWICE, once for the WHERE clause and once for the SELECT clause.  If you want to ensure that those values are not null without executing the correlates twice, you need to do something like the following, which executes the correlates only for the books that would not have null results in those three columns (without doing the duplicate group_concat).

But the inner join of the subqueries will still be more performant IF THE QUERY PLANNER USES AUTOMATIC INDEXES.

  select BookID,
         Title,
         (
           SELECT group_concat(ltrim("First Name" || ' ' || "Last Name"),',')
             FROM AuthorBooks JOIN Authors USING (AuthorID)
            WHERE BookID == Books.BookID
         ) as "Author(s)",
         (
           SELECT group_concat("Date read",', ')
             FROM DatesRead
            WHERE BookID == Books.BookID
         ) as "Date(s)",
         (
           SELECT group_concat(Genre,', ')
             FROM BookGenres JOIN Genres USING (GenreID)
            WHERE BookID == Books.BookID
         ) AS "Genre(s)"
    FROM Books
   WHERE BookID in (
                       SELECT BookID FROM AuthorBooks JOIN Authors USING (AuthorID)
                    INTERSECT
                       SELECT BookID FROM DatesRead
                    INTERSECT
                       SELECT BookID FROM BookGenres JOIN Genres USING (GenreID)
                   )
ORDER BY BookID;

>Is this a recent change in Sqlite or have I misunderstood something?
>The Sqlite documentation still does not seem to say that column aliases
>can be used in the WHERE clause at least.

Well, I think this was added somewhere along the way.  Remember that they are ALIASES: the original expression text is substituted for them.
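The substitution can be made visible with a counting user-defined function (hypothetical names): because the alias is just shorthand for the expression, the function runs once for the WHERE test and once more for the result column of each row.

```python
import sqlite3

calls = {"n": 0}

def traced(x):
    calls["n"] += 1
    return x

db = sqlite3.connect(":memory:")
db.create_function("traced", 1, traced)  # not marked deterministic
db.execute("CREATE TABLE t(x INTEGER)")
db.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# y in the WHERE clause expands back to traced(x), so the function is
# invoked more times than there are result rows.
rows = db.execute("SELECT traced(x) AS y FROM t WHERE y > 0").fetchall()
print(len(rows), calls["n"])
```

Three rows come back, but the call counter exceeds three, which is exactly the double evaluation Keith describes for the correlated subqueries.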

Of course, you could always just retrieve all the data and ignore the rows you do not want at the application level ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.





Re: Query planner: Scanning subqueries vs using automatic covering index

niklas
In reply to this post by Simon Slavin-3
Thanks to all who have replied, very informative! :)

This is just a database for my own personal use, so it's not a big deal in
any way; mainly I am trying to get a better understanding of how SQLite
works.

I'll note that the SQL queries are not static inside my application; they
are generated dynamically from command-line arguments. Basically any
column can be added to the SELECT, WHERE and ORDER BY clauses at runtime,
and some columns are generated from subqueries (via JOINs) as shown in the
first post.

As the correlated subqueries would be executed twice if used in the WHERE
clause, it seems that using JOINs is preferable for my use cases.

Some further experimentation shows that using LEFT JOIN for the subqueries
instead of INNER JOIN always makes the query planner do the "right" thing,
i.e. use automatic indexes instead of table scans, regardless of whether
ANALYZE information is present.

So that is maybe a better work-around than removing the ANALYZE tables.
LEFT JOIN and INNER JOIN will always return the same results in this case,
as all books will (or should) have dates, genres and authors; and if they
do not, then I probably want LEFT JOIN semantics anyway, to better notice
it. I currently use INNER JOIN to give the query planner more freedom in
selecting query plans.
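A minimal sketch of the LEFT JOIN form, trimmed to one subquery; the exact plan text varies between SQLite versions, so this just prints whatever plan the locally linked library chooses:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Books("BookID" INTEGER PRIMARY KEY, "Title" TEXT NOT NULL);
CREATE TABLE DatesRead("BookID" INTEGER, "Date Read" TEXT,
    PRIMARY KEY("BookID", "Date Read")) WITHOUT ROWID;
INSERT INTO Books VALUES (1, 'A'), (2, 'B');
INSERT INTO DatesRead VALUES (1, '2019-01-01'), (2, '2019-02-02');
""")
# LEFT JOIN variant of the original query, reduced to the DatesRead subquery
plan = db.execute("""
EXPLAIN QUERY PLAN
SELECT BookID, Title, "Date(s)"
FROM Books
LEFT JOIN (SELECT BookID, group_concat("Date Read", ', ') AS "Date(s)"
           FROM DatesRead GROUP BY BookID) USING (BookID)
ORDER BY BookID
""").fetchall()
for step in plan:
    print(step)
```

On the affected versions the interesting check is whether the subquery step reads SEARCH ... USING AUTOMATIC COVERING INDEX rather than SCAN SUBQUERY.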

It would still be interesting to know why SQLite went for plain table
scans in the initial case. It seems that using automatic indexes will
always be faster (N*logN vs N*N), so why not always use them when
possible? According to the documentation, SQLite assumes N is a million
without ANALYZE information, and in that case it opted to generate
automatic indexes. In my case, with ANALYZE information present, N is
around 3000, and then it opted for table scans. The final query took over
24 minutes with all three subquery columns present when run to completion,
so that was obviously the wrong choice, since the loops ended up being
nested three or more times.
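The back-of-envelope arithmetic behind the N*logN vs N*N claim, for N around 3000:

```python
import math

n = 3000                        # roughly the size of the book database
auto_index = n * math.log2(n)   # build/probe an automatic index: O(N log N)
nested_scan = n * n             # nested full scans of two loops: O(N * N)
print(round(auto_index), nested_scan, round(nested_scan / auto_index))
```

Even for a modest 3000-row database the nested scan does a few hundred times more work per extra subquery, and each additional subquery multiplies by another factor of N, which matches the minutes-vs-milliseconds behaviour observed.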

(I understand that the query planner must take many different scenarios into
account and generate good plans for all of them, so this is most likely an
unfortunate edge case.)

I also noticed another case where SQLite uses nested table scanning for
JOINs, and this time it was not instead of automatic indexes; apparently
it opted for nested scans to avoid using a temp b-tree for the ORDER BY.
(This is part of a co-routine for a window function using AuthorID for
partitions.)

Slow nested table scan (execution time measured in seconds):

|  |  |--SCAN TABLE Authors
|  |  |--SCAN TABLE DatesRead
|  |  |--SEARCH TABLE AuthorBooks USING PRIMARY KEY (BookID=? AND AuthorID=?)
|  |  |--SEARCH SUBQUERY 1 ...
|  |  `--SEARCH SUBQUERY 2 ...

vs temp b-tree (execution time measured in milliseconds):

|  |  |--SCAN TABLE AuthorBooks
|  |  |--SEARCH TABLE Authors USING INTEGER PRIMARY KEY (rowid=?)
|  |  |--SEARCH TABLE DatesRead USING PRIMARY KEY (BookID=?)
|  |  |--SEARCH SUBQUERY 1 ...
|  |  |--SEARCH SUBQUERY 2 ...
|  |  `--USE TEMP B-TREE FOR ORDER BY

This is part of a larger query, but I have not included all the tables
used in it, so I just show the parts that differ here; everything else in
the two queries is identical.

Dropping the ANALYZE information makes the query planner select the faster
alternative here as well.

I can provide more information about the query in case anyone is
interested; I just included these parts to illustrate the "problematic"
nested scans.

(Not really that problematic; this just came up in a test that iterated
over all supported columns for all main queries. In actual use of the
application I would hardly ever run it, but it is still an interesting
case, I think.)



