Optimization - don't understand.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimization - don't understand.

John Found
The following query:

    explain query plan
    select
      U.nick,
      U.id,
      U.av_time,
      T.Caption,
      P.id,
    --  P.ReadCount,
    --  P.Content,
    --  P.postTime,l
      T.Caption
    from Posts P
    left join Threads T on P.threadID = T.id
    left join ThreadTags TT on TT.threadID = T.id
    left join Users U on P.userID = U.id
    where TT.Tag = ?1;

...returns:

    0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
    0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
    0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX idxThreadTagsUnique (ThreadID=? AND Tag=?)
    0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

But uncommenting any of the commented fields, turns the result into:

    0 0 0 SCAN TABLE Posts AS P
    0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
    0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX idxThreadTagsUnique (ThreadID=? AND Tag=?)
    0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

... and significantly degrades the performance.

The index idxPostsThreadUser is defined following way:

    create index idxPostsThreadUser on Posts(threadid, userid);

IMHO, the change of the selected columns should not affect the query plan, but maybe I am wrong somehow.

What I am missing?

--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimization - don't understand.

Richard Hipp-3
On 2/5/18, John Found <[hidden email]> wrote:

> The following query:
>
>     explain query plan
>     select
>       U.nick,
>       U.id,
>       U.av_time,
>       T.Caption,
>       P.id,
>     --  P.ReadCount,
>     --  P.Content,
>     --  P.postTime,l
>       T.Caption
>     from Posts P
>     left join Threads T on P.threadID = T.id
>     left join ThreadTags TT on TT.threadID = T.id
>     left join Users U on P.userID = U.id
>     where TT.Tag = ?1;
>
> ...returns:
>
>     0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
>     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
>     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> idxThreadTagsUnique (ThreadID=? AND Tag=?)
>     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)
>
> But uncommenting any of the commented fields, turns the result into:
>
>     0 0 0 SCAN TABLE Posts AS P
>     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
>     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> idxThreadTagsUnique (ThreadID=? AND Tag=?)
>     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)
>
> ... and significantly degrades the performance.
>
> The index idxPostsThreadUser is defined following way:
>
>     create index idxPostsThreadUser on Posts(threadid, userid);
>
> IMHO, the change of the selected columns should not affect the query plan,
> but maybe I am wrong somehow.
>
> What I am missing?
>

SQLite prefers to scan the index rather than the original table,
because the index is usually smaller (since it contains less data) and
hence there is less I/O required to scan the whole thing.

But the index only provides access to the id, threadid, and userid
columns.  If content of other columns is needed, then the whole table
must be scanned instead.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Optimization - don't understand.

Hick Gunter
In reply to this post by John Found
The additional fields are not contained in the index idxPostsThreadUser, so SQLite is forced to read the original row instead of just the index.

Your query is searching the complete posts table, joining all the threads, tags and users together, and then discarding those without a matching tag.

Maybe you would be better off scanning the ThreadTags from the target tag, and then reconstructing the post from there.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von John Found
Gesendet: Montag, 05. Februar 2018 15:14
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] [sqlite] Optimization - don't understand.

The following query:

    explain query plan
    select
      U.nick,
      U.id,
      U.av_time,
      T.Caption,
      P.id,
    --  P.ReadCount,
    --  P.Content,
    --  P.postTime,l
      T.Caption
    from Posts P
    left join Threads T on P.threadID = T.id
    left join ThreadTags TT on TT.threadID = T.id
    left join Users U on P.userID = U.id
    where TT.Tag = ?1;

...returns:

    0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
    0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
    0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX idxThreadTagsUnique (ThreadID=? AND Tag=?)
    0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

But uncommenting any of the commented fields, turns the result into:

    0 0 0 SCAN TABLE Posts AS P
    0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
    0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX idxThreadTagsUnique (ThreadID=? AND Tag=?)
    0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

... and significantly degrades the performance.

The index idxPostsThreadUser is defined following way:

    create index idxPostsThreadUser on Posts(threadid, userid);

IMHO, the change of the selected columns should not affect the query plan, but maybe I am wrong somehow.

What I am missing?

--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimization - don't understand.

David Raymond
In reply to this post by John Found
Is P.id an integer primary key? If so then it can get it from any index. Since the only field it needs from P is id, it can use an index which isn't defined on id. This can make it quicker, especially if P has a lot of fields bloating its size. Once you include the other fields of P in the query then it has to go to the main table anyway to get those other fields and decides to do a full scan.


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of John Found
Sent: Monday, February 05, 2018 9:14 AM
To: SQLite mailing list
Subject: [sqlite] Optimization - don't understand.

The following query:

    explain query plan
    select
      U.nick,
      U.id,
      U.av_time,
      T.Caption,
      P.id,
    --  P.ReadCount,
    --  P.Content,
    --  P.postTime,l
      T.Caption
    from Posts P
    left join Threads T on P.threadID = T.id
    left join ThreadTags TT on TT.threadID = T.id
    left join Users U on P.userID = U.id
    where TT.Tag = ?1;

...returns:

    0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
    0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
    0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX idxThreadTagsUnique (ThreadID=? AND Tag=?)
    0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

But uncommenting any of the commented fields, turns the result into:

    0 0 0 SCAN TABLE Posts AS P
    0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
    0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX idxThreadTagsUnique (ThreadID=? AND Tag=?)
    0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

... and significantly degrades the performance.

The index idxPostsThreadUser is defined following way:

    create index idxPostsThreadUser on Posts(threadid, userid);

IMHO, the change of the selected columns should not affect the query plan, but maybe I am wrong somehow.

What I am missing?

--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimization - don't understand.

John Found
In reply to this post by Richard Hipp-3
It is clear now. But should I define an index that contains all fields used in the query?

Something like:

    create index idxPostsComplex on posts(threadid, userid, Content, postTime, ReadCount);

Actually I tried and the query uses this index without problems (and the performance seems to be good).

But what are the disadvantages of such approach? (except the bigger database size, of course)

On Mon, 5 Feb 2018 09:24:51 -0500
Richard Hipp <[hidden email]> wrote:

> On 2/5/18, John Found <[hidden email]> wrote:
> > The following query:
> >
> >     explain query plan
> >     select
> >       U.nick,
> >       U.id,
> >       U.av_time,
> >       T.Caption,
> >       P.id,
> >     --  P.ReadCount,
> >     --  P.Content,
> >     --  P.postTime,l
> >       T.Caption
> >     from Posts P
> >     left join Threads T on P.threadID = T.id
> >     left join ThreadTags TT on TT.threadID = T.id
> >     left join Users U on P.userID = U.id
> >     where TT.Tag = ?1;
> >
> > ...returns:
> >
> >     0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
> >     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
> >     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> > idxThreadTagsUnique (ThreadID=? AND Tag=?)
> >     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)
> >
> > But uncommenting any of the commented fields, turns the result into:
> >
> >     0 0 0 SCAN TABLE Posts AS P
> >     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
> >     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> > idxThreadTagsUnique (ThreadID=? AND Tag=?)
> >     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)
> >
> > ... and significantly degrades the performance.
> >
> > The index idxPostsThreadUser is defined following way:
> >
> >     create index idxPostsThreadUser on Posts(threadid, userid);
> >
> > IMHO, the change of the selected columns should not affect the query plan,
> > but maybe I am wrong somehow.
> >
> > What I am missing?
> >
>
> SQLite prefers to scan the index rather than the original table,
> because the index is usually smaller (since it contains less data) and
> hence there is less I/O required to scan the whole thing.
>
> But the index only provides access to the id, threadid, and userid
> columns.  If content of other columns is needed, then the whole table
> must be scanned instead.
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimization - don't understand.

Richard Hipp-3
On 2/5/18, John Found <[hidden email]> wrote:
>
> Actually I tried [adding a new index] and the query uses this index
> without problems (and the
> performance seems to be good).
>
> But what are the disadvantages of such approach? (except the bigger database
> size, of course)

(1) the database file is larger.

(2) When doing UPDATE on one of the columns added to the index, then
the index must be updated too, in addition to the table.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Optimization - don't understand.

Hick Gunter
In reply to this post by John Found
I think you are optimizing the performance of a conceptually inefficient query.

If you are looking for a recipe that contains apples, do you read the entire cook book, checking each recipe for apples? Maybe it is much more efficient to look up apples in the index of ingredients and retrieve only the recipes that actually contain them.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von John Found
Gesendet: Montag, 05. Februar 2018 15:55
An: [hidden email]
Betreff: [EXTERNAL] Re: [sqlite] Optimization - don't understand.

It is clear now. But should I define an index that contains all fields used in the query?

Something like:

    create index idxPostsComplex on posts(threadid, userid, Content, postTime, ReadCount);

Actually I tried and the query uses this index without problems (and the performance seems to be good).

But what are the disadvantages of such approach? (except the bigger database size, of course)

On Mon, 5 Feb 2018 09:24:51 -0500
Richard Hipp <[hidden email]> wrote:

> On 2/5/18, John Found <[hidden email]> wrote:
> > The following query:
> >
> >     explain query plan
> >     select
> >       U.nick,
> >       U.id,
> >       U.av_time,
> >       T.Caption,
> >       P.id,
> >     --  P.ReadCount,
> >     --  P.Content,
> >     --  P.postTime,l
> >       T.Caption
> >     from Posts P
> >     left join Threads T on P.threadID = T.id
> >     left join ThreadTags TT on TT.threadID = T.id
> >     left join Users U on P.userID = U.id
> >     where TT.Tag = ?1;
> >
> > ...returns:
> >
> >     0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
> >     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
> >     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> > idxThreadTagsUnique (ThreadID=? AND Tag=?)
> >     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY
> > (rowid=?)
> >
> > But uncommenting any of the commented fields, turns the result into:
> >
> >     0 0 0 SCAN TABLE Posts AS P
> >     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
> >     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> > idxThreadTagsUnique (ThreadID=? AND Tag=?)
> >     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY
> > (rowid=?)
> >
> > ... and significantly degrades the performance.
> >
> > The index idxPostsThreadUser is defined following way:
> >
> >     create index idxPostsThreadUser on Posts(threadid, userid);
> >
> > IMHO, the change of the selected columns should not affect the query
> > plan, but maybe I am wrong somehow.
> >
> > What I am missing?
> >
>
> SQLite prefers to scan the index rather than the original table,
> because the index is usually smaller (since it contains less data) and
> hence there is less I/O required to scan the whole thing.
>
> But the index only provides access to the id, threadid, and userid
> columns.  If content of other columns is needed, then the whole table
> must be scanned instead.
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimization - don't understand.

Simon Slavin-3
In reply to this post by John Found
On 5 Feb 2018, at 2:54pm, John Found <[hidden email]> wrote:

> It is clear now. But should I define an index that contains all fields used in the query?
>
> Something like:
>
>    create index idxPostsComplex on posts(threadid, userid, Content, postTime, ReadCount);
>
> Actually I tried and the query uses this index without problems (and the performance seems to be good).

Since you have the ability to do timings, why not try it ?  Do things get much faster after creating that extra index ?  Is the SELECT one which is used a lot or is it used just once in your program, at a time where execution time is not a problem ?

> But what are the disadvantages of such approach? (except the bigger database size, of course)

* Increase in database file size
* More time taken when inserting rows into that table (one extra index to update)
* More time taken when changing data in any of those columns (one extra index to update)

This is the usual payoff in computers: if it takes less time to find the data you want it probably takes longer to assemble the data to start with.  You have to figure out which of the two operations is more time-critical.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Optimization - don't understand.

John Found
In reply to this post by Hick Gunter
On Mon, 5 Feb 2018 15:08:33 +0000
Hick Gunter <[hidden email]> wrote:

> I think you are optimizing the performance of a conceptually inefficient query.
>
> If you are looking for a recipe that contains apples, do you read the entire cook book, checking each recipe for apples? Maybe it is much more efficient to look up apples in the index of ingredients and retrieve only the recipes that actually contain them.

You are definitely right, but the things are a little bit more complex.

The query I asked for is simplified in order to make the question more clear. It is part of a complex search, looking simultaneously in several fields: fts5 search (removed for simplicity), T.Caption, TT.Tag and U.nick fields.

I am trying to estimate how exactly to handle all these possible combinations and whether it is possible to be done with one fixed query or need specially synthesized query for every particular case.


>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users [mailto:[hidden email]] Im Auftrag von John Found
> Gesendet: Montag, 05. Februar 2018 15:55
> An: [hidden email]
> Betreff: [EXTERNAL] Re: [sqlite] Optimization - don't understand.
>
> It is clear now. But should I define an index that contains all fields used in the query?
>
> Something like:
>
>     create index idxPostsComplex on posts(threadid, userid, Content, postTime, ReadCount);
>
> Actually I tried and the query uses this index without problems (and the performance seems to be good).
>
> But what are the disadvantages of such approach? (except the bigger database size, of course)
>
> On Mon, 5 Feb 2018 09:24:51 -0500
> Richard Hipp <[hidden email]> wrote:
>
> > On 2/5/18, John Found <[hidden email]> wrote:
> > > The following query:
> > >
> > >     explain query plan
> > >     select
> > >       U.nick,
> > >       U.id,
> > >       U.av_time,
> > >       T.Caption,
> > >       P.id,
> > >     --  P.ReadCount,
> > >     --  P.Content,
> > >     --  P.postTime,l
> > >       T.Caption
> > >     from Posts P
> > >     left join Threads T on P.threadID = T.id
> > >     left join ThreadTags TT on TT.threadID = T.id
> > >     left join Users U on P.userID = U.id
> > >     where TT.Tag = ?1;
> > >
> > > ...returns:
> > >
> > >     0 0 0 SCAN TABLE Posts AS P USING COVERING INDEX idxPostsThreadUser
> > >     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
> > >     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> > > idxThreadTagsUnique (ThreadID=? AND Tag=?)
> > >     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY
> > > (rowid=?)
> > >
> > > But uncommenting any of the commented fields, turns the result into:
> > >
> > >     0 0 0 SCAN TABLE Posts AS P
> > >     0 1 1 SEARCH TABLE Threads AS T USING INTEGER PRIMARY KEY (rowid=?)
> > >     0 2 2 SEARCH TABLE ThreadTags AS TT USING COVERING INDEX
> > > idxThreadTagsUnique (ThreadID=? AND Tag=?)
> > >     0 3 3 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY
> > > (rowid=?)
> > >
> > > ... and significantly degrades the performance.
> > >
> > > The index idxPostsThreadUser is defined following way:
> > >
> > >     create index idxPostsThreadUser on Posts(threadid, userid);
> > >
> > > IMHO, the change of the selected columns should not affect the query
> > > plan, but maybe I am wrong somehow.
> > >
> > > What I am missing?
> > >
> >
> > SQLite prefers to scan the index rather than the original table,
> > because the index is usually smaller (since it contains less data) and
> > hence there is less I/O required to scan the whole thing.
> >
> > But the index only provides access to the id, threadid, and userid
> > columns.  If content of other columns is needed, then the whole table
> > must be scanned instead.
> > --
> > D. Richard Hipp
> > [hidden email]
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> --
> http://fresh.flatassembler.net
> http://asm32.info
> John Found <[hidden email]>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___________________________________________
>  Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users