SELECT INTO TEMP TABLE takes long

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

SELECT INTO TEMP TABLE takes long

tf@qvgps.com
Hi,

maybe, hopefully, I missed something, its still about this database:

http://files.qvgps.com/0-tom-public/Geonames_World_2017.zip

Copying just the ids from 12mio records ordered in a temp-table takes 60
seconds. There is a COLLATE NOCASE index on label.

Is this normal or can this also be done faster?

DROP TABLE IF EXISTS RowCursor;
CREATE TEMP TABLE RowCursor (Id int);
INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
CREATE INDEX RowCursor_Id on RowCursor(Id);

Tom



--
/****************************************
**   Flemming Software Development CC
**   Thomas Flemming
**   PO Box 81244
**   Windhoek, Namibia
**   http://www.quovadis-gps.com
**   mail  [hidden email]
**   +264 (0)81 3329923
**   +49  (0)6182 8492599
***************************************/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT INTO TEMP TABLE takes long

Simon Slavin-3

On 31 May 2017, at 8:31pm, Thomas Flemming <[hidden email]> wrote:

> Copying just the ids from 12mio records ordered in a temp-table takes 60 seconds. There is a COLLATE NOCASE index on label.
>
> Is this normal or can this also be done faster?
>
> DROP TABLE IF EXISTS RowCursor;
> CREATE TEMP TABLE RowCursor (Id int);
> INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
> CREATE INDEX RowCursor_Id on RowCursor(Id);

Please confirm that you are using the SQLite command-line shell to perform these timings.

It is faster and more space-efficient to combine two of the above commands into

        CREATE TEMP TABLE RowCursor (Id INTEGER PRIMARY KEY);

Can you tell me how long it takes to execute just

        SELECT Id from Pois ORDER BY Label COLLATE NOCASE;

?  Also, can you post the CREATE TABLE command for the table Pois ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT INTO TEMP TABLE takes long

R Smith
In reply to this post by tf@qvgps.com

On 2017/05/31 9:31 PM, Thomas Flemming wrote:

> Hi,
>
> maybe, hopefully, I missed something, its still about this database:
>
> http://files.qvgps.com/0-tom-public/Geonames_World_2017.zip
>
> Copying just the ids from 12mio records ordered in a temp-table takes
> 60 seconds. There is a COLLATE NOCASE index on label.
>
> Is this normal or can this also be done faster?
>
> DROP TABLE IF EXISTS RowCursor;
> CREATE TEMP TABLE RowCursor (Id int);
> INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
> CREATE INDEX RowCursor_Id on RowCursor(Id);

At the end of the day it IS 12m records (or 11.42m to be more
precise)... so it should take a bit of time. That said, it could
probably be faster - My first attempt using your database included
changing the page-size from 1024 to 4096 using:
PRAGMA page_size = 4096; VACUUM;   --  (Bytes)  4KB
(Ensure the drive with your TEMP folder has more than 5GB free else the
above may fail).
Then setting Synchronous mode from FULL to Normal using:
PRAGMA synchronous = 1;   --  Normal
Then  jacked up the cache size from 2000 bytes to 8000 pages using:
PRAGMA cache_size = 8000;   --  (Pages)
Made sure Journal mode is DELETE,
Made sure Threads is set to 8.

Next I dropped the "CREATE INDEX..." bit at the end - it only consumed
circa 3 seconds, but it is not needed, your row-id is already indexed
and you only use the Id field for reference look-ups in the other table,
no need to index it here. The result is what seems to be about half your
time. I use a good processor but the DB itself sat on a platter drive,
so I doubt the gains are due to system differences, though some of it
might be. Herewith the result:

   -- SQLite version 3.17.0  [ Release: 2017-02-13 ]  on SQLitespeed
version 2.0.2.4.
   --
================================================================================================

DROP TABLE IF EXISTS RowCursor;

CREATE TEMP TABLE RowCursor (Id int);

INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;

   --    Item Stats:  Item No:           3             Query Size
(Chars):  74
   --                 VM Work Steps:     102790606      Rows
Modified:       11421177
   --                 Full Query Time:   0d 00h 00m and 28.471s
   --                 Query Result:      Success.
   --
------------------------------------------------------------------------------------------------

   --   Script Stats: Total Script Execution Time:     0d 00h 00m and
29.389s
   --                 Total Script Query Time:         0d 00h 00m and
28.522s
   --                 Total Database Rows Changed: 11421177
   --                 Total Virtual-Machine Steps: 205581259
   --                 Last executed Item Index:        4
   --                 Last Script Error:

   --
================================================================================================

On another tangent: No person can ever look at 12 million records in one
sitting. You should limit the span of results to something plausible,
like 100K rows. It's still a lot, but it is conceivable a person can
spend a few hours paging from one page to the next traversing an actual
100K rows... Improbable, but possible. Scanning 12 mil or even 1 mil
records is just implausible.

If you are going to view the entire table in Alphabetical order, an even
faster tactic would be to just permanently keep that Table with all the
rows in alphabetical order as a kind-of index table, filling it with a
trigger from the main table when changes happen. That way you can at any
time search for any Label, and when found, just look up the Pois ID in
the RowCursor table, and start paging from that row_id.  How your UI
will work and how you want it to work will of course dictate what is the
best solution.

Good luck!
Ryan


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT INTO TEMP TABLE takes long

tf@qvgps.com
Thanks guys for all the information.
Now I know, how to proceed.
Tom
:)


Am 31.05.2017 um 22:02 schrieb R Smith:

>
> On 2017/05/31 9:31 PM, Thomas Flemming wrote:
>> Hi,
>>
>> maybe, hopefully, I missed something, its still about this database:
>>
>> http://files.qvgps.com/0-tom-public/Geonames_World_2017.zip
>>
>> Copying just the ids from 12mio records ordered in a temp-table takes 60
>> seconds. There is a COLLATE NOCASE index on label.
>>
>> Is this normal or can this also be done faster?
>>
>> DROP TABLE IF EXISTS RowCursor;
>> CREATE TEMP TABLE RowCursor (Id int);
>> INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
>> CREATE INDEX RowCursor_Id on RowCursor(Id);
>
> At the end of the day it IS 12m records (or 11.42m to be more precise)... so
> it should take a bit of time. That said, it could probably be faster - My
> first attempt using your database included changing the page-size from 1024 to
> 4096 using:
> PRAGMA page_size = 4096; VACUUM;   --  (Bytes)  4KB
> (Ensure the drive with your TEMP folder has more than 5GB free else the above
> may fail).
> Then setting Synchronous mode from FULL to Normal using:
> PRAGMA synchronous = 1;   --  Normal
> Then  jacked up the cache size from 2000 bytes to 8000 pages using:
> PRAGMA cache_size = 8000;   --  (Pages)
> Made sure Journal mode is DELETE,
> Made sure Threads is set to 8.
>
> Next I dropped the "CREATE INDEX..." bit at the end - it only consumed circa 3
> seconds, but it is not needed, your row-id is already indexed and you only use
> the Id field for reference look-ups in the other table, no need to index it
> here. The result is what seems to be about half your time. I use a good
> processor but the DB itself sat on a platter drive, so I doubt the gains are
> due to system differences, though some of it might be. Herewith the result:
>
>    -- SQLite version 3.17.0  [ Release: 2017-02-13 ]  on SQLitespeed version
> 2.0.2.4.
>    --
> ================================================================================================
>
>
> DROP TABLE IF EXISTS RowCursor;
>
> CREATE TEMP TABLE RowCursor (Id int);
>
> INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
>
>    --    Item Stats:  Item No:           3             Query Size (Chars):  74
>    --                 VM Work Steps:     102790606      Rows Modified:      
> 11421177
>    --                 Full Query Time:   0d 00h 00m and 28.471s
>    --                 Query Result:      Success.
>    --
> ------------------------------------------------------------------------------------------------
>
>
>    --   Script Stats: Total Script Execution Time:     0d 00h 00m and 29.389s
>    --                 Total Script Query Time:         0d 00h 00m and 28.522s
>    --                 Total Database Rows Changed: 11421177
>    --                 Total Virtual-Machine Steps: 205581259
>    --                 Last executed Item Index:        4
>    --                 Last Script Error:
>
>    --
> ================================================================================================
>
>
> On another tangent: No person can ever look at 12 million records in one
> sitting. You should limit the span of results to something plausible, like
> 100K rows. It's still a lot, but it is conceivable a person can spend a few
> hours paging from one page to the next traversing an actual 100K rows...
> Improbable, but possible. Scanning 12 mil or even 1 mil records is just
> implausible.
>
> If you are going to view the entire table in Alphabetical order, an even
> faster tactic would be to just permanently keep that Table with all the rows
> in alphabetical order as a kind-of index table, filling it with a trigger from
> the main table when changes happen. That way you can at any time search for
> any Label, and when found, just look up the Pois ID in the RowCursor table,
> and start paging from that row_id.  How your UI will work and how you want it
> to work will of course dictate what is the best solution.
>
> Good luck!
> Ryan
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

--
/****************************************
**   Flemming Software Development CC
**   Thomas Flemming
**   PO Box 81244
**   Windhoek, Namibia
**   http://www.quovadis-gps.com
**   mail  [hidden email]
**   +264 (0)81 3329923
**   +49  (0)6182 8492599
***************************************/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT INTO TEMP TABLE takes long

David Raymond
Since the Id field in Pois is a "long" and not an "integer primary key" then it can get the order from the Pois_Label index, but then has to go into the Pois table itself to get the Id. If in Pois, Id was an "integer primary key" then the Pois_Label index would be a covering index and it wouldn't have that second step to get the Id from the main table.

CREATE TABLE Pois(
  Id LONG PRIMARY KEY,
  Label VARCHAR(50),
  Info TEXT,
  Lat FLOAT,
  Lon FLOAT,
  Z FLOAT,
  Flags INT,
  StyleId INT
);
CREATE INDEX Pois_Label ON Pois(Label COLLATE NOCASE);
CREATE INDEX Pois_StyleId ON Pois(StyleId);
CREATE INDEX Pois_Info ON Pois(Info COLLATE NOCASE);

sqlite> explain query plan select Id from Pois order by Label collate nocase;
selectid|order|from|detail
0|0|0|SCAN TABLE Pois USING INDEX Pois_Label

sqlite> explain select Id from Pois order by Label collate nocase;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     10    0                    00  Start at 10
1     Noop           1     3     0                    00
2     OpenRead       0     7     0     2              00  root=7 iDb=0; Pois
3     OpenRead       2     23    0     k(2,NOCASE,)   00  root=23 iDb=0; Pois_Label
4     Rewind         2     9     1     0              00
5       Seek           2     0     0                    00  Move 0 to 2.rowid
6       Column         0     0     1                    00  r[1]=Pois.Id
7       ResultRow      1     1     0                    00  output=r[1]
8     Next           2     5     0                    01
9     Halt           0     0     0                    00
10    Transaction    0     0     28    0              01  usesStmtJournal=0
11    Goto           0     1     0                    00


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Thomas Flemming
Sent: Wednesday, May 31, 2017 5:59 PM
To: [hidden email]
Subject: Re: [sqlite] SELECT INTO TEMP TABLE takes long

Thanks guys for all the information.
Now I know, how to proceed.
Tom
:)


Am 31.05.2017 um 22:02 schrieb R Smith:

>
> On 2017/05/31 9:31 PM, Thomas Flemming wrote:
>> Hi,
>>
>> maybe, hopefully, I missed something, its still about this database:
>>
>> http://files.qvgps.com/0-tom-public/Geonames_World_2017.zip
>>
>> Copying just the ids from 12mio records ordered in a temp-table takes 60
>> seconds. There is a COLLATE NOCASE index on label.
>>
>> Is this normal or can this also be done faster?
>>
>> DROP TABLE IF EXISTS RowCursor;
>> CREATE TEMP TABLE RowCursor (Id int);
>> INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
>> CREATE INDEX RowCursor_Id on RowCursor(Id);
>
> At the end of the day it IS 12m records (or 11.42m to be more precise)... so
> it should take a bit of time. That said, it could probably be faster - My
> first attempt using your database included changing the page-size from 1024 to
> 4096 using:
> PRAGMA page_size = 4096; VACUUM;   --  (Bytes)  4KB
> (Ensure the drive with your TEMP folder has more than 5GB free else the above
> may fail).
> Then setting Synchronous mode from FULL to Normal using:
> PRAGMA synchronous = 1;   --  Normal
> Then  jacked up the cache size from 2000 bytes to 8000 pages using:
> PRAGMA cache_size = 8000;   --  (Pages)
> Made sure Journal mode is DELETE,
> Made sure Threads is set to 8.
>
> Next I dropped the "CREATE INDEX..." bit at the end - it only consumed circa 3
> seconds, but it is not needed, your row-id is already indexed and you only use
> the Id field for reference look-ups in the other table, no need to index it
> here. The result is what seems to be about half your time. I use a good
> processor but the DB itself sat on a platter drive, so I doubt the gains are
> due to system differences, though some of it might be. Herewith the result:
>
>    -- SQLite version 3.17.0  [ Release: 2017-02-13 ]  on SQLitespeed version
> 2.0.2.4.
>    --
> ================================================================================================
>
>
> DROP TABLE IF EXISTS RowCursor;
>
> CREATE TEMP TABLE RowCursor (Id int);
>
> INSERT INTO RowCursor SELECT Id from Pois ORDER BY Label COLLATE NOCASE;
>
>    --    Item Stats:  Item No:           3             Query Size (Chars):  74
>    --                 VM Work Steps:     102790606      Rows Modified:      
> 11421177
>    --                 Full Query Time:   0d 00h 00m and 28.471s
>    --                 Query Result:      Success.
>    --
> ------------------------------------------------------------------------------------------------
>
>
>    --   Script Stats: Total Script Execution Time:     0d 00h 00m and 29.389s
>    --                 Total Script Query Time:         0d 00h 00m and 28.522s
>    --                 Total Database Rows Changed: 11421177
>    --                 Total Virtual-Machine Steps: 205581259
>    --                 Last executed Item Index:        4
>    --                 Last Script Error:
>
>    --
> ================================================================================================
>
>
> On another tangent: No person can ever look at 12 million records in one
> sitting. You should limit the span of results to something plausible, like
> 100K rows. It's still a lot, but it is conceivable a person can spend a few
> hours paging from one page to the next traversing an actual 100K rows...
> Improbable, but possible. Scanning 12 mil or even 1 mil records is just
> implausible.
>
> If you are going to view the entire table in Alphabetical order, an even
> faster tactic would be to just permanently keep that Table with all the rows
> in alphabetical order as a kind-of index table, filling it with a trigger from
> the main table when changes happen. That way you can at any time search for
> any Label, and when found, just look up the Pois ID in the RowCursor table,
> and start paging from that row_id.  How your UI will work and how you want it
> to work will of course dictate what is the best solution.
>
> Good luck!
> Ryan
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

--
/****************************************
**   Flemming Software Development CC
**   Thomas Flemming
**   PO Box 81244
**   Windhoek, Namibia
**   http://www.quovadis-gps.com
**   mail  [hidden email]
**   +264 (0)81 3329923
**   +49  (0)6182 8492599
***************************************/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users