Thank you for explaining this.
I guess you're right about the query planner deciding to avoid index usage based on the stats.

22 July 2011, 18:30, from Richard Hipp <[hidden email]>:

> On Fri, Jul 22, 2011 at 9:48 AM, Григорий Григоренко <[hidden email]> wrote:
>
> > > Please post the results of:
> > >
> > > SELECT * FROM sqlite_stat1;
> >
> > tbl = log
> > idx = idxlog_kind_computer_process_who_id_msg_created_at
> > stat = 2815667 563134 563134 469278 74097 1 1 1
>
> The first number in "stat" is the number of rows in the table. The 2nd
> number is the average number of rows that have the same value for the first
> column of the index. The 3rd number is the average number of rows that have
> the same value for the first 2 columns of the index. And so forth.
>
> There are a huge number of rows that have the same value for the first 4
> terms of this index, which shows us that this is a really lousy index.
> Ideally, you want the second number in the "stat" column to be something
> small, like 10.
>
> If you do not run ANALYZE, SQLite has no way of knowing that the index is
> mostly useless. SQLite assumes that the index is a good one, and that the
> 2nd integer in "stat" is 10. And it therefore tries to use the index. But
> since the index is so bad, the resulting performance is slow.
>
> After running ANALYZE, SQLite realizes that the index is lousy and avoids
> using it. Hence, performance is much better.
>
> > tbl = log
> > idx = idxlog_kind_computer_process_id_who_msg_created_at
> > stat = 2815667 563134 563134 469278 1 1 1 1
> >
> > tbl = log
> > idx = idxlog_kind_computer_id_process_who_msg_created_at
> > stat = 2815667 563134 563134 1 1 1 1 1
> >
> > tbl = log
> > idx = idxlog_kind_id_computer_process_who_msg_created_at
> > stat = 2815667 563134 1 1 1 1 1 1
> >
> > tbl = log
> > idx = idxlog_kind_computer_process_who_msg_created_at
> > stat = 2815667 563134 563134 469278 74097 2 2
> >
> > tbl = log
> > idx = idxlog_created_at
> > stat = 2815667 106
>
> --
> D. Richard Hipp
> [hidden email]

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
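The meaning of the "stat" numbers is easy to reproduce on a toy database (a minimal sketch using Python's built-in sqlite3 module; the table and index names here are made up and much smaller than the real log table):

```python
import sqlite3

# In-memory database with a deliberately low-selectivity first index column:
# every row has the same 'kind', so the 2nd stat number will be huge.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log(kind TEXT, computer INTEGER)")
con.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("info", i % 5) for i in range(1000)],  # 1 distinct kind, 5 computers
)
con.execute("CREATE INDEX idxlog_kind_computer ON log(kind, computer)")

# Without ANALYZE the sqlite_stat1 table does not exist at all.
con.execute("ANALYZE")
for tbl, idx, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    # stat = total rows, avg rows per distinct kind, avg rows per (kind, computer)
    print(tbl, idx, stat)  # → log idxlog_kind_computer 1000 1000 200
```

Here 1000 rows share one 'kind' (second number 1000, a "lousy" leading column by the rule of thumb above) while (kind, computer) narrows to 200 rows per pair.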
> Perhaps my post dated Aug. 19, 2009 will help a little bit:
>
> http://www.cocoabuilder.com/archive/cocoa/242954-core-data-dog-slow-when-using-first-time-after-boot.html
>
> -- Tito

Thanks for sharing. "Warming" the file is a way to cache the whole database, as I understand it. After everything is cached, scattered reads from the database run faster.

Unfortunately, in my case the database is ~3.5 GB; that's too big for this strategy. Even reading at 25 MB/s it would take 3500 / 25 = ~140 seconds just to read the whole db file. And, more importantly, I have only 2 GB of RAM. Anyway, thanks for sharing; I guess these cases are similar.

To me the problem looks like this: SQLite needs to read (cache) a lot (too much?) from the db during first-time query execution, even if the query uses a nicely matched index and returns nothing. And SQLite does lots of scattered reads during query execution, without trying to batch the reads or similar. That's why file caching helps.

If that's true, I'm not sure there's a simple and nice solution. I'll try some ideas (including normalization) and report the results in this topic next week.
Hi,
It has worked fairly well with small databases, but I see the problem with medium to large files. Have you tried running ANALYZE on your database? I'm curious to know how long it takes.

-- Tito
ANALYZE lasted for ~ 15 minutes.
24 July 2011, 17:21, from Tito Ciuro <[hidden email]>:

> Have you tried to run ANALYZE on your database? I'm curious to know how long it takes.
>
> -- Tito
Okay, here are some steps I've done.
1) Normalize the db.

I've created a single table (item). The "computer", "process" and "who" fields in the log table became rowid integers pointing to this table. "kind" is still a short string.

2) Give up covering indexes.

For those not aware (http://www.sqlite.org/queryplanner.html, 1.7 Covering Indices), a covering index is an index with additional columns at its end (the ones being selected by the query). A covering index eliminates the need to read data from db records, but it increases the size of the index and of the database.

3) Use INDEXED BY to suggest an index to the query planner, since the indexes are created exactly for specific queries.

The software was running over the weekend. Yesterday's database has 6 million records and is only 1.1 GB in size, compared to 4 million records and 3.5 GB before.

Now the long-running query takes 27 seconds and reads 50 MB from the database (compared to 2 minutes and 307 MB before).
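Steps 1) and 3) above can be sketched in a few lines (a hypothetical miniature of the schema; the real log table has more columns and the helper name item_id is mine, not from the original app):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- lookup table holding the distinct computer/process/who strings
    CREATE TABLE item(name TEXT, value TEXT);
    CREATE UNIQUE INDEX idxitem_name_value ON item(name, value);

    -- log rows store item rowids instead of repeated strings
    CREATE TABLE log(id INTEGER PRIMARY KEY AUTOINCREMENT,
                     msg TEXT, created_at INT, kind, computer, process, who);
    CREATE INDEX idxlog_kind_computer ON log(kind, computer);
""")

def item_id(name, value):
    # Insert-or-fetch the rowid for a (name, value) pair.
    con.execute("INSERT OR IGNORE INTO item VALUES (?, ?)", (name, value))
    return con.execute("SELECT rowid FROM item WHERE name=? AND value=?",
                       (name, value)).fetchone()[0]

con.execute("INSERT INTO log(msg, created_at, kind, computer, process, who) "
            "VALUES ('hello', 0, 'info', ?, ?, ?)",
            (item_id("computer", "pc1"), item_id("process", "app.exe"),
             item_id("who", "alice")))

# INDEXED BY forces the named index; SQLite raises an error if it cannot use it.
rows = con.execute("SELECT * FROM log INDEXED BY idxlog_kind_computer "
                   "WHERE kind='info' AND computer=? LIMIT 100",
                   (item_id("computer", "pc1"),)).fetchall()
print(len(rows))  # → 1
```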
An 8X speedup is a pretty good achievement...congrats...
#1 I take it your query is CPU bound the first time?
#2 Can you show us the query planner please?
#3 Can you show us the query planner minus the "INDEXED BY"?
#4 Can you show us sqlite_stat1?
#5 Can you show us your tables now?
#6 What happens if you do "pragma cache_size=15000"? I'd make cache_size = (readbytes/8192)*1.1 at least.
#7 Care to post your database again?

It still seems to me this should run faster the first time unless you have a really slow disk system or sqlite is doing something silly (which I doubt). Perhaps the cache being too small is hurting things.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate
> #1 I take it your query is CPU bound the first time?

Hmm, not sure. CPU was still under 1% usage.

> #2 Can you show us the query planner please?
> #3 Can you show us the query planner minus the "INDEXED BY"?

Sure.

Executing:
SELECT * FROM log INDEXED BY idxlog_kind_computer WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;
Query plan:
0|0|0|SEARCH TABLE log USING INDEX idxlog_kind_computer (kind=? AND computer=?) (~3 rows)

Executing:
SELECT * FROM log WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;
Query plan:
0|0|0|SEARCH TABLE log USING INDEX idxlog_kind_computer_process_who (kind=? AND computer=?) (~3 rows)

The query planner selected the idxlog_kind_computer_process_who index.

> #4 Can you show us sqlite_stat1?

Actually, there isn't any. I haven't run ANALYZE on this database. Since I'm using INDEXED BY, there's no need for query planner decisions.

> #5 Can you show us your tables now?

You mean the db schema?

-- table with computer, process and who values
CREATE TABLE item(name text,value text);
CREATE UNIQUE INDEX idxitem_name_value ON item(name,value);

-- log table
CREATE TABLE log(id integer primary key autoincrement,msg text,created_at int,kind,computer,process,who);
CREATE INDEX idxlog_created_at ON log(created_at);
CREATE INDEX idxlog_kind ON log(kind);
CREATE INDEX idxlog_kind_computer ON log(kind,computer);
CREATE INDEX idxlog_kind_computer_process ON log(kind,computer,process);
CREATE INDEX idxlog_kind_computer_process_who ON log(kind,computer,process,who);

Since I've normalized the data, selecting (in the real-life application) looks like:

SELECT ... kind,msg,c.value as computer,p.value as process,w.value as who,created_at,id
FROM log ...
LEFT JOIN item c ON c.rowid=log.computer
LEFT JOIN item p ON p.rowid=log.process
LEFT JOIN item w ON w.rowid=log.who
...

> #6 What happens if you do "pragma cache_size=15000"? I'd make cache_size = (readbytes/8192)*1.1 at least.

(readbytes/8192)*1.1 gives us ~6700 pages.
Tried "pragma cache_size=15000;" and nothing changed. Sqlite3 still read ~50 MB and spent 27 seconds on the first query run.

> #7 Care to post your database again?

http://dl.dropbox.com/u/2168777/db2.rar

> It still seems to me this should run faster the first time unless you have a really slow disk system or sqlite is doing something silly (which I doubt).
>
> Perhaps the cache being too small is hurting things.

I'm not sure the cache settings are important. You see, I've tried running the query with different cache_size values, even with cache_size=100. I tried cache_size from 100 to 25000 and monitored the memory usage of the console process to make sure it was using only the amount of memory given by the cache.

Timing is _always_ ~1 second on subsequent (not first) runs. You can try it yourself with the attached data.
I think I narrowed the problem a bit. Guys, hope I'm not bothering you too much :)
I've measured the size of the index (the index on log(kind,computer)) on its own: dropped the index, ran VACUUM, and re-created the index. The database file grew by 105 MB (and the sqlite3 process counter shows ~105 MB written to disk). This means the index on log(kind,computer) takes 105 MB of the database file (and the whole database is 1259 MB).

Now I run a query which uses this index (and returns no data) and observe that the sqlite3 process reads ~50 MB.

So there are two major problems here.

1) SQLite has to read about _half of the index_ before it can use it (and understand there are no records matching the query). If the cache is big enough to hold 50 MB, then on subsequent queries the sqlite process doesn't read at all.

2) SQLite reads abnormally slowly during this first-time query (waiting for something a lot?). During index creation I monitored the sqlite3 process: it was consuming CPU at ~20% and doing I/O at ~10 MB per second. That's what I call "normal load"!
On 25-07-2011 15:45, Григорий Григоренко wrote:
> I think I narrowed the problem a bit. Guys, hope I'm not bothering you too much :)
>
> I've calculated size of index (it is index on log (kind,computer) ) of its own: dropped index, run VACUUM and re-created index.

I thought that if you have an index on (a,b,c,d) then you should not have an index on (a,b,c) too, because if you use those 3 fields in the where-clause, use can be made of the 4-field index.

Therefore I think you can do:

DROP INDEX idxlog_kind;
DROP INDEX idxlog_kind_computer;
DROP INDEX idxlog_kind_computer_process;

It shrinks your db size, and a 'normal' select will make use of the idxlog_kind_computer_process_who index:

C:\TEMP>sqlite3 2011-07-24.dblite 0<sql.txt
0|0|TABLE log WITH INDEX idxlog_kind_computer_process_who
Seconds elapsed: 1

C:\TEMP>run.cmd
C:\TEMP>sqlite3 2011-07-24.dblite 0<sql.txt
0|0|TABLE log WITH INDEX idxlog_kind_computer_process_who
Seconds elapsed: 2

--
Luuk
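Luuk's point, that a WHERE clause on a prefix of an index's columns can use the wider index, is easy to check with EXPLAIN QUERY PLAN (a sketch on an empty toy table; the exact wording of the plan text varies between SQLite versions):

```python
import sqlite3

# Only the 4-column index exists; the WHERE clause constrains a 2-column prefix.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE log(id INTEGER PRIMARY KEY, kind, computer, process, who);
    CREATE INDEX idxlog_kind_computer_process_who ON log(kind, computer, process, who);
""")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM log WHERE kind='info' AND computer=1"
).fetchall()
print(plan[0][-1])
# The detail column shows the 4-column index serving the 2-column WHERE, e.g.
# "SEARCH log USING INDEX idxlog_kind_computer_process_who (kind=? AND computer=?)"
```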
You need to normalize your "kind" value.
pragma cache_size=15000;

drop index idxlog_kind_computer;

create table kind(id integer,kind text);
insert into kind values(1,'debug');
insert into kind values(2,'error');
insert into kind values(3,'info');
insert into kind values(4,'timing');
insert into kind values(5,'warn');
update log set kind=1 where kind='debug';
update log set kind=2 where kind='error';
update log set kind=3 where kind='info';
update log set kind=4 where kind='timing';
update log set kind=5 where kind='warn';
create index idxlog_kind_computer ON log(kind,computer);

Then see how long your first query takes.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate
On Mon, Jul 25, 2011 at 5:45 PM, Григорий Григоренко <[hidden email]> wrote:
> 1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).
>
> If cache is enough to hold 50 Mb then on subsequent queries sqlite process is not reading at all.

Please post your query. To understand whether sqlite reads too much or not, it's better to know what exactly you want to select.

> 2) SQLITE is reading abnormally slowly during this first-time running query (waiting for something a lot?).

Is this with the recreated index, or still the one that was created during the lifetime of your program?

Max
madIS is an extensible relational database system built upon the SQLite database, with extensions written in Python (via the APSW SQLite wrapper). It is developed at: http://madis.googlecode.com

Due to madIS' SQLite core, the database format of madIS is exactly the same as SQLite's. This means that all SQLite database files are directly usable with madIS.

In usage, madIS feels like a lightweight personal Hive+Hadoop programming environment, without the distributed processing capabilities of Hadoop. Nevertheless, due to its low overhead while running on a single computer (compared to Hadoop), madIS can easily handle tens of millions of rows on a single desktop/laptop computer.

In version 1.3 of madIS:

- A great deal of testing has been done on Linux, Windows and Mac OSX.
- madIS now only depends on APSW on all these systems, so it is easier to install.
- XMLPARSE was added. XMLPARSE processes its input in a streaming fashion, and has been tested with very large (~20+ GB) XML source files without problems.
- JPACK functions were added. Jpacks are now the preferred way to store a set of values in a single tuple. For easy viewing and exporting of jpacks, their format was based on the JSON format.
- Heavy testing under Windows and Mac OSX. CLIPBOARD and CLIPOUT virtual tables work under all OSes.
- CLIPOUT and CLIPBOARD have been tested with Excel, Libre/Open Office Calc, and iWork Numbers.
- Functions that return tables can easily be coded now, using Python's generators (yield).
- A lot of completions (via TAB) have been added to mterm. Mterm's completion engine can automatically complete tables, column names, table index names, and database filenames in ATTACH DATABASE.

In detail:

MTERM changes:

While using madIS's terminal (mterm), mterm completes (via TAB) column names, table names, etc. of the opened and attached databases.
Also, by default mterm colours column separators, and if more than 4 columns are returned, mterm "tags" the columns with numbers:

mterm> select * from deficit;
[1|1 | People's Republic of China [3|272.500 |2010
[1|2 | Japan [3|166.500 |2010
[1|3 | Germany [3|162.300 |2010
--- Column names ---
[1|Rank [2|Country [3|CAB [4|Year

Note: in mterm the column number tags are coloured red in the above example.

XMLPARSE:

If I wished to retrieve the date and author of the madIS project's Source Changes ATOM feed:

mterm> select * from
    (XMLPARSE '<entry><updated>t</updated><author><name>t</name></author></entry>'
     select * from file('http://code.google.com/feeds/p/madis/hgchanges/basic')
    ) limit 3;
2011-07-25T14:07:07Z|est...@servum
2011-07-25T14:04:09Z|est...@servum
2011-07-22T14:08:11Z|est...@servum
--- Column names ---
[1|updated [2|author_name
Query executed in 0 min. 0 sec 543 msec

In the above query, XMLPARSE is used in an "inverted form", which is easier to write when chaining virtual tables.

JPACKS:

Frequently the need arises to store multiple values in a tuple while processing data. Previously in madIS, many formats were used to store such multiple values (space separated, comma separated, tab separated). Now JPACKs are the recommended way to store multiple values. JPACKs are based on the JSON format, with the exception that a JPACK of a single value is the value itself.

Some examples are presented below:

mterm> select jpack('a');
a
mterm> select jpack('a', 'b');
["a","b"]
mterm> select jsplit(jpack('a','b','c'));
a|b|c
mterm> select jsplitv(jpack('a','b','c'));
a
b
c
mterm> select jgroup(c1) from (select 'a' as c1 union select 'b');
["a","b"]

CLIPBOARD and CLIPOUT:

If a selection of data has been made in a web browser or a spreadsheet program, then by executing the following in mterm we can access the clipboard data as a table:

mterm> select * from CLIPBOARD();
Rank↓ |Country↓ |CAB (billion US dollars)↓ |Year↓
1 | People's Republic of China |272.500 |2010
2 | Japan |166.500 |2010
3 | Germany |162.300 |2010
--- Column names ---
[1|C1 [2|C2 [3|C3 [4|C4
Query executed in 0 min. 0 sec 204 msec

* For the above data I selected the top 3 rows while browsing http://en.wikipedia.org/wiki/List_of_sovereign_states_by_current_account_balance .

Similarly, if I wished to process and send the data to a spreadsheet program, I could execute the following in mterm:

mterm> CLIPOUT select * from deficit;
1
--- Column names ---
[1|return_value
Query executed in 0 min. 0 sec 111 msec

and paste into the spreadsheet program.

Both CLIPBOARD and CLIPOUT are virtual tables. CLIPOUT is used in an "inverted form", which is easier to write when chaining queries.

--
Lefteris
Why JPack? Maybe Tcl lists would be more useful? A Tcl dictionary (also known as an associative array) can be stored as a list too.

SELECT TCLCMD('dict', 'get', 'key 1 mykey 2', 'mykey');
2
SELECT TCLCMD('lindex', 'key 1 mykey 2', 0);
key
SELECT TCLCMD('join', 'key 1 mykey 2', '-');
key-1-mykey-2
SELECT TCLCMD('lreplace', 'key 1 mykey 2', -1, -1, 'test');
test key 1 mykey 2
SELECT TCLCMD('lreplace', 'key 1 mykey 2', 'end', 'end', 'test');
key 1 mykey test
SELECT TCLCMD('linsert', 'a b', 0, 'c');
c a b
SELECT TCLCMD('linsert', 'a b', 'end', 'c');
a b c
SELECT TCLCMD('lsort', 'a c b');
a b c
SELECT TCLCMD('lsort', '-decreasing', 'a c b');
c b a
SELECT TCLCMD('lreverse', 'a c b');
b c a
SELECT TCLCMD('lsearch', 'a c b', 'b');
2
SELECT TCLCMD('lsearch', 'a c b', 'd');
-1

See details here: http://sqlite.mobigroup.ru/wiki?name=ext_tcl

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
> I thought that, if you have an index on a,b,c,d
> than you should not have an index on a,b,c too
> because if you use those 3 field in the where-clause, use can be made of
> the 4-field index

I'm not sure. Let me explain.

I need the query to be ORDER BY id DESC. I dropped this ORDER BY to simplify my case, but in the real-life app I need it.

So, the index on (kind, computer) has these index records:

[ KIND ] [ COMPUTER ] [ ID ]

They are ordered by kind, computer and then by id. So ORDER BY id DESC comes for free: the query planner just needs to retrieve records from the matching subset of index records, starting at the last one and scanning backwards.

The index on (kind, computer, process, who) has these index records:

[ KIND ] [ COMPUTER ] [ PROCESS ] [ WHO ] [ ID ]

This time, having found the subset of index records, the query planner cannot retrieve them from last to first; it has to sort them by ID first. Isn't it so? Or maybe I'm not getting it right?
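The point about ORDER BY id DESC coming for free can be checked with EXPLAIN QUERY PLAN (a sketch on a toy table; the index name idx_kc is mine). Because the rowid is the implicit trailing column of the index, the entries for a fixed (kind, computer) are already in id order, so no sorting step appears in the plan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE log(id INTEGER PRIMARY KEY, msg, kind, computer, process, who);
    CREATE INDEX idx_kc ON log(kind, computer);
""")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM log WHERE kind='info' AND computer=1 ORDER BY id DESC"
).fetchall()
for row in plan:
    print(row[-1])
# Expect a SEARCH using idx_kc and no "USE TEMP B-TREE FOR ORDER BY" line:
# the backwards index scan already yields descending id order.
```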
> > 1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).
> >
> > If cache is enough to hold 50 Mb then on subsequent queries sqlite process is not reading at all.
>
> Please, post your query. To understand whether sqlite reads too much
> or not it's better to know what exactly you want to select.

This is the script I ran:

CREATE TABLE IF NOT EXISTS T(t);
DELETE FROM T;
INSERT INTO T VALUES( strftime('%s','now') );

EXPLAIN QUERY PLAN SELECT * FROM log INDEXED BY idxlog_kind_computer WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;

SELECT * FROM log INDEXED BY idxlog_kind_computer WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;

SELECT "Seconds elapsed: " || (strftime('%s','now') - t) FROM T;

> > 2) SQLITE is reading abnormally slowly during this first-time running query (waiting for something a lot?).
>
> Is this with the recreated index or still the one that was created
> during the lifetime of your program?

It doesn't matter. I tried it with the old and with the recreated index; same behaviour.
On 26 Jul 2011, at 9:30am, Григорий Григоренко wrote:

> I need query to be ORDER BY id DESC. I've dropped this ORDER BY to simplify my case. But in real-life app I need it.
>
> So, index on (kind, computer) has these index records:
>
> [ KIND ] [ COMPUTER ] [ ID ]

I don't know that SQLite does an inherent addition of the 'id' column to all indexes. Some index implementations do this because their algorithm requires that all positions in an index are unique, but I don't think SQLite works like that. I think that if you want 'id' to be part of an index you have to say so in the definition.

Nevertheless, this still doesn't explain why your first run of a query is so much slower than subsequent runs. SQLite's cache size is set here:

http://www.sqlite.org/pragma.html#pragma_cache_size

If the cache size described there doesn't explain the behaviour you're seeing, the problem isn't with SQLite; it's in your application, the OS, or your hardware.

Simon.
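For reference, that pragma can be inspected and changed per connection (a minimal sketch; the default you see first depends on the SQLite build, and a negative value means kibibytes rather than pages):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Read the current page-cache size; negative means KiB instead of pages.
print(con.execute("PRAGMA cache_size").fetchone()[0])

# Set it to 15000 pages, as suggested earlier in the thread, and read it back.
con.execute("PRAGMA cache_size = 15000")
print(con.execute("PRAGMA cache_size").fetchone()[0])  # → 15000
```

The setting is per-connection and not persistent; it must be reapplied each time the database is opened.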
Did that. Timings decreased. As I understand it, that's about decreasing the index size (which included the kind column).

To me the problem is still there. If my database grows to 10 million log records, the first-run query will get stuck again :(

I don't understand SQLite's strategy. Let me explain.

An index is an array of index records. They fill pages in the database. Searching a B-tree index is similar to binary search in an ordered array, isn't it? You pick the record in the middle of the array subset and compare it to the conditional value. Each step lets you drop half of the index subset from the search.

Let's say the size of the index is 100 MB and it contains 4 million index records. That is 100 MB / 8 KB (page size) ~ 12000 pages. A binary search over 4 million records does at most 22 compares (2^22 ~ 4 million). Assume the worst case: every compare touches a different page. Then we need to read 22 pages = 180 KB.

Surely there's additional data to be read for the index, like intermediate nodes in the B-tree. Let's triple the number of pages: 66 pages = 540 KB.

But SQLite reads ~50 MB in this case!!

This leads to the conclusion: the index in an SQLite database is scattered and cannot be "jumped into" directly at the N-th element. SQLite has to read it somehow consecutively. And so SQLite has to read half of the index (!) to find a matching index record.

Am I getting it right?

25 July 2011, 19:35, from "Black, Michael (IS)" <[hidden email]>:

> You need to normalize your "kind" value.
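The back-of-the-envelope numbers above can be reproduced in a few lines (a sketch; the 8 KB page size and 100 MB index size are the assumptions from the post, and the figure comes out as 12800 pages and 176 KB with exact powers of two):

```python
import math

index_bytes = 100 * 1024 * 1024   # ~100 MB index, per the measurement above
page_bytes = 8 * 1024             # assumed 8 KB page size
records = 4_000_000

pages = index_bytes // page_bytes
compares = math.ceil(math.log2(records))   # binary-search worst case
worst_case_read = compares * page_bytes    # if every compare hits a new page

print(pages)                        # → 12800 pages
print(compares)                     # → 22 compares
print(worst_case_read // 1024)      # → 176 KB, versus the ~50 MB observed
```

Even tripled for interior B-tree nodes, this estimate stays under 1 MB, which is the crux of the complaint: the observed 50 MB of reads is orders of magnitude above a page-at-a-time B-tree descent.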
In reply to this post by Simon Slavin-3
>
> > I need the query to be ORDER BY id DESC. I dropped this ORDER BY to simplify my case, but in the real-life app I need it.
> >
> > So, the index on (kind, computer) has these index records:
> >
> > [ KIND ] [ COMPUTER ] [ ID ]
>
> I don't know that SQLite does an inherent addition of the 'id' column to all INDEXes. Some index implementations do this because their algorithm requires that all positions in an index are unique, but I don't think SQLite works like that. I think that if you want 'id' to be part of an index you have to say that in the definition.

SQLite always stores the rowid in every index record; this rowid lets SQLite find the appropriate table record. You can read more here: http://www.sqlite.org/queryplanner.html

> Nevertheless this still doesn't explain why your first run of a query is so much slower than subsequent runs. SQLite's cache size is set here:
>
> http://www.sqlite.org/pragma.html#pragma_cache_size
>
> If the cache size described there doesn't explain the behaviour you're seeing, the problem isn't with SQLite, it's in your application or the OS or your hardware.

The problem is filling the cache. SQLite needs too much data to execute a simple query. SQLite's own cache size doesn't matter because the OS does its own caching: once the query has run for the first time, the OS has cached the disk pages, and subsequent queries get their data from RAM, not from disk.
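The rowid-in-every-index-entry point can be observed from a query plan: a query that needs only the rowid and the indexed column can be answered from the index alone. A small sketch with Python's built-in sqlite3 module (the schema is illustrative, and the exact plan wording varies between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log(id INTEGER PRIMARY KEY, kind TEXT, msg TEXT)")
con.execute("CREATE INDEX idxkind ON log(kind)")
con.executemany("INSERT INTO log(kind, msg) VALUES (?, ?)",
                [("error", "boom"), ("info", "ok")])

# 'id' is an alias for the rowid, and the rowid is stored in every entry
# of idxkind, so this query never has to touch the table itself.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM log WHERE kind = 'error'"
).fetchall()
print(plan)   # recent SQLite versions report a COVERING INDEX here
```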
In reply to this post by Григорий Григоренко
On 26 Jul 2011, at 11:05am, Григорий Григоренко wrote:

> This leads to the conclusion: the index in an SQLite database is scattered and cannot be "jumped into" directly at the N-th element. SQLite has to read it more or less sequentially.
>
> And so SQLite has to read half of the index (!) to find a matching index record.

I don't think it's SQLite itself that's reading half the index. I think it's some part of your operating system that's trying to cache all of your database file as SQLite reads lots of different parts of it spread about randomly. Unfortunately I don't see how an OS can reasonably do that, since it will produce the slow speeds you're complaining about.

Simon.
On Tue, Jul 26, 2011 at 2:18 PM, Simon Slavin <[hidden email]> wrote:
>
> On 26 Jul 2011, at 11:05am, Григорий Григоренко wrote:
>
> > This leads to the conclusion: the index in an SQLite database is scattered and cannot be "jumped into" directly at the N-th element. SQLite has to read it more or less sequentially.
> >
> > And so SQLite has to read half of the index (!) to find a matching index record.
>
> I don't think it's SQLite itself that's reading half the index. I think it's some part of your operating system that's trying to cache all of your database file as SQLite reads lots of different parts of it spread about randomly. Unfortunately I don't see how an OS can reasonably do that, since it will produce the slow speeds you're complaining about.

Actually I narrowed down the problem (correct me if I'm wrong). A simple table:

CREATE TABLE [abctable] ([id] INTEGER PRIMARY KEY AUTOINCREMENT, a, b, c)

Add many records (100,000):

INSERT INTO abctable (a, b, c) VALUES (10, 20, 30)

Good variant:

CREATE INDEX idxabc ON abctable (a, b, c)
SELECT * FROM abctable WHERE a=10 AND b=20 AND c > 1000000

SQLite reads very little data (3 KB, actually, for 100,000 records) to show the empty result.

Another variant:

CREATE INDEX idxabid ON abctable (a, b, id)
SELECT * FROM abctable WHERE a=10 AND b=20 AND id > 1000000

SQLite reads much more (1.7 MB).

Checked with 3.7.6.2. I suppose the latter is due to some special meaning of the id/rowid, but the second case should still work with fewer reads.

Max
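Max's experiment is easy to rebuild at small scale with Python's built-in sqlite3 module. Measuring bytes read isn't practical from here, so this sketch only recreates his schema and confirms that both queries run and return no rows; the read-volume difference he reports is not asserted:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE abctable(id INTEGER PRIMARY KEY AUTOINCREMENT, a, b, c)")
# 1,000 rows instead of Max's 100,000, but the same shape of data
con.executemany("INSERT INTO abctable(a, b, c) VALUES (?, 20, 30)",
                [(10,)] * 1000)
con.execute("CREATE INDEX idxabc ON abctable(a, b, c)")
con.execute("CREATE INDEX idxabid ON abctable(a, b, id)")

good = con.execute(
    "SELECT * FROM abctable WHERE a=10 AND b=20 AND c > 1000000").fetchall()
other = con.execute(
    "SELECT * FROM abctable WHERE a=10 AND b=20 AND id > 1000000").fetchall()
print(len(good), len(other))   # → 0 0  (no row satisfies either range)
```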