SELECT query first run is VERY slow

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
62 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
Thank you for explaining this.

I guess you're right about query planner deciding to avoid index usage based on stats.

22 июля 2011, 18:30 от Richard Hipp <[hidden email]>:

> On Fri, Jul 22, 2011 at 9:48 AM, Григорий Григоренко <[hidden email]>wrote:
>
> > >
> > > Please post the results of:
> > >
> > >     SELECT * FROM sqlite_stat1;
> > >
> > >
> >   tbl = log
> >  idx = idxlog_kind_computer_process_who_id_msg_created_at
> >  stat = 2815667 563134 563134 469278 74097 1 1 1
> >
>
> The first number on "stat" is the number of rows in the table.  The 2nd
> number is the average number of rows that have the same value for the first
> column of the index.  The 3rd number is the average number of rows that have
> the same value for the first 2 columns of the index.  And so forth.
>
> There are a huge number of rows that have the same value for the first 4
> terms of this index, which shows us that this is a really lousy index.
> Ideally, you want the second number in the "stat" column to be something
> small, like 10.
>
> If you do not run ANALYZE, SQLite has no way of knowing that the index is
> mostly useless.  SQLite assumes that the index is a good one, and that the
> 2nd integer in "stat" is 10.  And it therefore tries to use the index.  But
> since the index is so bad, the resulting performance is slow.
>
> After running ANALYZE, SQLite realizes that the index is lousy and avoids
> using it.  Hence, performance is much better.
>
>
> >
> >  tbl = log
> >  idx = idxlog_kind_computer_process_id_who_msg_created_at
> >  stat = 2815667 563134 563134 469278 1 1 1 1
> >
> >  tbl = log
> >  idx = idxlog_kind_computer_id_process_who_msg_created_at
> >  stat = 2815667 563134 563134 1 1 1 1 1
> >
> >  tbl = log
> >  idx = idxlog_kind_id_computer_process_who_msg_created_at
> >  stat = 2815667 563134 1 1 1 1 1 1
> >
> >  tbl = log
> >  idx = idxlog_kind_computer_process_who_msg_created_at
> >  stat = 2815667 563134 563134 469278 74097 2 2
> >
> >  tbl = log
> >  idx = idxlog_created_at
> >  stat = 2815667 106
> >
>
>
>
> --
> D. Richard Hipp
> [hidden email]
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Tito Ciuro
>
> Perhaps my post dated Aug. 19, 2009 will help a little bit:
>
> http://www.cocoabuilder.com/archive/cocoa/242954-core-data-dog-slow-when-using-first-time-after-boot.html
>
> -- Tito
>

Thanks for sharing.  "warming file" is a way to cache whole database as I understand it.

After everything is cached scattered reading from database runs faster.

Unfortunately, in my case base size is ~ 3.5 Gb; it's too big for this strategy.

Even if I read at 25 Mb/s rate it will took 3500 / 25 = ~140 seconds just to read whole db file.

And what's more important I've only 2 Gb of RAM.

Anyway, thanks for sharing. I guess these cases are similar.


To me problem looks like this:

SQLITE needs to read  (cache) from db a lot (too much?) while first-time query execution even if a query uses nicely matched index and returns nothing.

And SQLITE is doing lot's of scattered readings during query execution; not trying to somehow batch read or similar. That's why file caching helps.

If it's true not sure there's a simple and nice solution.

I'll try some ideas (including normalization) and report results in this topic next week.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Tito Ciuro
Hi,

It has worked fairly well with small databases, but I see the problem with medium to large files. Have you tried to run ANALYZE on your database? I'm curious to know how long it takes.

-- Tito

On Jul 24, 2011, at 8:26 AM, Григорий Григоренко wrote:

>>
>> Perhaps my post dated Aug. 19, 2009 will help a little bit:
>>
>> http://www.cocoabuilder.com/archive/cocoa/242954-core-data-dog-slow-when-using-first-time-after-boot.html
>>
>> -- Tito
>>
>
> Thanks for sharing.  "warming file" is a way to cache whole database as I understand it.
>
> After everything is cached scattered reading from database runs faster.
>
> Unfortunately, in my case base size is ~ 3.5 Gb; it's too big for this strategy.
>
> Even if I read at 25 Mb/s rate it will took 3500 / 25 = ~140 seconds just to read whole db file.
>
> And what's more important I've only 2 Gb of RAM.
>
> Anyway, thanks for sharing. I guess these cases are similar.
>
>
> To me problem looks like this:
>
> SQLITE needs to read  (cache) from db a lot (too much?) while first-time query execution even if a query uses nicely matched index and returns nothing.
>
> And SQLITE is doing lot's of scattered readings during query execution; not trying to somehow batch read or similar. That's why file caching helps.
>
> If it's true not sure there's a simple and nice solution.
>
> I'll try some ideas (including normalization) and report results in this topic next week.
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
ANALYZE lasted for ~ 15 minutes.


24 июля 2011, 17:21 от Tito Ciuro <[hidden email]>:
> Hi,
>
> It has worked fairly well with small databases, but I see the problem with medium to large files. Have you tried to run ANALYZE on your database? I'm curious to know how long it takes.
>
> -- Tito
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Григорий Григоренко
Okay, here are some steps I've done.

1) normalize db;

I've created single table (item). "computer","process" and "who" fields in log table became rowid integers pointing to this table.
"kind" is still a short string.

2) give up covering indexes;

For those not aware (http://www.sqlite.org/queryplanner.html , 1.7 Covering Indices) it's an index that has additional columns at it's end ( being selected by query).
Covering index eliminate the need to read data from db records. But it increases size of index and size of database.

3) use INDEXED BY to suggest index to query planner.

Since indexes are created exactly for specific query.


Software was working during weekend.

Yesterday's database has 6 mln records and it's only 1.1 Gb in size.

Comparing to 4 mln records and 3.5 Gb size before.

Now, the long running query took 27 seconds and it has read 50 Mb from database (compare to 2 minutes and 307 Mb before).

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT :Re: SELECT query first run is VERY slow

Black, Michael (IS)
An 8X speedup is a pretty good achievement...congrats...



#1 I take it your query is CPU bound the first time?

#2 Can you show us the query planner please?

#3 Can you show us the query planner minus the "INDEXED BY"?

#4 Can you show us sqlite_stat1?

#5 Can you show us your tables now?

#6 What happens if you do "pragma cache_size=15000"?  I'd make cache_size = (readbytes/8192)*1.1 at least.

#7 Care to post your database again?



It still seems to me this should run faster the first time unless you have a really slow disk system or sqlite is doing something silly (which I doubt).

Perhaps the cache being to small is hurting things.



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Григорий Григоренко [[hidden email]]
Sent: Monday, July 25, 2011 3:30 AM
To: sqlite-users
Subject: EXT :Re: [sqlite]SELECT query first run is VERY slow

Okay, here are some steps I've done.

1) normalize db;

I've created single table (item). "computer","process" and "who" fields in log table became rowid integers pointing to this table.
"kind" is still a short string.

2) give up covering indexes;

For those not aware (<thismessage:/>http://www.sqlite.org/queryplanner.html , 1.7 Covering Indices) it's an index that has additional columns at it's end ( being selected by query).
Covering index eliminate the need to read data from db records. But it increases size of index and size of database.

3) use INDEXED BY to suggest index to query planner.

Since indexes are created exactly for specific query.


Software was working during weekend.

Yesterday's database has 6 mln records and it's only 1.1 Gb in size.

Comparing to 4 mln records and 3.5 Gb size before.

Now, the long running query took 27 seconds and it has read 50 Mb from database (compare to 2 minutes and 307 Mb before).

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT :Re: SELECT query first run is VERY slow

Григорий Григоренко
>
> #1 I take it your query is CPU bound the first time?

Hmm, not sure. CPU was still under 1 % usage.

>
> #2 Can you show us the query planner please?
> #3 Can you show us the query planner minus the "INDEXED BY"?

Sure. Executing:

SELECT * FROM log INDEXED BY idxlog_kind_computer  WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;

Query plan:

0|0|0|SEARCH TABLE log USING INDEX idxlog_kind_computer (kind=? AND computer=?) (~3 rows)

Executing:

SELECT * FROM log WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;

Query plan:

0|0|0|SEARCH TABLE log USING INDEX idxlog_kind_computer_process_who (kind=? AND computer=?) (~3 rows)


Query planner has selected idxlog_kind_computer_process_who index.



>
>

> #4 Can you show us sqlite_stat1?

Actually, there isn't any. I've not ran ANALYZE on this database.
Since I'm using INDEXED BY there's no need in query planner decisions.

>
> #5 Can you show us your tables now?
>

You mean db schema?

// table with computer, process and who values
CREATE TABLE item(name text,value text);
CREATE UNIQUE INDEX idxitem_name_value ON item(name,value);

// log-table
CREATE TABLE log(id integer primary key autoincrement,msg text,created_at int,kind,computer,process,who);

CREATE INDEX idxlog_created_at ON log(created_at);
CREATE INDEX idxlog_kind ON log(kind);
CREATE INDEX idxlog_kind_computer ON log(kind,computer);
CREATE INDEX idxlog_kind_computer_process ON log(kind,computer,process);
CREATE INDEX idxlog_kind_computer_process_who ON log(kind,computer,process,who);

Since I've normalized data selecting (in real-life application) looks like:
SELECT ... kind,msg,c.value as computer,p.value as process,w.value as who,created_at,id FROM log ...
 LEFT JOIN item c ON c.rowid=log.computer
 LEFT JOIN item p ON p.rowid=log.process
 LEFT JOIN item w ON w.rowid=log.who
...



> #6 What happens if you do "pragma cache_size=15000"?  I'd make cache_size = (readbytes/8192)*1.1 at least.

(readbytes/8192)*1.1 gives us ~ 6700 pages. Tried "pragma cache_size=15000;" and nothing changed.

Sqlite3 still has read ~ 50 Mb and spent 27 seconds on first query run.



>
> #7 Care to post your database again?
>

http://dl.dropbox.com/u/2168777/db2.rar

>
>
> It still seems to me this should run faster the first time unless you have a really slow disk system or sqlite is doing something silly (which I doubt).
>
> Perhaps the cache being to small is hurting things.

I'm not  sure cache settings are important.

You see, I've tried running query with different cache_size. Even with cache_size=100.
Tried cache_size from 100 to 25000 and monitored memory usage of console process to make sure
it was using only the amount of memory given by cache.

Timing is _always_ ~ 1 second on subsequent (not first) runs. You can try it yourself with attached data.







>
>
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> ________________________________
> From: [hidden email] [[hidden email]] on behalf of Григорий Григоренко [[hidden email]]
> Sent: Monday, July 25, 2011 3:30 AM
> To: sqlite-users
> Subject: EXT :Re: [sqlite]SELECT query first run is VERY slow
>
> Okay, here are some steps I've done.
>
> 1) normalize db;
>
> I've created single table (item). "computer","process" and "who" fields in log table became rowid integers pointing to this table.
> "kind" is still a short string.
>
> 2) give up covering indexes;
>
> For those not aware (<thismessage:/>http://www.sqlite.org/queryplanner.html , 1.7 Covering Indices) it's an index that has additional columns at it's end ( being selected by query).
> Covering index eliminate the need to read data from db records. But it increases size of index and size of database.
>
> 3) use INDEXED BY to suggest index to query planner.
>
> Since indexes are created exactly for specific query.
>
>
> Software was working during weekend.
>
> Yesterday's database has 6 mln records and it's only 1.1 Gb in size.
>
> Comparing to 4 mln records and 3.5 Gb size before.
>
> Now, the long running query took 27 seconds and it has read 50 Mb from database (compare to 2 minutes and 307 Mb before).
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Григорий Григоренко
I think I narrowed the problem a bit.  Guys, hope I'm not bothering you too much :)


I've calculated size of index (it is index on log (kind,computer) ) of its own: dropped index, run VACUUM and re-created index.

Database file increased by 105 Mb (and sqlite3 process counter shows that there were ~105 Mb written to disk).


This means that index on log(kind, computer) takes 105 Mb of database file (and whole size of database is 1259 Mb).


Now, I'm running query which is using this index (and is not returning any data) and monitor that sqlite3 process reads ~50 Mb.


So there are  two major problems here.

1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).

If cache is enough to hold 50 Mb then on subsuquent queries sqlite process is not reading at all.

2) SQLITE is reading abnormally slowly during this first-time running query  (waiting for something a lot?).

During index creation I monitored sqlite3 process and it was consuming CPU at ~20% rate and it's doing I/O at ~10 Mb per second rate.
That's what I call "normal load"!














_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Luuk
On 25-07-2011 15:45, Григорий Григоренко wrote:
> I think I narrowed the problem a bit.  Guys, hope I'm not bothering you too much :)
>
>
> I've calculated size of index (it is index on log (kind,computer) ) of its own: dropped index, run VACUUM and re-created index.


I thought that, if you have an index on a,b,c,d
than you should not have an index on a,b,c too
because if you use those 3 field in the where-clause, use can be made of
the 4-field index

Therefore i think you can do:
DROP INDEX idxlog_kind;
DROP INDEX idxlog_kind_computer;
DROP INDEX idxlog_kind_computer_process;


It drops your db-size, and a 'normal' select will make use of the
idxlog_kind_computer_process_who index.


C:\TEMP>sqlite3 2011-07-24.dblite   0<sql.txt
0|0|TABLE log WITH INDEX idxlog_kind_computer_process_who
Seconds elapsed: 1

C:\TEMP>run.cmd

C:\TEMP>sqlite3 2011-07-24.dblite   0<sql.txt
0|0|TABLE log WITH INDEX idxlog_kind_computer_process_who
Seconds elapsed: 2


--
Luuk
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Black, Michael (IS)
In reply to this post by Григорий Григоренко
You need to normalize your "kind" value.



.pragma cache_size=15000;

drop index idxlog_kind_computer;

create table kind(id integer,kind text);
insert into kind values(1,'debug');
insert into kind values(2,'error');
insert into kind values(3,'info');
insert into kind values(4,'timing');
insert into kind values(5,'warn');
update log set kind=1 where kind='debug';
update log set kind=2 where kind='error';
update log set kind=3 where kind='info';
update log set kind=4 where kind='timing';
update log set kind=5 where kind='warn';
create index idxlog_kind_computer ON log(kind,computer);



Then see how long your first query takes.



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Григорий Григоренко [[hidden email]]
Sent: Monday, July 25, 2011 8:45 AM
To: [hidden email]
Subject: EXT :Re: [sqlite]SELECT query first run is VERY slow

I think I narrowed the problem a bit.  Guys, hope I'm not bothering you too much :)


I've calculated size of index (it is index on log (kind,computer) ) of its own: dropped index, run VACUUM and re-created index.

Database file increased by 105 Mb (and sqlite3 process counter shows that there were ~105 Mb written to disk).


This means that index on log(kind, computer) takes 105 Mb of database file (and whole size of database is 1259 Mb).


Now, I'm running query which is using this index (and is not returning any data) and monitor that sqlite3 process reads ~50 Mb.


So there are  two major problems here.

1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).

If cache is enough to hold 50 Mb then on subsuquent queries sqlite process is not reading at all.

2) SQLITE is reading abnormally slowly during this first-time running query  (waiting for something a lot?).

During index creation I monitored sqlite3 process and it was consuming CPU at ~20% rate and it's doing I/O at ~10 Mb per second rate.
That's what I call "normal load"!














_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Max Vlasov
In reply to this post by Григорий Григоренко
On Mon, Jul 25, 2011 at 5:45 PM, Григорий Григоренко <[hidden email]> wrote:
>
> 1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).
>
> If cache is enough to hold 50 Mb then on subsuquent queries sqlite process is not reading at all.
>

Please, post your query. To understand whether sqlite reads too much
or not it's better to know what exactly you want to select.


> 2) SQLITE is reading abnormally slowly during this first-time running query  (waiting for something a lot?).
>

Is this with the recreated index or still the one that was created
during the lifetime of your program?

Max
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

New madIS v1.3 release

Eleytherios Stamatogiannakis
madIS is an extensible relational database system built upon the SQLite
database and with extensions written in Python (via APSW SQLite
wrapper). Its is developed at:

http://madis.googlecode.com

Due to madIS’ SQLite core, the database format of madIS is exactly the
same as SQLite’s one. This means that all SQLite database files are
directly usable with madIS.

In usage, madIS, feels like a lightweight personal Hive+Hadoop
programming environment, without the distributed processing capabilities
of Hadoop. Nevertheless due to its low overhead while running on a
single computer (compared to Hadoop), madIS can easily handle tens of
millions of rows on a single desktop/laptop computer.

In version 1.3 of madIS:

  - A great deal of testing has been done on Linux, Windows and Mac OSX.
  - madIS now only depends on APSW on all these systems, so it is easier
to be installed
  - XMLPARSE was added. XMLPARSE processes its input in a streaming
fashion, and has been tested with very large (~20+ GB) XML source files
without problems.
  - JPACK functions were added. Jpacks are now the preferable way to
store a set of values into a single tuple. For easy viewing and
exporting of the jpacks, their format was based on the JSON format.
  - Heavy testing under Windows and Mac OSX. CLIPBOARD and CLIPOUT
virtual tables work under all OSes.
  - CLIPOUT and CLIPBOARD, have been tested with Excel, Libre/Open
Office Calc, and iWork Numbers.
  - Functions that return tables, can easily be coded now, by using
Python's generators (yield)
  - A lot of completions (via TAB) have been added to mterm. Mterm's
completion engine can automatically complete, tables, column names,
table index names and database filenames in attach database.

In detail:

MTERM changes:

While using madIS's terminal (mterm), mterm completes (via TAB) column
names, tables names, etc. of the opened and attached databases.

Also by default mterm colours column separators and if more than 4
columns are returned, mterm "tags" the columns with numbers:

mterm> select * from deficit;
[1|1 | People's Republic of China [3|272.500 |2010
[1|2 | Japan [3|166.500 |2010
[1|3 | Germany [3|162.300 |2010
--- Column names ---
[1|Rank [2|Country [3|CAB [4|Year

  Note: In mterm the column number tags are coloured red in above example


XMLPARSE:

If i wished to retrieve the date and author of madIS project's Source
Changes ATOM feed:

mterm> select * from
(XMLPARSE
'<entry><updated>t</updated><author><name>t</name></author></entry>'
select * from file('http://code.google.com/feeds/p/madis/hgchanges/basic')
) limit 3;
2011-07-25T14:07:07Z|est...@servum
2011-07-25T14:04:09Z|est...@servum
2011-07-22T14:08:11Z|est...@servum
--- Column names ---
[1|updated [2|author_name
Query executed in 0 min. 0 sec 543 msec
mterm>

In above query, XMLPARSE is used in an "inverted form" which is easier
to write, when chaining virtual tables.


JPACKS:

Frequently, the need to store multiple values into a tuple arises while
processing data. Previously in madIS a lot of formats were used to store
all these multiple values (space separated, comma separated, tab
separated). Now JPACKs are the recommended way to store multiple values.

JPACKs are based on the JSON format, with the exception that a JPACK of
a single value is itself. Some examples are presented below:

mterm> select jpack('a');
a

mterm> select jpack('a', 'b');
["a","b"]

mterm> select jsplit(jpack('a','b','c'));
a|b|c

mterm> select jsplitv(jpack('a','b','c'));
a
b
c

mterm> select jgroup(c1) from (select 'a' as c1 union select 'b');
["a","b"]


CLIPBOARD and CLIPOUT:

If a selection of data has been made in the web browser or a spreadsheet
program then by executing the following in mterm we could access the
clipboard data as a table:

mterm> select * from CLIPBOARD();
Rank↓ |Country↓ |CAB (billion US dollars)↓ |Year↓
1 | People's Republic of China |272.500 |2010
2 | Japan |166.500 |2010
3 | Germany |162.300 |2010
--- Column names ---
[1|C1 [2|C2 [3|C3 [4|C4
Query executed in 0 min. 0 sec 204 msec

* For above data i selected the top 3 rows while browsing
http://en.wikipedia.org/wiki/List_of_sovereign_states_by_current_account_balance 
.

Similarly if i wished to process and send the data to a spreadsheet
program then i could execute the following in mterm:

mterm> CLIPOUT select * from deficit;
1
--- Column names ---
[1|return_value
Query executed in 0 min. 0 sec 111 msec

and paste in a spreadsheet program.

Both CLIPBOARD and CLIPOUT are virtual tables. CLIPOUT is used in an
"inverted form" which is easier to write, when chaining queries.

-- Lefteris
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: New madIS v1.3 release

Alexey Pechnikov-2
Why JPack? May be Tcl lists will be more useful? The tcl dictionary
(also known as associative array) can be stored as list too.

    SELECT TCLCMD('dict', 'get', 'key 1 mykey 2', 'mykey');
    2

    SELECT TCLCMD('lindex', 'key 1 mykey 2', 0);
    key

    SELECT TCLCMD('join', 'key 1 mykey 2', '-');
    key-1-mykey-2

    SELECT TCLCMD('lreplace', 'key 1 mykey 2', -1, -1, 'test');
    test key 1 mykey 2

    SELECT TCLCMD('lreplace', 'key 1 mykey 2', 'end', 'end', 'test');
    key 1 mykey test

    SELECT TCLCMD('linsert', 'a b', 0, 'c');
    c a b

    SELECT TCLCMD('linsert', 'a b', 'end', 'c');
    a b c

    SELECT TCLCMD('lsort', 'a c b');
    a b c

    SELECT TCLCMD('lsort', '-decreasing', 'a c b');
    c b a

    SELECT TCLCMD('lreverse', 'a c b');
    b c a

    SELECT TCLCMD('lsearch', 'a c b', 'b');
    2

    SELECT TCLCMD('lsearch', 'a c b', 'd');
    -1

See details here: http://sqlite.mobigroup.ru/wiki?name=ext_tcl

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Luuk
>
> I thought that, if you have an index on a,b,c,d
> than you should not have an index on a,b,c too
> because if you use those 3 field in the where-clause, use can be made of
> the 4-field index
>

I'm not sure. Let me explain.

I need query to be ORDER BY id DESC. I've dropped this ORDER BY to simplify my case. But in real-life app I need it.

So, index on (kind, computer) has these index records:

[ KIND ] [ COMPUTER ] [ ID ]
...

They are ordered by kind, computer and then by id. So ORDER BY id DESC comes free.
Query planner just need to retrieve records from subset of index records starting at last one backwards.

Index on (kind,computer, process,who) has these index records:

[ KIND ] [ COMPUTER ] [ PROCESS ] [ WHO ] [ ID ]

This time having found subset of index records query planner cannot start retrieving them from the last to first.
It has to sort them by ID at first. Isn't it?


Maybe I'm not getting it right?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Григорий Григоренко
> >
> > 1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).
> >
> > If cache is enough to hold 50 Mb then on subsuquent queries sqlite process is not reading at all.
> >
>
> Please, post your query. To understand whether sqlite reads too much
> or not it's better to know what exactly you want to select.
>


This is script I ran:

CREATE TABLE IF NOT EXISTS T(t); DELETE FROM T; INSERT INTO T VALUES( strftime('%s', 'now' ) );

EXPLAIN QUERY PLAN SELECT * FROM log  INDEXED BY idxlog_kind_computer WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;

SELECT * FROM log INDEXED BY idxlog_kind_computer WHERE kind = 'info' AND computer=1 and id > 7070636 LIMIT 100;

SELECT "Seconds elapsed: " || (strftime('%s', 'now' ) - t) FROM T ;



>
> > 2) SQLITE is reading abnormally slowly during this first-time running query  (waiting for something a lot?).
> >
>
> Is this with the recreated index or still the one that was created
> during the lifetime of your program?
>


It doesn't matter. Tried it with old and with recreated index, same behaviour.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Simon Slavin-3
In reply to this post by Григорий Григоренко

On 26 Jul 2011, at 9:30am, Григорий Григоренко wrote:

> I need query to be ORDER BY id DESC. I've dropped this ORDER BY to simplify my case. But in real-life app I need it.
>
> So, index on (kind, computer) has these index records:
>
> [ KIND ] [ COMPUTER ] [ ID ]

I don't know that SQLite does an inherent addition of the 'id' column to all INDEXes.  Some index implementations do this because their algorithm requires that all positions in an index are unique, but I don't think SQLite works like that.  I think that if you want 'id' to be part of an index you have to say that in the definition.

Nevertheless this still doesn't explain why your first run of a query is so much slower than subsequent runs.  SQLite's cache size is set here:

http://www.sqlite.org/pragma.html#pragma_cache_size

If the cache size described there doesn't explain the behaviour you're seeing, the problem isn't with SQLite, it's in your application or the OS or your hardware.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Black, Michael (IS)
Did that. Timings has decreased.  As I understand it it's about decreasing index size (that includes kind column).



To me the problem is still there. If my database will have 10 mln log records first running query will stuck again :(



I don't understand SQLITE strategy. Let me explain.

Index is an array of index records. They fill pages in database.

Searching with B-Tree index is similar to binary search in ordered array, isn't it? You pick record in a middle of array subset and compare to conditional value.

This step let you drop half of index subset from search.

Let's say size of index is 100 Mb and it contains 4 mln index records.

This is 100 Mb / 8 Kb (size of page) ~ 12000 pages.

While doing binary search for 4 mln records we do 22 compares at most (2^22 ~ 4 mln).

Assume worst case - all of these comparings use different pages.

So, we need to read 22 pages = 180 Kb.

Surely there's additional data to be read for index. Like some intermediate nodes in B-Tree.

Let's triple the number of pages, 66 pages = 540 Kb.


But SQLITE reads in this case ~ 50 Mb!!


This leads us to conclusion: index in SQLITE database if scattered and cannot be "jumped directly" to N-th element. SQLITE has to read it somehow consecutively.

And so SQLITE has to read half of index (!) to find matching index record.


Am I getting it right?






25 июля 2011, 19:35 от "Black, Michael (IS)" <[hidden email]>:

> You need to normalize your "kind" value.
>
>
>
> .pragma cache_size=15000;
>
> drop index idxlog_kind_computer;
>
> create table kind(id integer,kind text);
> insert into kind values(1,'debug');
> insert into kind values(2,'error');
> insert into kind values(3,'info');
> insert into kind values(4,'timing');
> insert into kind values(5,'warn');
> update log set kind=1 where kind='debug';
> update log set kind=2 where kind='error';
> update log set kind=3 where kind='info';
> update log set kind=4 where kind='timing';
> update log set kind=5 where kind='warn';
> create index idxlog_kind_computer ON log(kind,computer);
>
>
>
> Then see how long your first query takes.
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> ________________________________
> From: [hidden email] [[hidden email]] on behalf of Григорий Григоренко [[hidden email]]
> Sent: Monday, July 25, 2011 8:45 AM
> To: [hidden email]
> Subject: EXT :Re: [sqlite]SELECT query first run is VERY slow
>
> I think I narrowed the problem a bit.  Guys, hope I'm not bothering you too much :)
>
>
> I've calculated size of index (it is index on log (kind,computer) ) of its own: dropped index, run VACUUM and re-created index.
>
> Database file increased by 105 Mb (and sqlite3 process counter shows that there were ~105 Mb written to disk).
>
>
> This means that index on log(kind, computer) takes 105 Mb of database file (and whole size of database is 1259 Mb).
>
>
> Now, I'm running query which is using this index (and is not returning any data) and monitor that sqlite3 process reads ~50 Mb.
>
>
> So there are  two major problems here.
>
> 1) SQLITE has to read about _half of index_ before it can use it (and understand there are no records matching query).
>
> If cache is enough to hold 50 Mb then on subsuquent queries sqlite process is not reading at all.
>
> 2) SQLITE is reading abnormally slowly during this first-time running query  (waiting for something a lot?).
>
> During index creation I monitored sqlite3 process and it was consuming CPU at ~20% rate and it's doing I/O at ~10 Mb per second rate.
> That's what I call "normal load"!
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Григорий Григоренко
In reply to this post by Simon Slavin-3
>
> > I need query to be ORDER BY id DESC. I've dropped this ORDER BY to simplify my case. But in real-life app I need it.
> >
> > So, index on (kind, computer) has these index records:
> >
> > [ KIND ] [ COMPUTER ] [ ID ]
>
> I don't know that SQLite does an inherent addition of the 'id' column to all INDEXes.  Some index implementations do this because their algorithm requires that all positions in an index are unique, but I don't think SQLite works like that.  I think that if you want 'id' to be part of an index you have to say that in the definition.

SQLITE always stores rowid in every index record. This rowid let SQLITE find approriate database record. You can read more here: http://www.sqlite.org/queryplanner.html 

>
> Nevertheless this still doesn't explain why your first run of a query is so much slower than subsequent runs.  SQLite's cache size is set here:
>
> http://www.sqlite.org/pragma.html#pragma_cache_size
>
> If the cache size described there doesn't explain the behaviour you're seeing, the problem isn't with SQLite, it's in your application or the OS or your hardware.


The problem is filling the cache. SQLITE wants too much data to execute simple query.

Cache size of SQLITE doesn't matter because OS does it's own caching. After query has been run for a first time OS caches disk pages.

And subsequent queries immediately gets data from RAM not from disk.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Simon Slavin-3
In reply to this post by Григорий Григоренко

On 26 Jul 2011, at 11:05am, Григорий Григоренко wrote:

> This leads us to conclusion: index in SQLITE database if scattered and cannot be "jumped directly" to N-th element. SQLITE has to read it somehow consecutively.
>
> And so SQLITE has to read half of index (!) to find matching index record.

I don't think it's SQLite itself that's reading half the index.  I think it's some part of your operating system that's trying to cache all of your database file as SQLite reads lots of different parts of it spread about randomly.  Unfortunately I don't see how an OS can reasonably do that since it will produce the slow speeds you're complaining about.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT query first run is VERY slow

Max Vlasov
On Tue, Jul 26, 2011 at 2:18 PM, Simon Slavin <[hidden email]> wrote:

>
> On 26 Jul 2011, at 11:05am, Григорий Григоренко wrote:
>
> > This leads us to conclusion: index in SQLITE database if scattered and
> cannot be "jumped directly" to N-th element. SQLITE has to read it somehow
> consecutively.
> >
> > And so SQLITE has to read half of index (!) to find matching index
> record.
>
> I don't think it's SQLite itself that's reading half the index.  I think
> it's some part of your operating system that's trying to cache all of your
> database file as SQLite reads lots of different parts of it spread about
> randomly.  Unfortunately I don't see how an OS can reasonably do that since
> it will produce the slow speeds you're complaining about.
>
>
Simon,

actually I narrowed down the problem  (cmiiw).

The simple table

CREATE TABLE [abctable] ([id] INTEGER PRIMARY KEY AUTOINCREMENT, a, b, c)

add many records (100,000)
INSERT INTO abctable (a, b, c) VALUES (10, 20, 30)

Good variant
CREATE INDEX idxabc ON abctable (a, b, c)
SELECT * FROM abctable WHERE a=10 and b=20 and c > 1000000
Sqlite reads few data (3k actually for 100,000 records) to show empty result

Another variant
CREATE INDEX idxabid ON abctable (a, b, id)
SELECT * FROM abctable WHERE a=10 and b=20 and id > 1000000

Sqlite reads much (1,7MB)

Checked with 3.7.6.2,
I suppose the latter due to some special meaning of the id/rowid, but I
suppose the second case should work with fewer reads

Max
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
1234