Bug: LSM1 lsm1BestIndex() always chooses table scan

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug: LSM1 lsm1BestIndex() always chooses table scan

James Kafader
Hi SQLite devs,

I'd like to begin with a "thank you" for a great tool that we use every day
at Internet Archive.

We are currently considering SQLite's LSM engine (we are aware it is not
considered production quality) to implement a index server process that
will underpin the Wayback Machine's URL replay.

We would like to use SQLite with the LSM1 module loaded to be able to write
efficient SQL to look up key ranges; we currently use a mechanism very
similar to this (but slower and more unweildy!) to serve this large index
out of plain text files.

In building the LSM1 module I followed charles leifer's instructions here:
https://www.charlesleifer.com/blog/lsm-key-value-storage-in-sqlite3/

resulting in this approximate set of commands:
wget http://sqlite.org/src/tarball/version-3.29.0/sqlite-3.29.tar.gz
tar zxvf sqlite-3.29.tar.gz
cd sqlite/ext/lsm1
CFLAGS="-g -fPIC" TCCX="gcc -g -fPIC" make lsm.so

we then load the lsm module that compiles into a version-compatible SQLite
3:
sqlite3
sqlite> .load /var/tmp/sqlite/ext/lsm1/lsm
sqlite> .timer on
sqlite> create virtual table cdx using lsm1 ('/var/tmp/cdx.lsm', key, TEXT,
value);
# here we load some data from a CSV file, appx 3m records for testing
purposes
# then we query out the last record from the loaded data
sqlite> select key, value from cdx where key =
'151,65,166,108)/media/newmuseum/images/0/51927_ca_object_representations_media_1_medium.jpg
20121001214108';
# this operation takes about 2s (!) and raised our curiosity. After a lot
of testing, it became clear that it's doing a full table scan, despite the
fact that the LSM implementation should be able to seek to this key.

We think, after trying out a fix that line 845 of lsm_vtab.c contains a
bug. This line
  for(i=0; i<pIdxInfo->nConstraint && idxNum<16; i++, pConstraint++){
should perhaps be replaced by this line:
   for(i=0; i<pIdxInfo->nConstraint && idxNum>16; i++, pConstraint++){

whether or not that's the correct fix, it seems to be something close to
logic involving idxNum there; it might be that it should be set to 0 to
begin with rather than 99.

Thanks again! We will report any other bugs we uncover.

Best,
James Kafader
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: LSM1 lsm1BestIndex() always chooses table scan

Richard Hipp-3
On 8/16/19, James Kafader <[hidden email]> wrote:
>
> We think, after trying out a fix that line 845 of lsm_vtab.c contains a
> bug. This line
>   for(i=0; i<pIdxInfo->nConstraint && idxNum<16; i++, pConstraint++){
> should perhaps be replaced by this line:
>    for(i=0; i<pIdxInfo->nConstraint && idxNum>16; i++, pConstraint++){
>

Thanks for debugging this.  We think the actual fix is:

   for(i=0; i<pIdxInfo->nConstraint; i++, pConstraint++){

In other words, take out the idxNum term completely.  I'm not sure
what that was about - probably some cruft left over from whatever
virtual table I used as a template when throwing together the LSM1
vtab.

The fix has now been checked into the SQLite source tree.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users