LSM extension lsm_work

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

LSM extension lsm_work

Josu Diaz de Arcaya
Hi all,

I am very happy with the inclusion of LSM extension in sqlite3's code base.
I have a question regarding
the merging of segments within the database.

I noticed that autowork enabled was not enough for keeping disk consumption
at bay, the generated database
was N times larger than the generated by sqlite3. Using lsm_work(db,1-1,0)
after every transaction seems to
mitigate this and the disk footprint of the db gets back to normal. This
seem to be explained by Mr. Kennedy
here http://www.sqlite.org/src4/doc/tip/www/lsmusr.wiki#explicit_scheduling

* What does exactly do lsm_work(db,1,-1,0) ? My assumption is that it
merges all the on disk segments into one larger
segment, is this correct?

* Doing this after every transactions of course takes some tool on
performance and everything goes significantly slower. What
is the best way of running lsm_work? in a separate thread like it is
mentioned in sqlite4's documentation?
http://www.sqlite.org/src4/doc/tip/www/lsmusr.wiki#explicit_scheduling

* Finally, can I call lsm_work less aggressively in terms of parameters? I
am worried about virtual memory consumption so I'd rather not do
a full lsm_work every time, something like.. merge until there are no more
elements with delete markers on it?

Thanks!
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LSM extension lsm_work

Dan Kennedy-4
On 12/25/2017 11:17 PM, Josu Diaz de Arcaya wrote:
> Hi all,
>
> I am very happy with the inclusion of LSM extension in sqlite3's code base.
> I have a question regarding
> the merging of segments within the database.
>
> I noticed that autowork enabled was not enough for keeping disk consumption
> at bay, the generated database
> was N times larger than the generated by sqlite3.

If you're doing lots of operations on the same key - i.e. deleting and
recreating the entries, or updating them often, then an LSM database can
end up much larger than the equivalent b-tree - basically because it is
so lazy about deleting old keys. Is your workload like that?

> Using lsm_work(db,1-1,0)
> after every transaction seems to
> mitigate this and the disk footprint of the db gets back to normal. This
> seem to be explained by Mr. Kennedy
> here http://www.sqlite.org/src4/doc/tip/www/lsmusr.wiki#explicit_scheduling
>
> * What does exactly do lsm_work(db,1,-1,0) ? My assumption is that it
> merges all the on disk segments into one larger
> segment, is this correct?

That's correct. Calling lsm_work(db, 1, -1, 0) merges all segments into
one. i.e. it rewrites the entire database to be a single optimized
b-tree. Calling it after every transaction seems like it would make the
system much less efficient than a regular b-tree database.

> * Doing this after every transactions of course takes some tool on
> performance and everything goes significantly slower. What
> is the best way of running lsm_work? in a separate thread like it is
> mentioned in sqlite4's documentation?
> http://www.sqlite.org/src4/doc/tip/www/lsmusr.wiki#explicit_scheduling

Using a background thread for merges is how Leveldb & co. work. It's not
a bad idea I think.


> * Finally, can I call lsm_work less aggressively in terms of parameters? I
> am worried about virtual memory consumption so I'd rather not do
> a full lsm_work every time, something like.. merge until there are no more
> elements with delete markers on it?

I don't think there is a way to do that. And I'm not sure it would be
substantially different from just optimizing the entire db anyway.

Setting LSM_CONFIG_AUTOMERGE to 2 might help a bit.

Dan.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LSM extension lsm_work

Josu Diaz de Arcaya
Hi,

Many thanks for the very detailed answer, it is of great help

2017-12-26 15:28 GMT+01:00 Dan Kennedy <[hidden email]>:

> On 12/25/2017 11:17 PM, Josu Diaz de Arcaya wrote:
>
>> Hi all,
>>
>> I am very happy with the inclusion of LSM extension in sqlite3's code
>> base.
>> I have a question regarding
>> the merging of segments within the database.
>>
>> I noticed that autowork enabled was not enough for keeping disk
>> consumption
>> at bay, the generated database
>> was N times larger than the generated by sqlite3.
>>
>
> If you're doing lots of operations on the same key - i.e. deleting and
> recreating the entries, or updating them often, then an LSM database can
> end up much larger than the equivalent b-tree - basically because it is so
> lazy about deleting old keys. Is your workload like that?


Yes, it is a general purpose  workload, I guess it makes perfect sense that
it grows larger than a b-tree database

One other thing that worries me is memory consumption. When running a
workload with sqlite3 memory grows to a certain point but stays more or
less stable for the entire duration of the workload, however, with lsm
memory starts being lower than sqlite3 but keeps growing with no apparent
ceiling (some GB), it is not until I close the connection with the database
that everything goes back to normal.

For more details, the workload is pretty intense, it opens the database and
executes thousands of operations (lsm_insert, lsm_delete, lsm_delete_range,
cursors...). It inserts data into the database to a certain point (~2
million entries) and then it deletes, and add data but without going beyond
that threshold.

I've tried running valgrind on it but it does not detect any memory leak,
it seems as closing the database puts memory back to normal.

Is there any know issue that fits this description?

Thanks!
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users