SQLite for datalogging - best practices

Re: SQLite for datalogging - best practices

Richard Hipp-3
On 10/30/18, Mark Wagner <[hidden email]> wrote:

> Going back to the comments from Dr. Hipp regarding WAL vs DELETE mode on
> F2FS devices, I just wanted to confirm my understanding.
>
> Given a device with F2FS and with sqlite compiled with
> SQLITE_ENABLE_BATCH_ATOMIC_WRITE, writes with DELETE mode will be
> considerably faster than with WAL mode.
>
> But a relatively long lived transaction that contains a significant amount
> of computation with lots of reads and writes would still block reads on
> other threads.  So WAL could still be the better choice in some
> circumstances -- even with F2FS since reads can be happening in parallel.
> Am I missing something?

It sounds to me like you understand it.
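
For concreteness, a minimal sketch of the two configurations being compared (the pragmas below are only illustrative; the tradeoff is as described above):

-- rollback journal: on F2FS with SQLITE_ENABLE_BATCH_ATOMIC_WRITE the
-- commit can use the filesystem's atomic batch write, but readers on
-- other connections can be blocked for the duration of a long write
-- transaction
pragma journal_mode=DELETE;

-- WAL: commits go through the write-ahead log (more write traffic here),
-- but readers proceed concurrently with the single writer
pragma journal_mode=WAL;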

--
D. Richard Hipp
[hidden email]

Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Gerlando Falauto
Please forgive my insistence -- are both of those ideas really, really stupid?

I understand SQLite is perfectly capable of handling huge database files
without any issues. Yet I'm convinced there are corner cases with legitimate
reasons for wanting partitioning (like this flight-recorder mode of mine).
There may be obvious reasons for NOT doing so which I fail to see at
present -- any hints would be highly appreciated.

Thanks!
Gerlando

On Mon, Oct 29, 2018 at 9:50 PM Gerlando Falauto <[hidden email]>
wrote:

> On Mon, Oct 29, 2018 at 6:38 PM Keith Medcalf <[hidden email]> wrote:
>
>>
>> See the ext/misc/unionvtab.c extension for "reading" a bunch of databases
>> as if they were a single database.
>>
>> https://www.sqlite.org/src/artifact/0b3173f69b8899da
>
>
> Cool, indeed.
> I also had a look at the CSV file extension:
> https://www.sqlite.org/src/artifact?udc=1&ln=on&name=65297bcce8d5acd5
> Someone has actually come up with an extension to read Parquet files:
> https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html
>
> (...taking a deep breath...) Alright, I'm just gonna say it.
> How {hard, stupid, useful} do you guys think it would be to write an
> SQLite extension to add directory-based partitioning on top of the CSV
> extension and let the OS and filesystem take care of the rest?
> Something like "day=2018-10-29\source=source_a\bucket1.csv".
> I've heard people call it "hive-style" partitioning.
> As long as the size of each individual file remains reasonable, I might as
> well be happy with CSV files and just read the whole file sequentially.
>
> Alternatively, the same partitioning approach might be added to unionvtab,
> whatever feels simpler.
> I have no idea how the partitioning semantics could be specified though.
>
> I know, that kinda brings us back to my original question at the end of
> July about database sharding:
>
> https://www.mail-archive.com/sqlite-users@.../msg111250.html
> Perhaps I'm just too biased towards this approach.
>
> Thank you so much for your patience guys!
> Gerlando
>
>

Re: SQLite for datalogging - best practices

Richard Hipp-3
On 10/30/18, Gerlando Falauto <[hidden email]> wrote:
>
> I understand SQLite is perfectly capable of handling huge database files
> without any issues. Yet I'm convinced there are corner cases with
> legitimate reasons for wanting partitioning (like this flight-recorder
> mode of mine). There may be obvious reasons for NOT doing so which I fail
> to see at present -- any hints would be highly appreciated.

You can use ATTACH to successively attach a new database each day (or
some other interval) and DETACH older database files.  You would have
multiple database files in play at any given moment, and you would
need to query across all of them using a UNION ALL query, or something
similar.  Partitioning would probably do about the same thing, just
with more convenient syntax.
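
A minimal sketch of that rotation scheme, assuming one database file per day and a table named samples (the file and table names below are made up for illustration):

-- attach the days currently of interest
attach database 'log-2018-10-29.db' as day1;
attach database 'log-2018-10-30.db' as day2;

-- query across the attached days with UNION ALL
select timestamp, sensorid, data from day1.samples
union all
select timestamp, sensorid, data from day2.samples
order by timestamp;

-- when a day ages out, detach it; the file itself stays on disk
detach database day1;

Note that a connection can hold only a limited number of attached databases (10 by default, raisable at compile time), so the window of days kept "in play" has to stay fairly small.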

--
D. Richard Hipp
[hidden email]

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

Here is an estimate based on the block erase rate required to sustain the I/O churn, using the following database schema and assumptions:

- SSD size is 32 GB
- erase size is 2 MB
- 3000 erase cycles per block (49,152,000 erase operations total)

-- data returned from the wal_checkpoint operations
create table Tracer
(
  timestamp integer primary key,
  status integer not null,
  walpages integer not null,
  copypages integer not null
);

-- Tags table
create table Tags
(
  id integer primary key,
  source text collate nocase unique
);

-- History table
create table History
(
  id integer primary key,
  sid integer not null references Tags,
  timestamp integer not null,
  dtype text collate nocase,
  data,
  unique (timestamp, sid),
  unique (sid, timestamp)
);

-- operating parameters
pragma cache_size=262133; -- the bigger the better so that page changes do not need to be spilled until commit
pragma journal_mode=wal;
pragma wal_autocheckpoint=262144; -- large so that checkpointing is only a manual operation

with the following operational parameters:
 - data arrives in "clusters" so that each second's worth can be committed within a single transaction
 - vector data once per second from each of 6 3-axis accelerometers, with an average 4 KB data payload each
 - scalar data arriving randomly (15 - 45 seconds apart, avg 30 seconds) from a random subset of 16 sources; the data payload is a double
 - ID in the History table is (integer seconds since the unix epoch << 16) + sid (see the sketch below the list)
 - "old" data deletion is performed at the start of every transaction

The Tracer table has data written whenever a wal_checkpoint is done.  walpages is the number of pages in the wal file at checkpoint and copypages is the number of pages that were copied from the wal file into the main database file.  Basically, every page in the wal file must be overwritten (eventually) and every page written to the db file is a page that must be erased from the db file.  If you add the two together, you get the number of pages that have been written (more or less) and have to be eventually erased.  There are 512 4K pages in a 2M erase block, so adding these up and dividing by 512 gives you a rough estimate of number of erases.  Given that we know how many erases we have available (based on the device size and an estimate of the number of erase operations per erase block) we can estimate how long the device will last until an erase fails and the device becomes useless.
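
As a sketch, that bookkeeping can be turned into a query against the Tracer table (49,152,000 is the total erase budget from the assumptions above; 512 is the number of 4K pages per 2 MB erase block):

select sum(walpages + copypages) / 512.0                as erase_blocks_consumed,
       (max(timestamp) - min(timestamp)) / 86400.0      as days_observed,
       49152000.0 * ((max(timestamp) - min(timestamp)) / 86400.0)
                  / (sum(walpages + copypages) / 512.0) as estimated_days_of_life
from Tracer;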

How long do you need the device to last?  Based on the data collected so far I estimate that the device will last quite a long time (a decade or more).  Of course, it will take a while for the database to reach steady state ... and to see whether, as it gets bigger, the page-change set per checkpoint increases much.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Gerlando Falauto
>Sent: Tuesday, 30 October, 2018 01:46
>To: SQLite mailing list
>Subject: Re: [sqlite] SQLite for datalogging - best practices
>
>On Tue, Oct 30, 2018 at 4:58 AM Keith Medcalf <[hidden email]> wrote:
>
>> If you don't mind me asking, what sort of data are you collecting?
>> Are you the master (ie, scanning) or a slave (getting async data pushed
>> to you)?
>> Are you "compressing" the returned data (storing only changes exceeding
>> the deadband) or are you storing every value (or is the source
>> instrument doing compression)?
>> I presume you need to store the TimeStamp, Point, Value and Confidence.
>> What is the data rate (# Points and Frequency)?
>
>The bulk of the data consists of streams of AC signals being pushed from
>a handful of 3-axis accelerometers which are more or less synchronous.
>Data rate is on the order of a few hundred samples/sec for each sensor.
>A first software layer handles buffering and passes one-second buffers
>to a second software layer, which then saves them to the database for
>later analysis.
>The database schema currently consists of a single table with roughly
>the following columns: timestamp, sensorid (string), datatype (string)
>and a string containing the JSON encoding of those few hundred samples
>(as a JSON array).
>So each row takes up about 3-4 KB (~200 samples * ~5 bytes/sample * 3
>axes + overhead).
>At a later stage one may want to pack adjacent chunks together into even
>longer strings (so as to reduce the total number of rows) and/or store
>the data in a more efficient manner (e.g. in binary or compressed form).
>I don't particularly like this way of storing logical streams (i.e. Time
>Series) in chunks but I couldn't find any better way.
>
>There are also some way-less-frequent readings (scalar quantities being
>collected every 30 seconds or so) currently stored in the same table.
>
>Any suggestion on how to improve this?




Re: SQLite for datalogging - best practices

Gerlando Falauto
Hi Keith,

first of all let me tell you I cannot thank you enough for going through
all this trouble for me!

It did not occur to me it would help to run checkpoints manually.
There's a couple of things I don't understand though.

- After a checkpoint runs to completion (assuming no other process is
reading or writing), what is the value of walpages (i.e. *pnLog)? Is it
the number of pages *originally* in the logfile (i.e. the same as
copypages), or the number of pages *remaining* in the logfile (i.e.
zero)? Also, does it refer to the number of *valid* pages or the number of
pages allocated by the file (i.e. filesize divided by pagesize)?
- How often are you running checkpoints? Do you expect this variable to
have an impact on the end result?

Anyway, I'll definitely try the same approach on the current schema (which
is *way* different than your super-optimized version) just to see if it is
more or less consistent with the trend reported by smartmontools. If that's
the case, that makes for a much more convenient metric to estimate
performance and compare implementations.

So really, thanks a lot!
Gerlando



Re: SQLite for datalogging - best practices

Keith Medcalf

On Wednesday, 31 October, 2018 13:22, Gerlando Falauto <[hidden email]> wrote:

>first of all let me tell you I cannot thank you enough for going
>through all this trouble for me!

No problem.  I still do not really know whether the answer is correct; however, it does give you data about how much is actually being moved around.

>It did not occur to me it would help to run checkpoints manually.
>There's a couple of things I don't understand though.

>- After a checkpoint runs to completion (assuming no other process is
>reading or writing), what is the value of walpages (i.e. *pnLog)? Is
>it the number of pages *originally* in the logfile (i.e. the same as
>copypages), or the number of pages *remaining* in the logfile
>(i.e. zero)? Also, does it refer to the number of *valid* pages or the
>number of pages allocated by the file (i.e. filesize divided by pagesize)?

The table generated by the passive checkpoints looks like this:

sqlite> select * from tracer order by timestamp desc limit 5;
timestamp   status      walpages    copypages
----------  ----------  ----------  ----------
1541032800  0           1462        1462
1541032740  0           1457        1457
1541032680  0           1452        1452
1541032620  0           1454        1454
1541032560  0           1449        1449

I presume that walpages is the number of frames (pages) in the WAL file BEFORE the checkpoint and that copypages is the number of those frames copied to the database (the difference being the number left in the wal file).  If not all pages can be copied (for example, a read transaction is blocking them) then I would expect that eventually they will be copied.

Therefore the copypages number is really the only useful one (all those pages must have been written to the WAL and must eventually be copied to the main database), and the fact that some of the pages could not be copied at some particular instant is not really useful for determining the actual total amount of data moved.  Whatever is in copypages had to be written to the wal and has to be written to the db (written twice, erasing the equivalent number of pages).  When exactly that happens is not important, I don't think.
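
For reference, this is roughly how such a log can be produced (the parameter names below are assumptions about the logging code, not its actual text): a passive checkpoint returns one row of busy, log and checkpointed counts, which is where walpages and copypages come from.

-- run a passive checkpoint; returns (busy, log, checkpointed)
pragma wal_checkpoint(PASSIVE);

-- hypothetical logging of that result into the Tracer table
insert into Tracer (timestamp, status, walpages, copypages)
values (cast(strftime('%s','now') as integer), :busy, :log, :checkpointed);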

>- How often are you running checkpoints? Do you expect this variable
>to have an impact on the end result?

Currently I let it run once per minute with the following set for the database right after it is opened:

pragma cache_size=262144;            -- 1 Gigabyte
pragma journal_mode=wal;             -- Set WAL mode
pragma wal_autocheckpoint=0;         -- Disable autocheckpoint
pragma journal_size_limit=8388608;   -- Truncate the WAL on checkpoint, if possible, to 8 MB
pragma synchronous=NORMAL;           -- Only force sync on checkpoint, not each commit

I don't think that the frequency of checkpointing will have much of an effect on the overall rate of change; however, it does affect the size of each checkpoint and the amount of data that could be lost on system failure (the amount accumulated between checkpoints).

>Anyway, I'll definitely try the same approach on the current schema (which
>is *way* different than your super-optimized version) just to see if it is
>more or less consistent with the trend reported by smartmontools. If that's
>the case, that makes for a much more convenient metric to estimate
>performance and compare implementations.

I just have an asynchronous generator that puts the incoming data in a queue, and I simply sit waiting on the queue for stuff to process, and log the checkpoint statistics into the same database.  That is, I wait up to 250 ms for something to arrive; if nothing does, I "commit" if a transaction is open (which results in a commit once per second), then check whether the unix epoch time is a multiple of 60 (time % 60 == 0) and, if so, do a passive wal_checkpoint and record the results, then go back to waiting for data.  If I did get data from one of the waits, I open a transaction if necessary and insert the data.  Repeat forever.  The checkpoint log is in the same database and takes up only a few bytes, but you then always have data available to see how many database pages are being moved around.  The tables are "trimmed" (old data deleted) each time a transaction is opened.
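
In SQL terms, each cycle of that loop boils down to something like the following (the retention window and parameter names are assumptions, not the actual code):

begin;                                                        -- opened when data arrives
delete from History where timestamp < :now - :retention_sec; -- trim old rows first
insert into History (id, sid, timestamp, dtype, data)
  values ((:now << 16) + :sid, :sid, :now, :dtype, :payload);
commit;                                                       -- at most once per second

-- once per minute (when :now % 60 == 0), after committing:
pragma wal_checkpoint(PASSIVE);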

Also, the page size of the database is 4K and each row is 4K or a bit more.  I wonder if either compressing the data somehow or increasing the database page size (so that each row fits on a single page) would make any significant difference.
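
If that experiment seems worth trying, the page size of an existing database can be changed like so (8192 is just an illustrative value; it must be a power of two between 512 and 65536):

pragma journal_mode=DELETE;  -- the page size cannot change while the db is in WAL mode
pragma page_size=8192;       -- takes effect on the next VACUUM (or on a brand-new database)
vacuum;                      -- rebuilds the database file using the new page size
pragma journal_mode=WAL;     -- switch back to WAL afterwards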

>So really, thanks a lot!

No problem.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



