SQLite for datalogging - best practices

SQLite for datalogging - best practices

Gerlando Falauto
Hi everyone,

as I mentioned a few months ago, I'm using SQLite to continuously log data
collected from several sources on a Linux system.

This is the current approach:
- There is just one single-threaded writer process, which also periodically
issues DELETE statements to remove older data.
- In order to prevent long-running reading queries from blocking the
writer, I'm using WAL mode.
- The database is opened with SYNCHRONOUS=1 to prevent database corruption.
- Not doing any manual checkpoints, just limiting journal size to 100MB.
- Page size is at its default
- Underlying storage is a F2FS partition on a commodity 32GB SSD.

The idea would be to end up with a ~20GB database holding about 10 days'
worth of rolling data.
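
For concreteness, the setup boils down to something like this (just a sketch --
the schema is simplified, and the 100MB cap maps to journal_size_limit, which
takes bytes):

  PRAGMA journal_mode = WAL;              -- readers don't block the single writer
  PRAGMA synchronous = NORMAL;            -- i.e. SYNCHRONOUS=1
  PRAGMA journal_size_limit = 104857600;  -- truncate the journal back to ~100MB after checkpoints

  CREATE TABLE IF NOT EXISTS log (
    ts     INTEGER NOT NULL,   -- timestamp
    source INTEGER NOT NULL,   -- data source id
    value  BLOB                -- logged payload
  );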

There are two (apparently) opposing requirements:
- in case of a power outage, all data collected up to at most N seconds
prior to the power failure should be readable. Ideally N=0, but what's
important is that the database never gets corrupted.
- the SSD's wear due to continuous writes should be reduced to a minimum

Of course there's no silver bullet and some sort of compromise must be
accepted.
However, it's not clear to me how to control or even understand the overall
behavior of the current system -- in terms of potential data loss and SSD
wearout rate -- apart from empirical testing.

There's just too many layers which will affect the end result:
- how the application interacts with SQLite (pragmas, statements,
transactions, explicit checkpoints, etc...)
- how SQLite interacts with the OS (write(), sync()...)
- how the F2FS filesystem interacts with the SSD (block writes, TRIM...)
- how the SSD controller interacts with the underlying flash chips
(buffering, erases, writes, wear leveling...)

Any suggestion on how to proceed, where to start?
What should I be assuming as "already-taken-care-of" and what should I
rather concentrate on?
Final database size? Commit/checkpoint frequency? Page size?

For instance, one crazy idea might be to put the WAL file on a ramdisk
instead (if at all possible) and manually run checkpoints at periodic
intervals. But that would make no sense if I knew for a fact that the OS
will never actually write to disk while the WAL file is open, until a
sync() occurs.

I found the "Total_Erase_Count" reported by smartmontools to be an
interesting end-to-end metric in the long run and I believe its growth rate
is what needs to be minimized in the end.
But I have no idea what the theoretical minimum would be (as a function of
data rate and acceptable amount of data loss), and if there's a more
immediate way of measuring SSD wear rate.

I could start fiddling with everything that comes to mind and measure the
end result, but
I believe this should be a well-known problem and I'm just missing the
basic assumptions here...

Thank you!
Gerlando

Re: SQLite for datalogging - best practices

Simon Slavin-3
On 28 Oct 2018, at 2:06pm, Gerlando Falauto <[hidden email]> wrote:

> There are two (apparently) opposing requirements:
> - in case of a power outage, all data collected up to at most N seconds
> prior to the power failure should be readable. Ideally N=0, but what's
> important is that the database never gets corrupted.

Unless you really know what you're doing, leave SQLite settings and PRAGMAs at their defaults.  This gives you the safest, least corruptible, settings.  WAL mode is fine.

> - the SSD's wear due to continuous writes should be reduced to a minimum

I assume your App generates items for the log one by one.  There's not much you can do to reduce wear if you want each one logged as soon as possible.

In terms of deleting rows, if you want the least wear on your SSD, do the deletes seldomly.  Definitely no more than once a day.

> There's just too many layers which will affect the end result:

Correct.  And remember that pretty much every hardware driver lies to the OS to make the hardware seem faster.

> For instance, one crazy idea might be to put the WAL file on a ramdisk

I understand why you raise that but, as you suspect, it's not going to help.

> I found the "Total_Erase_Count" reported by smartmontools to be an
> interesting end-to-end metric in the long run

If you care about that level of detail, using SQLite is the wrong approach.  You should be logging by appending each entry to a text file.  Or maybe to a new textfile every day/week/month.  Then, to prune the log just delete one old textfile each time you start a new one.  Convert a text file to a database only if you want to query the log, which is presumably a rare operation because you're such a good programmer/admin that stuff rarely goes wrong.

Simon.

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

This could be pretty complicated and depends a lot on the manufacturer of the SSD.  The first thing to be said is that the most accurate portrayal of the life of the device is probably found by what the manufacturer is willing to warranty and in most cases the warranty will be very conservative.

On a traditional "spinning disk", the warranty usually reflects the longevity design of the moving parts (motors, actuators, bearings) and not so much the actual "data preservation" period of the magnetic media itself -- once you get past the mechanical parts, the lifetime of the data stored on the media itself is typically not relevant (ie, is much longer than the physical/mechanical lifetime).

Special considerations apply to devices used for non-online storage (for example backup media written to infrequently and mostly kept offline in a vault) where the mechanical lifetime is mostly irrelevant (or, more accurately, is determined by cycle counts and shelf-lifetime) and the lifetime of the stored data is the limiting factor (plus, as a real consideration, whether or not an entire computer system capable of reading the device is archived as well, since that is quite often the factor most limiting access lifetime).

Not so with entirely solid state devices for which the mechanical is largely irrelevant (though mechanical failure due to induced stress is often a failure mode, it largely cannot be predicted accurately).

For both mechanical (spinning) and solid state devices (of all types including Flash, RAM, CPUs, etc) the best way to deal with mechanical stress is to ensure:
 (1) They are powered up once and only once
 (2) They are permitted to reach normal and stable operating temperature before being used
 (3) They remain powered on and at that stable operating temperature until death

In all cases the "Hardware Destroyer" modes (colloquially called "Power Saving" modes) place the most stress on the devices.  Each thermal cycle due to a power cycle (or Power Saving) reduces the mechanical lifespan by half.  Operating outside the designed temperature range also reduces the mechanical lifespan.  For both "spinning" and solid-state devices you want to minimize the thermal changes and always operate at the designed operating temperature with the minimum fluctuations possible in order to maximize the physical mechanical lifetime.  For most devices the mechanical stress due to thermal cycling are already taken into account since they have such a great effect of expected serviceable lifetime.  This is why most "consumer" spinning disks have only 90 day warranties -- they are known to operate in "Hardware Destroyer" configurations and usages and that is the limiting factor for their lifetime.  In "continuous service" the useful life usually follows the MTBF which is a calculation based on "continuous service".

For FLASH / SSD / NVRAM type devices the limiting lifetime factor (once the above mechanical issues are addressed) is the breakdown of the function of the tunnel oxide layer by the retention of electrons leading to a failure to "erase" the cell (thus the bit becomes "stuck").  Various methods are used by different manufacturers to reduce or eliminate this effect, but basically it is a limitation on how many times a cell can be "erased" successfully.  Generally speaking the number of erase operations per cell has increased into the hundreds of thousands or even millions of erase cycles.

Once one cell becomes "stuck" and cannot be erased, the entire containing block must be removed from service (although in some devices each block has ECC codes attached which can correct for some "stuck" bits, the need to use the ECC hardware indicates impending peril of the block -- just as the BER on a spinning disk is a good indicator of the deterioration of the oxide layer -- though it is quite confounded lately by the use of spinning disk technology which is inherently unreliable and depends on error correction for normal operation).  

The size of this block is determined by the manufacturer and has a profound effect on the "speed" of the device.  Generally speaking "slow" devices (NVRAM, "Flash Drives", etc.) will use small block sizes whereas SSD devices will use rather large block sizes.  (For example, SmartCard Flash may use a single line of 8 bytes (64 bits) as the block size, compared to some SSDs which use block sizes up to about 2 MB (16777216 bits).)  As the probability of an erase failure is evenly distributed (one would hope, barring manufacturing defects) through the entire block (since the entire block is erased as a unit), the failure is at the erased-block level.

The "storage controller" of most SSD type devices continuously "re-allocates" blocks and copies data internally from one block to another trying to ensure that the "erase failure" probability is even across all blocks of a device -- this is called "wear levelling" -- periodically moving data from blocks that are not frequently changed to blocks that are more frequently changed so that the low erase count blocks can be erased and re-used (thus spreading the erasures evenly across all blocks).

Additional compensation for "stuck" bits in a block is usually provided by having an excess of blocks (spare blocks) and ECC hardware so that failing blocks can be removed from service and replaced with a "spare" (similar technology is used in spinning disks to compensate for manufacturing defects and degradation of the oxide layer on the platters -- though on spinning disks where these "spares" are located and how they are used has a very great impact on performance).  Unlike spinning disks though, when the pool of "spare blocks" on an SSD becomes exhausted the device effectively becomes useless.

The next factor is the internal write multiplication factor.  Let's say you have a device which is divided into 2 MB blocks, and you update 1 sector (512 bytes) somewhere in this block.  The device must (a) read out the entire 2 MB block, (b) update the data within the block, then (c) re-write a new 2 MB block to replace the old.  This places a heavy erase-reuse load on blocks containing things that are updated frequently (like directory blocks).

In many cases the storage controller will compensate for this by allowing blocks to have "holes", with the "hole" written as a component piece of a block that also contains other changed blocks; the block containing the "hole" is only recovered by background wear leveling.  This means that there needs to be a table mapping "logical blocks" to "physical blocks", and that must be stored somewhere as well, and is even more volatile than the data blocks themselves (so it is usually stored in RAM and only periodically backed up to the persistent flash storage).  Additionally, data may be deleted without being overwritten, and this is optimized by use of TRIM, which allows the storage controller to be aware of what data it does NOT need to copy.

All this is generally why SSDs are rated with "bytes written" as the limiting factor and *not* erasure count.  Not because the "bytes written" is of any direct significance, but because it is the best raw measure of the lifetime of the device which is based heavily on the internal management implemented by the storage controller.  Assuming that you (a) manage your physical thermal stress, (b) use the device with the access patterns for which it was designed; and, (c) utilize TRIM, you will likely find that the "total bytes written" is the most accurate portrayal of SSD lifetime.  Whether SSD has a "shelf-life" I do not know (mechanical disk does -- for example the lubricant used in the moving parts will degrade at a fixed rate -- and perhaps faster if not used).  For spinning disks assuming that you have managed physical stresses and disabled the "Hardware Destroyer" the MTBF is the most reliable indicator of lifetime.

So my NVMe SSD, which is now 1 year old, has a rated lifetime of 1,200 TBW, and the counters currently say "24 TB written"; assuming that future access patterns are the same as those over the past year, it will last 50 years (which is 10 times the warranty period).  For the SATA SSD the calculation gives an even longer lifetime, exceeding the warrantied lifetime by 100 times (the estimate is over 1000 years).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


Re: SQLite for datalogging - best practices

Petite Abeille-2


> On Oct 28, 2018, at 11:32 PM, Keith Medcalf <[hidden email]> wrote:
>
> will last 50 years (which is 10 times the warranty period)

Thank you.


Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Simon Slavin-3
On Sun, Oct 28, 2018 at 6:18 PM Simon Slavin <[hidden email]> wrote:

> On 28 Oct 2018, at 2:06pm, Gerlando Falauto <[hidden email]>
> wrote:
>
> > There are two (apparently) opposing requirements:
> > - in case of a power outage, all data collected up to at most N seconds
> > prior to the power failure should be readable. Ideally N=0, but what's
> > important is that the database never gets corrupted.
>
> Unless you really know what you're doing, leave SQLite settings and
> PRAGMAs at their defaults.  This gives you the safest, least corruptible,
> settings.  WAL mode is fine.
>
That was also my guess.


> > - the SSD's wear due to continuous writes should be reduced to a minimum
>
> I assume your App generates items for the log one by one.


Yes, items are indeed generated one by one. However, since they're coming
from several (about a dozen) more-or-less-synchronized sources, they would
generally come in bursts.


> There's not much you can do to reduce wear if you want each one logged as
> soon as possible.
>

I guess I'm looking for the best trade-off, but I don't really know how
things really work under the hood so I'd like to avoid making totally wrong
assumptions or stupid mistakes.
So for instance, on a bare NOR flash, I believe I might assume I can safely
write at any granularity (even a single bit) as long as I change 1's to 0's.
On an SSD, I have no clue what's going on -- will writing a 2 MB block one
page at a time be much worse than writing an entire block at once?


> In terms of deleting rows, if you want the least wear on your SSD, do the
> deletes seldomly.  Definitely no more than once a day.
>

Hmmm, I don't understand, could you please elaborate on that?
I first tried this kind of approach (DELETE once a day) for simplicity, but
deleting big amounts of rows at once seemed to potentially take a
relatively long time and generate a very big WAL file.
So my uneducated guess was then that I'd be better served by making room
(DELETE) just before adding new data (INSERT) so as to reuse just-deleted
pages.

> There's just too many layers which will affect the end result:
>
> Correct.  And remember that pretty much every hardware driver lies to the
> OS to make the hardware seem faster.
>

Agreed.

> For instance, one crazy idea might be to put the WAL file on a ramdisk
>
> I understand why you raise that but, as you suspect, it's not going to
> help.
>

Reason for that being? The WAL file won't get sync()ed?

> I found the "Total_Erase_Count" reported by smartmontools to be an
> > interesting end-to-end metric in the long run
>
> If you care about that level of detail, using SQLite is the wrong
> approach.
>
> You should be logging by appending each entry to a text file.  Or maybe to
> a new textfile every day/week/month.  Then, to prune the log just delete
> one old textfile each time you start a new one.


I was afraid I would get that kind of answer at some point. It's not that I
do (or don't) care about that level of detail.
It's just that I'd like to get a rough idea of how much worse it is to use
SQLite compared to this approach, in terms of SSD wearout.
If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
factor, then I have a problem.

If there was a simple way to split the database into say 10 different files
(which could be queried against as a single database though), and just
rotate among those, I'd be very happy about it. Is there? Using ATTACHed
databases and carefully-crafted UNION statements perhaps?
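
Something like this is what I had in mind (just a sketch, with made-up file
and table names):

  ATTACH DATABASE 'log_0.db' AS log0;
  ATTACH DATABASE 'log_1.db' AS log1;
  -- ...and so on for the other files...

  CREATE TEMP VIEW all_logs AS
    SELECT * FROM log0.log
    UNION ALL
    SELECT * FROM log1.log
    -- UNION ALL ...the remaining attached databases
    ;

  SELECT * FROM all_logs WHERE ts BETWEEN :t0 AND :t1 AND source = :src;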


> Convert a text file to a database only if you want to query the log, which
> is presumably a rare operation because you're such a good programmer/admin
> that stuff rarely goes wrong.
>

Good one. :-)
Kidding aside, the collected data are measurements from external devices which
might need to be read back at any point in time for e.g. analytics purposes.
I still don't know when or with what pattern -- it just seemed quite
convenient and efficient to have a SQL database (with its indexes) to query
it.

Thanks a lot for your help!
Gerlando

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Petite Abeille-2

On Sunday, 28 October, 2018 16:42, Petite Abeille <[hidden email]> wrote:

>> On Oct 28, 2018, at 11:32 PM, Keith Medcalf <[hidden email]> wrote:

>> will last 50 years (which is 10 times the warranty period)

>Thank you.

No problems.  What I really mean of course is not that I "expect" the SSD to last 50 years (when the warranty is only 5 years) but that I would expect that it will last at least the 5 years that were warrantied (since the warranty is for "bytes written" or "years", whichever is less).  

Over the past 40 years I have found that most devices, if treated properly, either seem to last practically forever or die a quick sudden death (the bathtub curve).  It has almost entirely been my experience that "forever" is limited by something other than the device's ability to keep working in accordance with the manufacturer's specifications.  I have generally found that the specifications become insufficient compared to what is currently on the market at a much faster rate than actual failure.  In other words, the thing becomes useless due to excessively small size, slow speed, ancient technology, whatever, long before it actually fails.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.





Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Keith Medcalf
Hi Keith,

thanks for the heads up on thermal stress.

> The next factor is the internal write multiplication factor.  Lets say you
> have a device which is divided into 2 MB blocks.  And you update 1 sector
> (512 bytes) somewhere in this block.  The device must (a) read out the
> entire 2MB block (b) update the data within the block then (c) re-write a
> new 2 MB block to replace the old.


That I don't get. Are you sure about that? My understanding from what I've
been reading was that the technology behind SSDs would force you to *erase*
2 MB blocks but also allow you to *write* e.g. 4KB pages.
In other words, I was expecting the SSD controller and/or the filesystem to
be smart enough to cleverly allocate and move pages around within the
available blocks.
So if a 2MB-block is made of 512 4KB-pages, just overwriting the same 4KB
page 512 times will only cause one block erasure (or something in that
order of magnitude), not 512.  If that is correct, my conclusion would be
that you should always write in multiples of the page size (e.g. 4KB),
assuming you somehow get to know that value.
Perhaps you're actually saying the same thing in the following paragraph?


> This places a heavy erase-reuse of blocks containing things that are
> updated frequently (like directory blocks).  In many cases the storage
> controller will compensate for this fact by allowing blocks to have "holes"
> and having the "hole" written as a component piece of a block that also
> contains other changed blocks and the block containing the "hole" is only
> recovered by background wear leveling.  This means that there needs to be a
> table mapping "logical blocks" to "physical blocks" and that must be stored
> somewhere as well, and is even more volatile than the data blocks
> themselves (so it is usually stored in RAM and only periodically backed up
> to the persistent flash storage).  Additionally, data may be deleted
> without being overwritten and this is optimized by use of TRIM which allows
> the storage controller to be aware of what data is does NOT need to copy.
>



Thank you!
Gerlando

Re: SQLite for datalogging - best practices

Simon Slavin-3
In reply to this post by Gerlando Falauto
On 28 Oct 2018, at 11:47pm, Gerlando Falauto <[hidden email]> wrote:

> On an SSD, I have no clue what's going on -- will writing a 2 MB block one page at a time be much worse than writing an entire block at once?

You do indeed have no clue.  Neither do I, unless I use a utility to look at the low-level formatting of the SSD.  That would involve working at the wrong level which, as I point out later, is not going to matter in the long run.

Please also remember that an SSD can write a new block even if you change just one byte in the middle of an existing file.  That's what wear-levelling can mean.  

>> In terms of deleting rows, if you want the least wear on your SSD, do the deletes seldomly.  Definitely no more than once a day.
>
> Hmmm, I don't understand, could you please elaborate on that?
> I first tried this kind of approach (DELETE once a day) for simplicity, but deleting big amounts of rows at once seemed to potentially take a relatively long time and generate a very big WAL file.
> So my uneducated guess was then that I'd be better served by making room (DELETE) just before adding new data (INSERT) so to reuse just deleted pages.

With an SSD, if you're minimising wear you want to do the minimum number of operations.  SQLite is good at doing the minimum number of write operations to take care of lots of changes at once.  Deleting a hundred rows individually may require five hundred write operations.  Deleting a hundred contiguous rows in one command may require just fifty.  Or even twenty.

Of course, the requirement to reduce write operations may be in conflict with other requirements: lack of a long lock-out time, or minimised size of WAL file.

>> For instance, one crazy idea might be to put the WAL file on a ramdisk
>>
>> I understand why you raise that but, as you suspect, it's not going to
>> help.
>
> Reason for that being? The WAL file won't get sync()ed?

The WAL file is not the only file which gets written in that pattern.  So does the database file itself.  The additional non-standard weirdness of putting the WAL file somewhere special isn't going to make a huge difference.

>> You should be logging by appending each entry to a text file.  Or maybe to
>> a new textfile every day/week/month.  Then, to prune the log just delete
>> one old textfile each time you start a new one.
>
> [...] I'd like to get a rough idea of how much worse it is to use
> SQLite compared to this approach, in terms of SSD wearout.
> If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
> factor, then I have a problem.

Writing a log entry to a text file is incredibly efficient.  It's the number of octets in the log entry plus, for the sake of argument, one extra octet for a return character.  At most, you're writing to two blocks, plus assorted file meta-information.

Writing a field with the same number of octets to a SQLite database, in its own transaction, involves (not in this order) writing it to the journal file, writing it to the database, adding an entry to all indexes, writing row-length data, changing table-length and index-length data, changing the database header twice, and probably a bunch of things I've forgotten.  If your log database gets long you might write four times the amount of octets, to six different locations on disk.  For each log entry.

I don't really know enough about SQLite's ACID writing pattern to give accurate numbers but that's a reasonable horseback guess.  Maybe someone who does will improve it.

My take-away from thinking about this is again what I wrote before: write each log entry to a text file in a single atomic operation.  That way you know exactly what you did, and you perform one operation per log entry.  And deleting one log file per day/week/month is a single DELETE & TRIM to get rid of numerous log entries.

By all means, when/if it comes time to analyse the log entries, turn them into the so-useful database on whatever computer you use to do your analysis, meaning you aren't worried about performing lots of operations on your production SSD.

Simon.

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

On Sunday, 28 October, 2018 17:48, Gerlando Falauto <[hidden email]> wrote:
>On Sun, Oct 28, 2018 at 6:18 PM Simon Slavin <[hidden email]> wrote:
>> On 28 Oct 2018, at 2:06pm, Gerlando Falauto <[hidden email]> wrote:

>>> - the SSD's wear due to continuous writes should be reduced to a minimum

>> I assume your App generates items for the log one by one.

>Yes, items are indeed generated one by one. However, since they're
>coming from several (about a dozen) more-or-less-synchronized sources, they
>would generally come in bursts.

>> There's not much you can do to reduce wear if you want each one
>logged as soon as possible.

>I guess I'm looking for the best trade-off, but I don't really know
>how things really work under the hood so I'd like to avoid making totally
>wrong assumptions or stupid mistakes.

>So for instance, on a bare NOR flash, I believe I might assume I can
>safely write at any granularity (even a single bit) as long as I change 1's
>to 0's.

>On an SSD, I have no clue what's going on -- will writing a 2 MB
>block one page at a time be much worse than writing an entire block at once?

Perhaps, but that depends on the particular SSD memory controller; some are much better than others.  The "crappy" ones will re-write the 2MB block over and over again each time you update some data within the block (so for example 4096 sequential 512 byte writes will cause the underlying 2 MB storage block to be erased and overwritten (or rather re-allocated) 4096 times).  Better controllers will cache the updates in simpler storage and only push the block (2MB) down into flash when it is full (or the cache space is full).

Generally speaking what happens is directly related to price -- you get what you pay for.  If the device costs $25 it is likely operating in the former mode and will die quickly.  If the device cost $2500 then it is likely operating in the latter mode (and probably with even more optimizations to when it actually pushes down to the flash blocks) and will last much longer.  If well designed you will probably find that the $2500 device lasts 100 times longer than the $25 device, and probably much more than 100 times longer.  So it depends whether you want to replace the $25 device every day or the $2500 device every quarter.  

You still end up paying out the same for the physical devices.  It depends entirely on what your "time" and "effort" is worth for each replacement and the relative risk (and consequence) of that activity.  

Personally I would buy a $25,000 device (assuming that the lifetime enhancement is linear) and leave the replacement as an exercise for whoever replaced me after I retired :)  Then again I worked in a field where "if it ain't broke don't fix it" and "let's pay $X for something that will work flawlessly for Y years without having to be touched, and then we will spend $X again to replace the whole kit and kaboodle with something else" are entirely valid and preferred risk and consequence management strategies.  In other words, the cost associated with doing things in this manner is trivial in comparison to the cost of failure -- the cost unit of failures being on the scale of "Yearly Salaries per Hour"; a cost which might even justify buying the $250,000 version of the storage device ...

>> In terms of deleting rows, if you want the least wear on your SSD,
>> do the deletes seldomly.  Definitely no more than once a day.

> Hmmm, I don't understand, could you please elaborate on that?
> I first tried this kind of approach (DELETE once a day) for
> simplicity, but deleting big amounts of rows at once seemed to
> potentially take a relatively long time and generate a very big
> WAL file.
> So my uneducated guess was then that I'd be better served by making
> room (DELETE) just before adding new data (INSERT) so to reuse just-
> deleted pages.

I don't think this follows (that deleting less frequently reduces "wear" on the SSD).  If you are overwriting X blocks it really makes no difference whether you overwrite those X blocks at an even rate spread over 24 hours or "all at once"; you are still modifying (erasing) the same number of blocks.  However, I would generally prefer the "check and delete" every time before inserting because it has better performance (as you said) and it allows the storage controller of the SSD device to more efficiently coalesce the updates and manage the underlying storage.
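
In other words, something like this per insert burst (a sketch; table and
parameter names are made up):

  BEGIN IMMEDIATE;
  -- make room first: drop anything older than the retention window
  DELETE FROM log WHERE ts < :cutoff;
  -- then append the new rows
  INSERT INTO log(ts, source, value) VALUES (:ts, :source, :value);
  COMMIT;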

>> I found the "Total_Erase_Count" reported by smartmontools to be an
>> > interesting end-to-end metric in the long run

>> If you care about that level of detail, using SQLite is the wrong
>> approach.

>> You should be logging by appending each entry to a text file.  Or
>> maybe to a new textfile every day/week/month.  Then, to prune the
>> log just delete one old textfile each time you start a new one.

>I was afraid I would get that kind of answer at some point. It's not
>that I do (or don't) care about that level of detail.
>It's just that I'd like to get a rough idea of how much worse it is
>to use SQLite compared to this approach, in terms of SSD wearout.

>If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
>factor, then I have a problem.

Again, this mostly depends on the implementation of the storage controller of the flash/SSD storage device.  The "factor" is determined by how much you pay for the hardware and the hardware controller implementation.  Simply appending a "row" to a text file still requires overwriting the entire block containing the changed data, so there is not really much difference in where you "insert" that new row.  There is not likely to be much difference in the access pattern of the underlying storage blocks in either case -- the question is simply whether or not the storage controller on the device can handle that kind of updating.

For example, the controllers on so-called Thumb drives are optimized for sequentially writing a big hunk-o-data in a single go -- something like copying a movie to a flash drive so you can copy it to another device -- and are rather dismal at random access.  They may last many years of copying Petabytes of data in this manner (copying entire big files).  However, if used for "random access" (which includes appending small amounts of data to existing blocks) you might find that they fail in a few days with only a few Gigabytes of data written.

>If there was a simple way to split the database into say 10 different
>files (which could be queried against as a single database though), and
>just rotate among those, I'd be very happy about it. Is there? Using
>ATTACHed databases and carefully-crafted UNION statements perhaps?

This is doable and there is even an extension to do that.  However, it is not likely to provide any advantage at the storage device level and suffers from the inherent unreliability introduced by adding complexity.  It might make a $2 flash drive last a month rather than a day, but it is unlikely to have any effect on "designed for purpose" storage devices -- and the more you paid for that storage device, the less of an effect it is likely to have.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

>>The next factor is the internal write multiplication factor.  Lets
>>say you have a device which is divided into 2 MB blocks.  And you update 1
>>sector (512 bytes) somewhere in this block.  The device must (a) read out
>>the entire 2MB block (b) update the data within the block then (c) re-
>>write a new 2 MB block to replace the old.

>That I don't get. Are you sure about that? My understanding from what
>I've been reading was that the technology behind SSDs would force you to
>*erase* 2 MB blocks but also allow you to *write* e.g. 4KB pages.

This is true, more or less.  As you probably know NV storage (no matter what type, be it flash, nvram, prom, eprom, eeprom, whatever) all basically works the same.  You can "erase" it which results in all the bits being the same off state -- let's call it a 0 (though it is possible for the logic states to be reversed where "off" or clear is 1 and on or programmed is 0).  You can then "program" it by switching the state of some of the bits from "clear" to "programmed" (0 -> 1).  You cannot, however, ever return a "programmed" (1) bit back to "cleared" (0) state, except by erasing the whole block.  Depending on the particular device the "erase" size may be the same as the "program" size or it may be bigger, up to the entire device -- UV-erasable PROM is an example of this, where you can only "erase" the entire device as a whole; there are others.

So yes, there are in fact TWO block sizes, the ERASE block size and the PROGRAM block size.  The ERASE size is often bigger than the PROGRAM size.  A PROGRAM operation programs an entire PROGRAM block, and an ERASE erases an entire erase block which encompasses multiple program blocks.  For most SSD/Flash storage devices the size of the PROGRAM block is the minimum I/O block size, and is usually 4 KB or so.  The ERASE size may be much bigger (and it usually is) at say 2 MB.

For simple management processing there is a total number of "program blocks" on the device, addressed by their "physical block number".  Each physical block resides within an "erase block" which is usually larger than the "program block" size, and the "erase block" number can be derived from the "physical block number" (usually by a simple binary shift).  For storage management at the most basic level, the hardware storage controller maintains a mapping between the "Logical Block Number" and "Physical Block Number", and a "Physical Block Allocation Table" containing information about the usage of physical program blocks.  There will also be a list of free physical blocks, and a table of some statistics about various block erase/program operations.  At least the mapping (logical->physical) and BAT must be persistent.  The statistics are usually also persistent.  The lists and any other tables are only needed during "operation" and are usually rebuilt entirely in device RAM when the device is powered on, their contents being derived from the persistent data.

All access is by "Logical Block Number" (which may reside at any "physical block number").  

There are basically three operations that take place on Logical Blocks:  Read, Write, Delete

Read simply translates the logical block number to a physical one, reads the physical block, and returns it to the "requestor".

Write will mark the current physical block that holds the logical block as "deleted", find a free physical block to write the data to (and write it), then update the logical->physical mapping table to map the logical block to its physical location.

Delete is the same as write except that there is no writing of a physical data block, and the logical block is marked as "unallocated" in the logical->physical mapping table, and the actual physical data block is marked as "deleted" in the BAT.

This process depends on there being a "pool" of "ready to program" physical storage blocks.  This is managed by a separate process running at the hardware level.  If the free pool is depleted then the equivalent of an interrupt to the pool management process must be generated to get the pool manager to put some blocks on the free list and the process of writing has to wait until there is a block in the free pool which can be used.  Sometimes the BAT updates will generate an interrupt to the block management process (for example, all physical blocks in an erase block are now "deleted" so the entire erase block can be erased and all the physical blocks it contains put on the free block list).

The high level TRIM operation is really nothing more than "delete" against a logical block.

>In other words, I was expecting the SSD controller and/or the
>filesystem to be smart enough to cleverly allocate and move pages around within the
>available blocks.
>So if a 2MB-block is made of 512 4KB-pages, just overwriting the same
>4KB page 512 times will only cause one block erasure (or something in
>that order of magnitude), not 512.  If that is correct, my conclusion
>would be that you should always write in multiples of the page size (e.g.
>4KB), assuming you somehow get to know that value.
>Perhaps you're actually saying the same thing in the following
>paragraph?

More or less.  Basically the I/O size presented by the OS driver is usually equal to the program block size (but does not have to be).  If it is not, then "data editing" is carried out on the device in RAM the same as it is for spinning disks (retrieve the block, edit the data, write the new block, done at the hardware level).

The efficiency of writing to SSDs and minimizing "erase" operations is dependent on having a pool of blocks available on the free list.  There is no need to ever "erase" unless this pool is depleted; however, the background manager does this anyway to manage the layout of blocks, coalesce free blocks, and try to optimize the ordering of logical blocks (if you can optimize the layout of the logical blocks in physical blocks you can optimize access speed -- especially since it takes time to "open" a "line", which is again usually somewhere between an erase and a program block size, for access).

So yes, updating/deleting a logical block will eventually result in an erase operation but the urgency (now, in a minute, next week, etc) depends on the size of the free list.

So really the secret is to have lots of free blocks available at all times and keep the thermal limits in mind.  Once you get close to the edge (either in not having blocks available or pushing the thermal envelope) performance will suffer and the device will degrade faster.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.





Re: SQLite for datalogging - best practices

Richard Hipp-3
In reply to this post by Gerlando Falauto
On 10/28/18, Gerlando Falauto <[hidden email]> wrote:

>
> This is the current approach:
> - There is just one single-threaded writer process, which also periodically
> issues DELETE statements to remove older data.
> - In order to prevent long-running reading queries from blocking the
> writer, I'm using WAL mode.
> - The database is opened with SYNCHRONOUS=1 to prevent database corruption.
> - Not doing any manual checkpoints, just limiting journal size to 100MB.
> - Page size is at its default
> - Underlying storage is a F2FS partition on a commodity 32GB SSD.
>

Other comments on this thread have been good.  Thanks, everybody!

For peak performance and efficiency, omit indexes from the log table.
Adding to a log is just appending to a table, which is particularly
fast.  But if there are indexes, entries must be inserted into the
middle of the indexes (so that all entries end up in index order)
which requires more I/O.

If possible, combine two or more (as many as practical, subject to
your timing constraints) inserts into a single transaction.
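
For example, something along these lines (table and parameter names are
illustrative):

  BEGIN;
  INSERT INTO log(ts, source, value) VALUES (:ts1, :src1, :val1);
  INSERT INTO log(ts, source, value) VALUES (:ts2, :src2, :val2);
  -- ...more entries from the same burst...
  COMMIT;  -- one commit (one set of page writes) for the whole batch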

Using the latest Linux kernels and F2FS filesystem modules, and if you
compile SQLite using SQLITE_ENABLE_BATCH_ATOMIC_WRITE, and if you use
DELETE mode rather than WAL mode, then SQLite will try to take
advantage of the batch atomic write capabilities provided by F2FS.
This can be *much* more efficient in terms of performance and I/O and
also reduced wear on the flash chips.  The downsides to this approach
are (1) there is an exclusive lock held for a few milliseconds while
doing each write, so readers and writers cannot overlap as they can in
WAL mode and (2) there are scattered reports of instabilities with
F2FS following power failures in the middle of transactions - it is
unclear if these reports are true or not and they are rare in any
event, but F2FS is relatively new technology (compared to EXT4) so
there *might* still be some obscure problems on optimized code paths.
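
Roughly, the required configuration is (a sketch; the compile-time option and
the filesystem are the key parts):

  -- requires a recent kernel + F2FS, and an SQLite build compiled with
  --   -DSQLITE_ENABLE_BATCH_ATOMIC_WRITE
  PRAGMA journal_mode = DELETE;  -- rollback-journal mode; batch atomic writes are not used in WAL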

Having the WAL file on a ramdisk could result in database corruption
if you take a power failure in the middle of a checkpoint.
--
D. Richard Hipp
[hidden email]

Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Simon Slavin-3
Hi Simon,

On Mon, Oct 29, 2018 at 1:30 AM Simon Slavin <[hidden email]> wrote:

> On 28 Oct 2018, at 11:47pm, Gerlando Falauto <[hidden email]>
> wrote:
>
> > On an SSD, I have no clue what's going on -- will writing a 2 MB block
> one page at a time be much worse than writing an entire block at once?
>
> You do indeed have no clue.  And so do I unless I use a utility to look at
> the low-level formatting of the SSD.  Which involves you doing things at
> the wrong level which, as I point out later, is not going to matter in the
> long run.
>
> Please also remember that an SSD can write a new block even if you change
> just one byte in the middle of an existing file.  That's what
> wear-levelling can mean.
>

I see. Though I guess it would make sense to match SQLite's page size with
the flash page size, wouldn't it?
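
I.e. something like this on a freshly created database (assuming the flash
program page really is 4KB -- that value is a guess):

  PRAGMA page_size = 4096;    -- must be set before the first table is created
                              -- (and before switching the file to WAL mode)
  PRAGMA journal_mode = WAL;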

>> In terms of deleting rows, if you want the least wear on your SSD, do
> the deletes seldomly.  Definitely no more than once a day.
> >
> > Hmmm, I don't understand, could you please elaborate on that?
> > I first tried this kind of approach (DELETE once a day) for simplicity,
> but deleting big amounts of rows at once seemed to potentially take a
> relatively long time and generate a very big WAL file.
> > So my uneducated guess was then that I'd be better served by making room
> (DELETE) just before adding new data (INSERT) so to reuse just deleted
> pages.
>
> With an SSD, if you're minimising wear you want to do the minimum number
> of operations.  SQLite is good at doing the minimum number of write
> operations to take care of lots of changes at once.  Deleting a hundred
> rows individually may require five hundred write operations.  Deleting a
> hundred contiguous rows in one command may require just fifty.  Or even
> twenty.
>
> Of course, the requirement to reduce write operations may be in conflict
> with other requirements: lack of a long lock-out time, or minimised size of
> WAL file.
>

Exactly. And I guess it won't be a trivial task to figure out the best
trade-off.


> >> For instance, one crazy idea might be to put the WAL file on a ramdisk
> >>
> >> I understand why you raise that but, as you suspect, it's not going to
> >> help.
> >
> > Reason for that being? The WAL file won't get sync()ed?
>
> The WAL file is not the only file which gets written in that pattern.  So
> does the database file itself.  The additional non-standard weirdness of
> putting the WAL file somewhere special isn't going to make a huge
> difference.
>

Granted. Though I guess minimizing the number of times it gets flushed
would make sense, wouldn't it? My understanding is that as long as I don't
push SYNCHRONOUS past 1 I should be OK.


> >> You should be logging by appending each entry to a text file.  Or maybe
> to
> >> a new textfile every day/week/month.  Then, to prune the log just delete
> >> one old textfile each time you start a new one.
> >
> > [...] I'd like to get a rough idea of how much worse it is to use
> > SQLite compared to this approach, in terms of SSD wearout.
> > If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
> > factor, then I have a problem.
>
> Writing a log entry to a text file is incredibly efficient.  It's the
> number of octets in the log entry plus, for the sake of argument, one extra
> octet for a return character.  At most, you're writing to two blocks, plus
> assorted file meta-information.
>

Right, plus I might have some control and do some fine-tuning with syncs.



> Writing a field with the same number of octets to a SQLite database, in
> its own transaction, involves (not in this order) writing it to the journal
> file, writing it to the database, adding an entry to all indexes, writing
> row-length data, changing table-length and index-length data, changing the
> database header twice, and probably a bunch of things I've forgotten.  If
> your log database gets long you might write four times the amount of
> octets, to six different locations on disk.  For each log entry.
>
> I don't really know enough about SQLite's ACID writing pattern to give
> accurate numbers but that's a reasonable horseback guess.  Maybe someone
> who does will improve it.
>
> My take-away from thinking about this is again what I wrote before: write
> each log to a text file in a single atomic operation.  That way you know
> exactly what you did, and you perform one operation per log entry.  And
> deleting one log file per day/week/month is a single DELETE & TRIM to get
> rid of numerous log entries.
>

I agree, it makes a lot of sense to do that at least for comparison.

> By all means, when/if it comes time to analyse the log entries, turn them
> into the so-useful database on whatever computer you use to do your
> analysis, meaning you aren't worried about performing lots of operations on
> your production SSD.
>

The point is, analysis will have to be performed on the production
(embedded) system anyway.
The whole idea of using a database was to have a fast and convenient way to
perform random access to data (just according to timestamp and data source,
nothing fancy).
The same thing could definitely be done by using text files with some
folder/file partitioning logic, except it requires an extra layer.
I wonder if there's a way to write a backend for peewee (or any other ORM for
that matter) which lets you work with CSV or jsonlines files, the way you
would do with Apache Drill for instance, but without the extra overhead.

Thanks again for your contributions!

Gerlando

Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Keith Medcalf
Hi Keith,

thanks for your feedback and your generous insights.
Bottom line is, it's a cheap device (MLC with 3000 P/E cycles) and there's
nothing I can do about it, I just have to make the best of it.
If there's any way to figure out how "crappy" it is, I'd be interested in
hearing that.

>If there was a simple way to split the database into say 10 different
> >files (which could be queried against as a single database though), and
> >just rotate among those, I'd be very happy about it. Is there? Using
> >ATTACHed databases and carefully-crafted UNION statements perhaps?
>
> This is doable and there is even an extension to do that.  However, it is
> not likely to provide any advantage at the storage device level and suffers
> from the inherent unreliability introduced by adding complexity.  It might
> make a $2 dollar flash drive last a month rather than a day, but it is
> unlikely to have any effect on "designed for purpose" storage devices --
> and the more you paid for that storage device, the less of an effect it is
> likely to have.
>

Agreed. I'll have to dig into this further.

Thank you,
Gerlando

Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Richard Hipp-3
On Mon, Oct 29, 2018 at 2:28 PM Richard Hipp <[hidden email]> wrote:

> On 10/28/18, Gerlando Falauto <[hidden email]> wrote:
> >
> > This is the current approach:
> > - There is just one single-threaded writer process, which also
> periodically
> > issues DELETE statements to remove older data.
> > - In order to prevent long-running reading queries from blocking the
> > writer, I'm using WAL mode.
> > - The database is opened with SYNCHRONOUS=1 to prevent database
> corruption.
> > - Not doing any manual checkpoints, just limiting journal size to 100MB.
> > - Page size is at its default
> > - Underlying storage is a F2FS partition on a commodity 32GB SSD.
> >
>
> Other comments on this thread have been good.  Thanks, everybody!
>

Thank you (and BTW, what an honor...) :-)

For peak performance and efficiency, omit indexes from the log table.
> Adding to a log is just appending to a table, which is particularly
> fast.  But if there are indexes, entries must be inserted into the
> middle of the indexes (so that the all entries end up in index-order)
> which requires more I/O.
>

I see. However, giving up on indexes kind of defeats the whole purpose of
having a database.
I assume there is no way to leverage the fact that e.g. rows would be
inherently sorted by timestamp, without resorting to indexes.

If possible, combine two or more (as many as practical, subject to
> your timing constraints) inserts into a single transaction.
>

That would have been my guess, so I'm glad to hear it makes sense then.


> Using the latest Linux kernels and F2FS filesystem modules, and if you
> compile SQLite using SQLITE_ENABLE_BATCH_ATOMIC_WRITE, and if you use
> DELETE mode rather than WAL mode, then SQLite will try to take
> advantage of the batch atomic write capabilities provided by F2FS.
> This can be *much* more efficient in terms of performance and I/O and
> also reduced wear on the flash chips.


Cool, I had no idea about this capability, I'll have to dig into it further.


> The downsides to this approach
> are (1) there is an exclusive lock held for a few milliseconds while
> doing each write, so readers and writers cannot overlap as they can in
> WAL mode and


Actually, my biggest concern with DELETE mode is not the writer blocking
readers so much as readers (which might take a very long time to consume
all the data, e.g. for analytics) blocking the writer.
I don't believe there is a way to give up any of the ACID properties to
work around this limitation, is there?


> (2) there are scattered reports of instabilities with
> F2FS following power failures in the middle of transactions - it is
> unclear if these reports are true or not and they are rare in any
> event, but F2FS is relatively new technology (compared to EXT4) so
> there *might* still be some obscure problems on optimized code paths.
>

Thank you for the heads-up. I was not aware of any such issues.
I guess some continuous testing would be in order here.
What somewhat worries me is the chance of getting stranded with a corrupt
database and no way to automatically recover the oldest (long-untouched)
parts of it.

Having the WAL file on a ramdisk could result in database corruption
> if you take a power failure in the middle of a checkpoint.
>

OK, that was a stupid idea anyway.
I believe just leaving SYNC=1 is probably enough to reduce writes to a
minimum, isn't it?
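
(For what it's worth, a sketch of the settings being discussed, assuming an
already-open handle db and that WAL mode is kept; synchronous=NORMAL is the
same as "1", and in WAL mode it only syncs the WAL at checkpoint time:)

sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", 0, 0, 0);            /* a.k.a. 1 */
sqlite3_exec(db, "PRAGMA journal_size_limit=104857600;", 0, 0, 0);  /* keep the WAL around ~100 MB */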

Thank you,
Gerlando

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

See the ext/misc/unionvtab.c extension for "reading" a bunch of databases as if they were a single database.

https://www.sqlite.org/src/artifact/0b3173f69b8899da
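
(A hedged sketch of how it is used -- the file names, table name and rowid
ranges below are invented, and the extension has to be compiled in or
loaded first.  The single argument is an SQL query returning one row per
piece: database file, table name, and the minimum/maximum rowid stored in
that piece; the ranges must not overlap:)

const char *ddl =
  "CREATE VIRTUAL TABLE temp.all_logs USING unionvtab("
  "\"SELECT 'log_part1.db', 'log', 0,        9999999 "
  " UNION ALL "
  " SELECT 'log_part2.db', 'log', 10000000, 19999999\")";
sqlite3_exec(db, ddl, 0, 0, 0);
/* Queries against temp.all_logs that constrain the rowid should only need
** to touch the pieces whose rowid range overlaps the constraint. */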

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

>I see. However, giving up on indexes kind of defeats the whole
>purpose of having a database.

>I assume there is no way to leverage the fact that e.g. rows would be
>inherently sorted by timestamp, without resorting to indexes.

There kind of is.  You can do this with a bit of funny business by "encoding" the timestamp into the rowid and storing the incoming data at computed row numbers rather than just using the default ascending rowid.  For example, let's assume that you have a fixed number (or allow for a fixed number) of data sources (source points) and that your timestamp is an integer you can manipulate without overflowing -- for example, integer milliseconds since the Unix Epoch.  The rowid is 64 bits in SQLite (effectively 63, since all integers in SQLite are signed and we don't want to deal with the sign).

sqlite3_int64 rowid;

// Generate a unique rowid by combining the scan time with the source

rowid = timestamp;   // up to 47-bit unsigned timestamp
rowid <<= 16;        // shift the timestamp over
rowid += sourcetag;  // store the up-to-16-bit (unsigned) integer sourcetag# in the rowid

INSERT INTO ... (rowid, ...) VALUES (?, ...);  -- binding the computed rowid

When you append the data into the table it will automatically be in order by timestamp.  You can still store the timestamps and sourcetag numbers separately if you wish, but now you can use the rowid to get a range of rows based on the timestamp (and to delete based on age, for example), since you can compute the range of rowids to scan without using an additional index.  This assumes, of course, that the computation will generate unique rowids for the incoming data.  Presuming the timestamps are ascending and the same for every point in a scan, there is not much difference between this and using the default incrementing rowid, particularly if you can batch-insert the rows -- you have just overloaded what would otherwise be simple ascending rowids with an additional meaning.

You would not want to encode the sourcetag into the high-order bits of the rowid, as that would mean you are no longer merely "appending" to the table.  To obtain the same performance as simply appending rows to a rowid table, you should sort the insert data in each batch so that the inserts are in ascending rowid order.
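
(To make the "delete based on age" part concrete, a small sketch under the
same assumptions -- the table name "log" is hypothetical and the encoding
is the timestamp<<16 + sourcetag scheme above:)

#include <sqlite3.h>

/* Age out rows older than cutoff_ms (milliseconds since the Unix Epoch).
** With the sourcetag bits zeroed, cutoff_ms<<16 is the smallest possible
** rowid at that timestamp, so "rowid < ?" removes every older row without
** needing any secondary index. */
static void trim_older_than(sqlite3 *db, sqlite3_int64 cutoff_ms)
{
    sqlite3_stmt *del = 0;
    if (sqlite3_prepare_v2(db, "DELETE FROM log WHERE rowid < ?1",
                           -1, &del, 0) == SQLITE_OK) {
        sqlite3_bind_int64(del, 1, cutoff_ms << 16);
        sqlite3_step(del);
    }
    sqlite3_finalize(del);
}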

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Keith Medcalf
On Mon, Oct 29, 2018 at 6:38 PM Keith Medcalf <[hidden email]> wrote:

>
> See the ext/misc/unionvtab.c extension for "reading" a bunch of databases
> as if they were a single database.
>
> https://www.sqlite.org/src/artifact/0b3173f69b8899da


Cool, indeed.
I also had a look at the CSV file extension:
https://www.sqlite.org/src/artifact?udc=1&ln=on&name=65297bcce8d5acd5
Someone has actually come up with an extension to read Parquet files:
https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html

(...taking a deep breath...) Alright, I'm just gonna say it.
How {hard, stupid, useful} do you guys think it would be to write an SQLite
extension to add directory-based partitioning on top of the CSV extension
and let the OS and filesystem take care of the rest?
Something like "day=2018-10-29/source=source_a/bucket1.csv".
I've heard people call it "hive-style" partitioning.
As long as the size of each individual file remains reasonable, I might as
well be happy with CSV files and just read the whole file sequentially.

Alternatively, the same partitioning approach might be added to unionvtab,
whatever feels simpler.
I have no idea how the partitioning semantics could be specified though.

I know, that kinda brings us back to my original question at the end of
July about database sharding:
https://www.mail-archive.com/sqlite-users@.../msg111250.html
Perhaps I'm just too biased towards this approach.

Thank you so much for your patience guys!
Gerlando

Re: SQLite for datalogging - best practices

Keith Medcalf
In reply to this post by Gerlando Falauto

If you don't mind me asking, what sort of data are you collecting?
Are you the master (i.e., scanning) or a slave (getting async data pushed to you)?
Are you "compressing" the returned data (storing only changes exceeding the deadband) or are you storing every value (or is the source instrument doing the compression)?

I presume you need to store the TimeStamp, Point, Value and Confidence.  What is the data rate (# of Points and Frequency)?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

Re: SQLite for datalogging - best practices

Mark Wagner
Going back to the comments from Dr. Hipp regarding WAL vs DELETE mode on
F2FS devices, I just wanted to confirm my understanding.

Given a device with F2FS and with SQLite compiled with
SQLITE_ENABLE_BATCH_ATOMIC_WRITE, writes in DELETE mode will be
considerably faster than in WAL mode.

But a relatively long-lived transaction that contains a significant amount
of computation, with lots of reads and writes, would still block reads on
other threads.  So WAL could still be the better choice in some
circumstances -- even with F2FS -- since reads can happen in parallel.
Am I missing something?

On Mon, Oct 29, 2018 at 8:58 PM Keith Medcalf <[hidden email]> wrote:

>
> If you don't mind me asking, what sort of data are you collecting?
> Are you the master (ie, scanning) or a slave (getting async data pushed to
> you).
> Are you "compressing" the returned data (storing only changes exceeding
> the deadband) or are you storing every value (or is the source instrument
> doing compression)?
>
> I presume you need to store the TimeStamp, Point, Value and Confidence.
> What is the data rate (# Points and Frequency)
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.

Re: SQLite for datalogging - best practices

Gerlando Falauto
In reply to this post by Keith Medcalf
On Tue, Oct 30, 2018 at 4:58 AM Keith Medcalf <[hidden email]> wrote:

>
> If you don't mind me asking, what sort of data are you collecting?
> Are you the master (ie, scanning) or a slave (getting async data pushed to
> you).
> Are you "compressing" the returned data (storing only changes exceeding
> the deadband) or are you storing every value (or is the source instrument
> doing compression)?
>
> I presume you need to store the TimeStamp, Point, Value and Confidence.
> What is the data rate (# Points and Frequency)
>

The bulk of the data consists of streams of AC signals being pushed from a
handful of 3-axis accelerometers which are more or less synchronous.
The data rate is on the order of a few hundred samples/sec for each sensor.
A first software layer handles buffering and passes one-second buffers to a
second software layer, which then saves them to the database for later
analysis.
The database schema currently consists of a single table with roughly the
following columns: timestamp, sensorid (string), datatype (string) and a
string containing the JSON encoding of those few hundred samples (as a JSON
array).
So each row takes up about 3-4 KB (~200 samples * ~5 bytes/sample * 3 axes
+ overhead).
At a later stage one may want to pack adjacent chunks together into even
longer strings (so as to reduce the total number of rows) and/or store the
data in a more efficient manner (e.g. in binary or compressed form).
I don't particularly like this way of storing logical streams (i.e. time
series) in chunks, but I couldn't find any better way.
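
(For concreteness, a sketch of what that single-table schema might look
like; all names below are illustrative, not the actual ones:)

const char *schema =
  "CREATE TABLE IF NOT EXISTS log("
  "  ts       INTEGER NOT NULL,"   /* start of the one-second buffer, e.g. ms since epoch */
  "  sensorid TEXT    NOT NULL,"
  "  datatype TEXT    NOT NULL,"
  "  payload  TEXT    NOT NULL"    /* JSON array of ~200 samples x 3 axes */
  ");";
sqlite3_exec(db, schema, 0, 0, 0);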

There are also some much-less-frequent readings (scalar quantities
collected every 30 seconds or so) currently being stored in the same table.

Any suggestions on how to improve this?