After deleting data from a FTS table and doing VACUUM, unwanted data remains

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

After deleting data from a FTS table and doing VACUUM, unwanted data remains

Matt Kloss
Dear sqlite users,

I noticed that when you delete lines from a FTS virtual table, somehow there is some data remaining in the sqlite db, so that's it does not shrink much in size.

$ sqlite3 test.sql "CREATE VIRTUAL TABLE tab USING fts5(x)"
$ curl -s https://www.wikipedia.org | tr -cd '[:alnum:][:space:]' > wikipedia
$ ls -lh wikipedia test.sql
-rw-r--r-- 1 mkloss mkloss 24K feb. 25 06:55 test.sql
-rw-r--r-- 1 mkloss mkloss 54K feb. 25 06:56 wikipedia
$ sqlite3 test.sql ".import wikipedia tab" && ls -lh test.sql
-rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
$ sqlite3 test.sql "delete from tab" && ls -lh test.sql
-rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
$ sqlite3 test.sql "VACUUM" && ls -lh test.sql
-rw-r--r-- 1 mkloss mkloss 124K feb. 25 06:56 test.sql

I would expect the db size to be 24K (not 124K), as it was when the table "tab" was empty.

I noticed that some data remains in the tab_XXX tables, but less than 700 bytes. That's nowhere near the 100K of added cruft:

$ for t in tab_{config,content,data,docsize,idx}; do echo "select * from $t;"; done | sqlite3 test.sql | wc -c
682

So here is my questions:
(1) How do you really cleanup a db with FTS tables after deleting some lines?
(2) If there is no way to remove the cruft, does that mean that adding and deleting lines will constantly inflate the db size?

$ sqlite3 --version
3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837balt1

Thank you for your help,
Regards,

Matthew
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: After deleting data from a FTS table and doing VACUUM, unwanted data remains

Graham Holden
This might be to do with how an FTS index works under the hood,
involving various levels of "b-tree" that grow as entries are added,
but aren't always shrunk when entries are deleted.

There were a bunch of emails on the list around 4th to the 13th May
2014: sample below from Dan Kennedy (one of the SQLite devs) which
includes the original problem report and his summary of "why".

I've a feeling SOME changes MIGHT have been made to how the various
levels of b-tree were merged when entries were deleted, but I'm not
seeing anything linked to the original thread... perhaps this email
will jog a memory...

Graham


START ----- Mail from Dan Kennedy, 4 May 2014 ----------------- START

On 05/01/2014 03:30 PM, andrewmo wrote:

> We are using the FTS3 extension to sqlite to store large numbers of short
> (~300 byte) documents. This is working very well and providing us with very
> fast text search, but the behaviour around deletion of documents has me
> confused.
>
> Our system must control the the size of the database and will delete the
> oldest documents when the database size breaches a certain limit. I now
> understand from comments on this mailing list and elsewhere that this is not
> an optimal pattern for the FTS extension as doclists for the oldest
> documents are the least likely to be 'merged'.
>
> My question is, does this actually work at all? If I delete a row from my
> FTS4 table (resulting in a new empty doclist being added to the index), then
> I subsequently add many (1000s) new documents and call the 'merge' function
> several times (automerge is also enabled), is there any gaurentee that the
> empty doclist and the populated doclist that it superseded will ever be
> removed? My testing suggests this isn't the case.
>
> I have a 1GB database with 6million documents. If I keep adding new
> documents at around 1 per second and deleting documents when the size of the
> data goes beyond 1GB, the size of the index seems to grow and the number of
> documents I can store in the 1GB file seems decrease in a linear manner.
>
> Calling the 'optimize' function seems to solve this issue (removing all the
> dead doclists), but that isn't practical for our software, as it implies
> some downtime for our high availablity service due to the long execution
> time of the optimize function (Could be minutes for a 1GB file).
>
> I have seen this
> (http://sqlite.1065341.n5.nabble.com/fts3-database-grows-td42069.html) post
> from 2008. However, it predates the 'automerge' and manual merge features,
> and from the documentation I assumed these new features would delete all the
> data related to deleted documents. Am I incorrect in my assumption?
>
> Thanks for any clarification you can offer.

Normally, when you write to an FTS index (either to add new doclists or to
add delete markers) the new entries are accumulated in-memory for a while
and then flushed to a new "level-0" b-tree. A level-0 b-tree is often
roughly 1MB in size. Once there are 16 level-0 b-trees, they are merged
and written to a single level-1 b-tree. Once there are 16 level-1 b-trees...
And so on.

So when an entry is deleted from the FTS index, a delete marker is added.
But the original doclists are not actually deleted until the delete marker
and the doclists are merged into the same b-tree. Delete markers are
discarded when they are merged into the oldest b-tree in the index.

At first glance it seems (to me) that this means the index might grow to
anything up to 16 times its "optimized" size. But I think it's actually
worse than that.

Say your entire database fits into a single level-N b-tree. You keep adding
data (and delete markers) until there are 15 level-N b-trees and almost
enough data to create the 16th in lower levels. So at this point the FTS
index is 16 times its optimal size. If you then add even more data so that
the 16th level-N b-tree is created, everything gets merged together and
we're back in the optimal state - everything in a single b-tree. However -
this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
much more data will have to be added before everything is merged together
again.

So I'm thinking a solution might be:

   * Fix FTS so that it picks this case - when a merge includes so many
     delete markers that the output is small enough to be deemed a level-N
     b-tree, not a level-N+1 b-tree, and

   * Instead of using the default 16-way merges, the app could organize
     to periodically invoke the "merge=X,Y" command with a smaller Y value
     (say 2) to limit the maximum size of the index to Y times its optimal
     size (instead of 16 times).

It is an interesting problem. And the above is just guesswork... It would
be good to verify experimentally that the index really does grow
indefinitely
with this kind of input before trying to "fix" anything.

Dan.

ENDS ----- Mail from Dan Kennedy, 4 May 2014 --------------------- ENDS

Tuesday, February 25, 2020, 11:52:59 AM, Matt Kloss <[hidden email]> wrote:

> Dear sqlite users,

> I noticed that when you delete lines from a FTS virtual table, somehow there is some data remaining in the sqlite db, so that's it does not shrink much in size.

> $ sqlite3 test.sql "CREATE VIRTUAL TABLE tab USING fts5(x)"
> $ curl -s https://www.wikipedia.org | tr -cd '[:alnum:][:space:]' > wikipedia
> $ ls -lh wikipedia test.sql
> -rw-r--r-- 1 mkloss mkloss 24K feb. 25 06:55 test.sql
> -rw-r--r-- 1 mkloss mkloss 54K feb. 25 06:56 wikipedia
> $ sqlite3 test.sql ".import wikipedia tab" && ls -lh test.sql
> -rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
> $ sqlite3 test.sql "delete from tab" && ls -lh test.sql
> -rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
> $ sqlite3 test.sql "VACUUM" && ls -lh test.sql
> -rw-r--r-- 1 mkloss mkloss 124K feb. 25 06:56 test.sql

> I would expect the db size to be 24K (not 124K), as it was when the table "tab" was empty.

> I noticed that some data remains in the tab_XXX tables, but less than 700 bytes. That's nowhere near the 100K of added cruft:

> $ for t in tab_{config,content,data,docsize,idx}; do echo "select * from $t;"; done | sqlite3 test.sql | wc -c
> 682

> So here is my questions:
> (1) How do you really cleanup a db with FTS tables after deleting some lines?
> (2) If there is no way to remove the cruft, does that mean that adding and deleting lines will constantly inflate the db size?

> $ sqlite3 --version
> 3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837balt1

> Thank you for your help,
> Regards,

> Matthew
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
 



Regards,
Graham Holden


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: After deleting data from a FTS table and doing VACUUM, unwanted data remains

Luuk
script:

#!/bin/bash

if [ ! -f test.db ] ; then sqlite3 test.db "CREATE VIRTUAL TABLE tab
USING fts5(x)"; fi
sqlite3 test.db ".import wikipedia tab"
a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
echo "# records after import: $a"
sqlite3 test.db "DELETE FROM tab"
a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
echo "# records after DELETE: $a"
sqlite3 test.db "VACUUM"
a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
echo "# records after vacuum: $a"

output:

$> ./test.sh
# records after import: 15
# records after DELETE: 27
# records after vacuum: 27
$> ./test.sh
# records after import: 40
# records after DELETE: 52
# records after vacuum: 52
$> ./test.sh
# records after import: 65
# records after DELETE: 2
# records after vacuum: 2
$> ./test.sh
# records after import: 15
# records after DELETE: 27
# records after vacuum: 27
$> ./test.sh
# records after import: 40
# records after DELETE: 52
# records after vacuum: 52
$> vi testsh
$> vi test.sh
$>

So, the index does not grow indefinitely

On 25-2-2020 14:00, Graham Holden wrote:
> It is an interesting problem. And the above is just guesswork... It would
> be good to verify experimentally that the index really does grow
> indefinitely
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: After deleting data from a FTS table and doing VACUUM, unwanted data remains

Graham Holden
Tuesday, February 25, 2020, 3:00:09 PM, Luuk <[hidden email]> wrote:

[tests snipped]

> So, the index does not grow indefinitely

> On 25-2-2020 14:00, Graham Holden wrote:
>> It is an interesting problem. And the above is just guesswork... It would
>> be good to verify experimentally that the index really does grow
>> indefinitely

Just to avoid (more) confusion, that speculation was from Dan's email
from 2014 (I pasted his response, quoting the original email from
andrewmo who raised the issue: perhaps I should have tried to add
another layer of quoting...)

IIRC (and I probably don't), I think it was found that there wasn't
any "grow indefinitely" involved. I also suspect the cyclic nature of
the post-vacuum numbers (27..52..2..27..52) is indicative of what (I
think) Dan was describing, namely that the "clean-up" isn't always as
"aggressive" as it potentially could be:

>                                  If you then add even more data so that
> the 16th level-N b-tree is created, everything gets merged together and
> we're back in the optimal state - everything in a single b-tree. However -
> this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
> much more data will have to be added before everything is merged together
> again.

Regards,
Graham Holden


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Fwd: inserting new data only

Przemek Klosowski
In reply to this post by Luuk
I am storing time series data arriving from a sensor into (time,value)
records, like so:
10:32  12
10:35  15
10:37  15
10:39  13
10:43  13
10:46  18

and I want to avoid storing repetitive data, so that the database should contain
10:32  12
10:35  15
10:39  13
10:46  18
where only the earliest time with the unchanging value is stored.

I don't see how INSERT could be conditional on e.g.  value != (select
value from tbl order by time descending limit 1), so I thought I'd use
triggers. The only way I could think of was to delete the new
duplicate record after it has been inserted:

create trigger cull after insert on tbl when
 (select value-lead(value) over (order by time desc) from a limit 1) = 0
begin
   delete from a where time like new.time;
end;

Is there a simpler way?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

David Raymond
A before trigger which uses the raise function would stop it from getting inserted in the first place.

create trigger cull
before insert on tbl
when new.value = (select value from tbl order by time desc limit 1)
begin
select raise(ignore);
end;

Or if you want it to actually return an error to let you know what happened you could make it
select raise(abort, 'Repeated entry');



-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of Przemek Klosowski
Sent: Tuesday, February 25, 2020 12:02 PM
To: SQLite mailing list <[hidden email]>
Subject: [sqlite] Fwd: inserting new data only

I am storing time series data arriving from a sensor into (time,value)
records, like so:
10:32  12
10:35  15
10:37  15
10:39  13
10:43  13
10:46  18

and I want to avoid storing repetitive data, so that the database should contain
10:32  12
10:35  15
10:39  13
10:46  18
where only the earliest time with the unchanging value is stored.

I don't see how INSERT could be conditional on e.g.  value != (select
value from tbl order by time descending limit 1), so I thought I'd use
triggers. The only way I could think of was to delete the new
duplicate record after it has been inserted:

create trigger cull after insert on tbl when
 (select value-lead(value) over (order by time desc) from a limit 1) = 0
begin
   delete from a where time like new.time;
end;

Is there a simpler way?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

John McKown
In reply to this post by Przemek Klosowski
On Tue, Feb 25, 2020 at 11:03 AM Przemek Klosowski <
[hidden email]> wrote:

> I am storing time series data arriving from a sensor into (time,value)
> records, like so:
> 10:32  12
> 10:35  15
> 10:37  15
> 10:39  13
> 10:43  13
> 10:46  18
>
> and I want to avoid storing repetitive data, so that the database should
> contain
> 10:32  12
> 10:35  15
> 10:39  13
> 10:46  18
> where only the earliest time with the unchanging value is stored.
>
> I don't see how INSERT could be conditional on e.g.  value != (select
> value from tbl order by time descending limit 1), so I thought I'd use
> triggers. The only way I could think of was to delete the new
> duplicate record after it has been inserted:
>
> create trigger cull after insert on tbl when
>  (select value-lead(value) over (order by time desc) from a limit 1) = 0
> begin
>    delete from a where time like new.time;
> end;
>
> Is there a simpler way?
> _______________________________________________
>


Why not:

CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);

You can't INSERT duplicate numbers into the "VALUE" column, it will fail.


--
People in sleeping bags are the soft tacos of the bear world.
Maranatha! <><
John McKown
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Przemek Klosowski
In reply to this post by David Raymond
On Tue, Feb 25, 2020 at 12:22 PM David Raymond <[hidden email]> wrote:

>
> A before trigger which uses the raise function would stop it from getting inserted in the first place.
>
> create trigger cull
> before insert on tbl
> when new.value = (select value from tbl order by time desc limit 1)
> begin
> select raise(ignore);
> end;
>
> Or if you want it to actually return an error to let you know what happened you could make it
> select raise(abort, 'Repeated entry');

Ah, that's cool---I missed the 'ignore' possibility and thought that
the before trigger can't prevent the subsequent insert.

>
>
> -----Original Message-----
> From: sqlite-users <[hidden email]> On Behalf Of Przemek Klosowski
> Sent: Tuesday, February 25, 2020 12:02 PM
> To: SQLite mailing list <[hidden email]>
> Subject: [sqlite] Fwd: inserting new data only
>
> I am storing time series data arriving from a sensor into (time,value)
> records, like so:
> 10:32  12
> 10:35  15
> 10:37  15
> 10:39  13
> 10:43  13
> 10:46  18
>
> and I want to avoid storing repetitive data, so that the database should contain
> 10:32  12
> 10:35  15
> 10:39  13
> 10:46  18
> where only the earliest time with the unchanging value is stored.
>
> I don't see how INSERT could be conditional on e.g.  value != (select
> value from tbl order by time descending limit 1), so I thought I'd use
> triggers. The only way I could think of was to delete the new
> duplicate record after it has been inserted:
>
> create trigger cull after insert on tbl when
>  (select value-lead(value) over (order by time desc) from a limit 1) = 0
> begin
>    delete from a where time like new.time;
> end;
>
> Is there a simpler way?
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Przemek Klosowski
In reply to this post by John McKown
On Tue, Feb 25, 2020 at 1:03 PM John McKown
<[hidden email]> wrote:

> > I am storing time series data arriving from a sensor into (time,value)
> > records, like so:
> > 10:32  12
> > 10:35  15
> > 10:37  15
> > 10:39  13
> > 10:43  13
> > 10:46  18
> >
> > and I want to avoid storing repetitive data, so that the database should
> > contain
> > 10:32  12
> > 10:35  15
> > 10:39  13
> > 10:46  18
> > where only the earliest time with the unchanging value is stored.
...
> Why not:
>
> CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);
>
> You can't INSERT duplicate numbers into the "VALUE" column, it will fail.

This won't work here because the same value COULD reappear later:
12 15 15 13 13 18 15
needs to be registered as 12 15 13 18 15
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Keith Medcalf
In reply to this post by Przemek Klosowski

create table data
(
    key    text primary key,
    data   integer not null
)
without rowid;

-- insert into data select (?, ? as value where value IS NOT (select data from (select max(key), data from data));

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

Constraints:

(1) Will only work for appending data (new key > all keys in table)
(2) Types of key and data are immaterial as long as you are only inserting (appending) new keys.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Przemek Klosowski
>Sent: Tuesday, 25 February, 2020 10:02
>To: SQLite mailing list <[hidden email]>
>Subject: [sqlite] Fwd: inserting new data only
>
>I am storing time series data arriving from a sensor into (time,value)
>records, like so:
>10:32  12
>10:35  15
>10:37  15
>10:39  13
>10:43  13
>10:46  18
>
>and I want to avoid storing repetitive data, so that the database should
>contain
>10:32  12
>10:35  15
>10:39  13
>10:46  18
>where only the earliest time with the unchanging value is stored.
>
>I don't see how INSERT could be conditional on e.g.  value != (select
>value from tbl order by time descending limit 1), so I thought I'd use
>triggers. The only way I could think of was to delete the new
>duplicate record after it has been inserted:
>
>create trigger cull after insert on tbl when
> (select value-lead(value) over (order by time desc) from a limit 1) = 0
>begin
>   delete from a where time like new.time;
>end;
>
>Is there a simpler way?
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Keith Medcalf

Note that this will work for discrete data from a sensor but will not properly historize continuous data.  That is, if what you are historizing is process data such as a temperature, this will not permit you to re-create the original engineering data.  For that you need to allow the last duplicate to be stored and also store the computed slope to prior with each append -- in that case triggers would be the only way to do it.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: Keith Medcalf <[hidden email]>
>Sent: Tuesday, 25 February, 2020 11:18
>To: 'SQLite mailing list' <[hidden email]>
>Subject: RE: [sqlite] Fwd: inserting new data only
>
>
>create table data
>(
>    key    text primary key,
>    data   integer not null
>)
>without rowid;
>
>-- insert into data select (?, ? as value where value IS NOT (select data
>from (select max(key), data from data));
>
>insert into data select '10:32', 12 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:35', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:37', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:39', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:43', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:46', 18 as value where value IS NOT (select
>data from (select max(key), data from data));
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>Constraints:
>
>(1) Will only work for appending data (new key > all keys in table)
>(2) Types of key and data are immaterial as long as you are only
>inserting (appending) new keys.
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-----Original Message-----
>>From: sqlite-users <[hidden email]> On
>>Behalf Of Przemek Klosowski
>>Sent: Tuesday, 25 February, 2020 10:02
>>To: SQLite mailing list <[hidden email]>
>>Subject: [sqlite] Fwd: inserting new data only
>>
>>I am storing time series data arriving from a sensor into (time,value)
>>records, like so:
>>10:32  12
>>10:35  15
>>10:37  15
>>10:39  13
>>10:43  13
>>10:46  18
>>
>>and I want to avoid storing repetitive data, so that the database should
>>contain
>>10:32  12
>>10:35  15
>>10:39  13
>>10:46  18
>>where only the earliest time with the unchanging value is stored.
>>
>>I don't see how INSERT could be conditional on e.g.  value != (select
>>value from tbl order by time descending limit 1), so I thought I'd use
>>triggers. The only way I could think of was to delete the new
>>duplicate record after it has been inserted:
>>
>>create trigger cull after insert on tbl when
>> (select value-lead(value) over (order by time desc) from a limit 1) = 0
>>begin
>>   delete from a where time like new.time;
>>end;
>>
>>Is there a simpler way?
>>_______________________________________________
>>sqlite-users mailing list
>>[hidden email]
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: inserting new data only

Simon Slavin-3
In reply to this post by Keith Medcalf
This strikes me as best solved in the programming language.

If a single set of data points is being acquired in real time, and you have a programming language (or script) generating the INSERT commands, why not simply keep the most recently inserted temperature in a variable ?

On the other hand, if you have multiple sensors, or out-of-order insertion, or a stateless insertion program, you could insert every reading and before reporting use a 'cleanup' procedure to remove redundant readings.

Both the above would be faster than having SQL execute a search every time a new reading is added.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Przemek Klosowski
In reply to this post by Keith Medcalf
On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf <[hidden email]> wrote:

> create table data
> (
>     key    text primary key,
>     data   integer not null
> )
> without rowid;
>
> -- insert into data select (?, ? as value where value IS NOT (select data from (select max(key), data from data));
>..
> Constraints:
>
> (1) Will only work for appending data (new key > all keys in table)
> (2) Types of key and data are immaterial as long as you are only inserting (appending) new keys.

Awesome---exactly what's needed.
The monotonicity of the time key variable is assured by how the data
is collected---but  is there a way to express that in sqlite?
create table data (
  key text primary key check
(julianday(key) > julianday(select max(key) from data),
  data integer not null);
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

David Raymond
Awesome---exactly what's needed.
The monotonicity of the time key variable is assured by how the data
is collected---but  is there a way to express that in sqlite?
create table data (
  key text primary key check
(julianday(key) > julianday(select max(key) from data),
  data integer not null);


That would/should be done in a trigger, and not a check constraint. A check constraint is only supposed to be something which will _always_ be true about that one and only record, and which only needs the contents of that 1 record to determine, and not something that might change depending on... anything else.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Keith Medcalf
In reply to this post by Przemek Klosowski

On Tuesday, 25 February, 2020 12:23, Przemek Klosowski <[hidden email]> wrote:

>On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf <[hidden email]> wrote:

>> create table data
>> (
>>     key    text primary key,
>>     data   integer not null
>> )
>> without rowid;
>>
>> -- insert into data select ?, ? as value where value IS NOT (select
>data from (select max(key), data from data));
>>..
>> Constraints:
>>
>> (1) Will only work for appending data (new key > all keys in table)
>> (2) Types of key and data are immaterial as long as you are only
>inserting (appending) new keys.

>Awesome---exactly what's needed.
>The monotonicity of the time key variable is assured by how the data
>is collected---but  is there a way to express that in sqlite?
>create table data (
>  key text primary key check
>(julianday(key) > julianday(select max(key) from data),
>  data integer not null);

You cannot do this with a CHECK constraint since check constraints cannot execute select statements (check constraints should be table invariant -- meaning that they must return the same result no matter what data is in the table or other tables, so only act to validate the data on the current row).  

This would be a case for a before insert trigger to prohibit the insert before it is performed (an after trigger would also work, but that would fire after the row is already inserted and would work by doing a statement rollback to delete the inserted row, so you want to avoid the row insertion completely, and the way to do that is with a before trigger):

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
    select raise(ABORT, 'Out of order insert');
end;

This means a lookup and check after the record insertion is computed, if a record is to be inserted, but the btree will already be in memory and will have already traversed to the last entry, so this will consume CPU only, and very little at that.

create table data
(
    key    text primary key,
    data   integer not null
)
without rowid;

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
    select raise(ABORT, 'Out of order insert');
end;


-- insert into data select ?, ? as value where value != (select value from (select max(key), value from data));

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 27: Out of order insert
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 28: Out of order insert
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 29: Out of order insert
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 30: Out of order insert
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 31: Out of order insert
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

You could also change the insert like this:

insert into data
     select ? as key, ? as value
      where value IS NOT (select data from (select max(key), data from data))
        and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));

but that would just silently ignore the error rather than raising an error.  (You need the coalesce because the "select julianday(max(key)) from data" could be null and it has to be non-null for the > expression to return a not null result (anything > null returns null which is FALSE in a where clause and NOT would not help since "NOT (anything > null)" is still null or false.  The trigger does not have this problem because a NULL result means the trigger does not fire.  Of course, changing the raise(ABORT ...) in the trigger to raise (IGNORE) achieves the same result.

create table data
(
    key    text primary key,
    data   integer not null
)
without rowid;

insert into data select '10:32' as key, 12 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:35' as key, 15 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:37' as key, 15 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:39' as key, 13 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:43' as key, 13 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:46' as key, 18 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data select '10:32' as key, 12 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:35' as key, 15 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:37' as key, 15 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:39' as key, 13 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:43' as key, 13 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));
insert into data select '10:46' as key, 18 as value where value IS NOT (select data from (select max(key), data from data)) and (julianday(key) > coalesce((select julianday(max(key)) from data),-1));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Keith Medcalf

If you are going to do it in all in one insert statement rather than using a before trigger to throw an error (ie, you want to silently ignore out-of-order inserts) then the following is slightly more efficient since the query planner appears to materialize the search so only does it once:

with old(key, data)
  as (
      select coalesce(julianday(max(key)), -1),
             data
        from data
     ),
     new(key, data)
  as (
        values (?, ?)
     )
insert into data
     select new.key, new.data
       from new, old
      where new.data IS NOT old.data
        and julianday(new.key) > old.key;

However, without the trigger the database will not enforce the monotonicity of the insert timestamps ...

Note that you could do the whole thing in before trigger's which would mean you just use a regular old insert and the triggers do all the work, and then the database would entirely enforce its own integrity and rules ... no matter who or what was trying to insert records ...

create table data
(
    key    text primary key,
    data   integer not null
)
without rowid;

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
    select raise(ABORT, 'Out of order insert');
end;

create trigger data_prohibit_duplicates before insert on data
  when new.data IS (select data from (select max(key), data from data))
begin
    select raise(IGNORE);
end;


-- insert into data values (?, ?);

insert into data values ('10:32', 12);
insert into data values ('10:35', 15);
insert into data values ('10:37', 15);
insert into data values ('10:39', 13);
insert into data values ('10:43', 13);
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data values ('10:32', 12);
Error: near line 33: Out of order insert
insert into data values ('10:35', 15);
Error: near line 34: Out of order insert
insert into data values ('10:37', 15);
Error: near line 35: Out of order insert
insert into data values ('10:39', 13);
Error: near line 36: Out of order insert
insert into data values ('10:43', 13);
Error: near line 37: Out of order insert
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

You could even do that with just one before trigger ...

create table data
(
    key    text primary key,
    data   integer not null
)
without rowid;

create trigger data_insert before insert on data
begin
    select raise(ABORT, 'Out of order insert')
     where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
    select raise(IGNORE)
     where new.data IS (select data from (select max(key), data from data));
end;

-- insert into data values (?, ?);

insert into data values ('10:32', 12);
insert into data values ('10:35', 15);
insert into data values ('10:37', 15);
insert into data values ('10:39', 13);
insert into data values ('10:43', 13);
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data values ('10:32', 12);
Error: near line 28: Out of order insert
insert into data values ('10:35', 15);
Error: near line 29: Out of order insert
insert into data values ('10:37', 15);
Error: near line 30: Out of order insert
insert into data values ('10:39', 13);
Error: near line 31: Out of order insert
insert into data values ('10:43', 13);
Error: near line 32: Out of order insert
insert into data values ('10:46', 18);
Error: near line 33: Out of order insert

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Keith Medcalf
>Sent: Tuesday, 25 February, 2020 13:15
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] Fwd: inserting new data only
>
>
>On Tuesday, 25 February, 2020 12:23, Przemek Klosowski
><[hidden email]> wrote:
>
>>On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf <[hidden email]>
>wrote:
>
>>> create table data
>>> (
>>>     key    text primary key,
>>>     data   integer not null
>>> )
>>> without rowid;
>>>
>>> -- insert into data select ?, ? as value where value IS NOT (select
>>data from (select max(key), data from data));
>>>..
>>> Constraints:
>>>
>>> (1) Will only work for appending data (new key > all keys in table)
>>> (2) Types of key and data are immaterial as long as you are only
>>inserting (appending) new keys.
>
>>Awesome---exactly what's needed.
>>The monotonicity of the time key variable is assured by how the data
>>is collected---but  is there a way to express that in sqlite?
>>create table data (
>>  key text primary key check
>>(julianday(key) > julianday(select max(key) from data),
>>  data integer not null);
>
>You cannot do this with a CHECK constraint since check constraints cannot
>execute select statements (check constraints should be table invariant --
>meaning that they must return the same result no matter what data is in
>the table or other tables, so only act to validate the data on the
>current row).
>
>This would be a case for a before insert trigger to prohibit the insert
>before it is performed (an after trigger would also work, but that would
>fire after the row is already inserted and would work by doing a
>statement rollback to delete the inserted row, so you want to avoid the
>row insertion completely, and the way to do that is with a before
>trigger):
>
>create trigger data_prohibit_oo_inserts before insert on data
>  when julianday(new.key) <= (select julianday(max(key)) from data)
>begin
>    select raise(ABORT, 'Out of order insert');
>end;
>
>This means a lookup and check after the record insertion is computed, if
>a record is to be inserted, but the btree will already be in memory and
>will have already traversed to the last entry, so this will consume CPU
>only, and very little at that.
>
>create table data
>(
>    key    text primary key,
>    data   integer not null
>)
>without rowid;
>
>create trigger data_prohibit_oo_inserts before insert on data
>  when julianday(new.key) <= (select julianday(max(key)) from data)
>begin
>    select raise(ABORT, 'Out of order insert');
>end;
>
>
>-- insert into data select ?, ? as value where value != (select value
>from (select max(key), value from data));
>
>insert into data select '10:32', 12 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:35', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:37', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:39', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:43', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:46', 18 as value where value IS NOT (select
>data from (select max(key), data from data));
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>insert into data select '10:32', 12 as value where value IS NOT (select
>data from (select max(key), data from data));
>Error: near line 27: Out of order insert
>insert into data select '10:35', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>Error: near line 28: Out of order insert
>insert into data select '10:37', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>Error: near line 29: Out of order insert
>insert into data select '10:39', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>Error: near line 30: Out of order insert
>insert into data select '10:43', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>Error: near line 31: Out of order insert
>insert into data select '10:46', 18 as value where value IS NOT (select
>data from (select max(key), data from data));
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>You could also change the insert like this:
>
>insert into data
>     select ? as key, ? as value
>      where value IS NOT (select data from (select max(key), data from
>data))
>        and (julianday(key) > coalesce((select julianday(max(key)) from
>data),-1));
>
>but that would just silently ignore the error rather than raising an
>error.  (You need the coalesce because the "select julianday(max(key))
>from data" could be null and it has to be non-null for the > expression
>to return a not null result (anything > null returns null which is FALSE
>in a where clause and NOT would not help since "NOT (anything > null)" is
>still null or false.  The trigger does not have this problem because a
>NULL result means the trigger does not fire.  Of course, changing the
>raise(ABORT ...) in the trigger to raise (IGNORE) achieves the same
>result.
>
>create table data
>(
>    key    text primary key,
>    data   integer not null
>)
>without rowid;
>
>insert into data select '10:32' as key, 12 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:35' as key, 15 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:37' as key, 15 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:39' as key, 13 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:43' as key, 13 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:46' as key, 18 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>insert into data select '10:32' as key, 12 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:35' as key, 15 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:37' as key, 15 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:39' as key, 13 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:43' as key, 13 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>insert into data select '10:46' as key, 18 as value where value IS NOT
>(select data from (select max(key), data from data)) and (julianday(key)
>> coalesce((select julianday(max(key)) from data),-1));
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Keith Medcalf

I keep forgetting that the mix/max optimization is not applied at the same time as retrieving other data from a table, so if you actually want to optimize the generated plan, you need to use the following trigger:

create trigger data_insert before insert on data
begin
    select raise(ABORT, 'Out of order insert')
     where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
    select raise(IGNORE)
     where new.data IS (select data from data order by key desc limit 1);
end;

The query optimizer is smart enough to recognize that max(key) when used by itself and an index is available on key, that the index can be used to locate the row containing the max key (it is the last one and there is no need to look any further).  However, if you select max(key), data from table then the optimization is not applied and a scan of the whole table is done looking for the max(key) even though there is a suitable index, and you have to specify the index and how to use it and that you only need 1st result.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Keith Medcalf
>Sent: Tuesday, 25 February, 2020 14:44
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] Fwd: inserting new data only
>
>
>If you are going to do it in all in one insert statement rather than
>using a before trigger to throw an error (ie, you want to silently ignore
>out-of-order inserts) then the following is slightly more efficient since
>the query planner appears to materialize the search so only does it once:
>
>with old(key, data)
>  as (
>      select coalesce(julianday(max(key)), -1),
>             data
>        from data
>     ),
>     new(key, data)
>  as (
>        values (?, ?)
>     )
>insert into data
>     select new.key, new.data
>       from new, old
>      where new.data IS NOT old.data
>        and julianday(new.key) > old.key;
>
>However, without the trigger the database will not enforce the
>monotonicity of the insert timestamps ...
>
>Note that you could do the whole thing in before trigger's which would
>mean you just use a regular old insert and the triggers do all the work,
>and then the database would entirely enforce its own integrity and rules
>... no matter who or what was trying to insert records ...
>
>create table data
>(
>    key    text primary key,
>    data   integer not null
>)
>without rowid;
>
>create trigger data_prohibit_oo_inserts before insert on data
>  when julianday(new.key) <= (select julianday(max(key)) from data)
>begin
>    select raise(ABORT, 'Out of order insert');
>end;
>
>create trigger data_prohibit_duplicates before insert on data
>  when new.data IS (select data from (select max(key), data from data))
>begin
>    select raise(IGNORE);
>end;
>
>
>-- insert into data values (?, ?);
>
>insert into data values ('10:32', 12);
>insert into data values ('10:35', 15);
>insert into data values ('10:37', 15);
>insert into data values ('10:39', 13);
>insert into data values ('10:43', 13);
>insert into data values ('10:46', 18);
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>insert into data values ('10:32', 12);
>Error: near line 33: Out of order insert
>insert into data values ('10:35', 15);
>Error: near line 34: Out of order insert
>insert into data values ('10:37', 15);
>Error: near line 35: Out of order insert
>insert into data values ('10:39', 13);
>Error: near line 36: Out of order insert
>insert into data values ('10:43', 13);
>Error: near line 37: Out of order insert
>insert into data values ('10:46', 18);
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>You could even do that with just one before trigger ...
>
>create table data
>(
>    key    text primary key,
>    data   integer not null
>)
>without rowid;
>
>create trigger data_insert before insert on data
>begin
>    select raise(ABORT, 'Out of order insert')
>     where julianday(new.key) <= coalesce((select julianday(max(key))
>from data), -1);
>    select raise(IGNORE)
>     where new.data IS (select data from (select max(key), data from
>data));
>end;
>
>-- insert into data values (?, ?);
>
>insert into data values ('10:32', 12);
>insert into data values ('10:35', 15);
>insert into data values ('10:37', 15);
>insert into data values ('10:39', 13);
>insert into data values ('10:43', 13);
>insert into data values ('10:46', 18);
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>insert into data values ('10:32', 12);
>Error: near line 28: Out of order insert
>insert into data values ('10:35', 15);
>Error: near line 29: Out of order insert
>insert into data values ('10:37', 15);
>Error: near line 30: Out of order insert
>insert into data values ('10:39', 13);
>Error: near line 31: Out of order insert
>insert into data values ('10:43', 13);
>Error: near line 32: Out of order insert
>insert into data values ('10:46', 18);
>Error: near line 33: Out of order insert
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-----Original Message-----
>>From: sqlite-users <[hidden email]> On
>>Behalf Of Keith Medcalf
>>Sent: Tuesday, 25 February, 2020 13:15
>>To: SQLite mailing list <[hidden email]>
>>Subject: Re: [sqlite] Fwd: inserting new data only
>>
>>
>>On Tuesday, 25 February, 2020 12:23, Przemek Klosowski
>><[hidden email]> wrote:
>>
>>>On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf <[hidden email]>
>>wrote:
>>
>>>> create table data
>>>> (
>>>>     key    text primary key,
>>>>     data   integer not null
>>>> )
>>>> without rowid;
>>>>
>>>> -- insert into data select ?, ? as value where value IS NOT (select
>>>data from (select max(key), data from data));
>>>>..
>>>> Constraints:
>>>>
>>>> (1) Will only work for appending data (new key > all keys in table)
>>>> (2) Types of key and data are immaterial as long as you are only
>>>inserting (appending) new keys.
>>
>>>Awesome---exactly what's needed.
>>>The monotonicity of the time key variable is assured by how the data
>>>is collected---but  is there a way to express that in sqlite?
>>>create table data (
>>>  key text primary key check
>>>(julianday(key) > julianday(select max(key) from data),
>>>  data integer not null);
>>
>>You cannot do this with a CHECK constraint since check constraints
>cannot
>>execute select statements (check constraints should be table invariant -
>-
>>meaning that they must return the same result no matter what data is in
>>the table or other tables, so only act to validate the data on the
>>current row).
>>
>>This would be a case for a before insert trigger to prohibit the insert
>>before it is performed (an after trigger would also work, but that would
>>fire after the row is already inserted and would work by doing a
>>statement rollback to delete the inserted row, so you want to avoid the
>>row insertion completely, and the way to do that is with a before
>>trigger):
>>
>>create trigger data_prohibit_oo_inserts before insert on data
>>  when julianday(new.key) <= (select julianday(max(key)) from data)
>>begin
>>    select raise(ABORT, 'Out of order insert');
>>end;
>>
>>This means a lookup and check after the record insertion is computed, if
>>a record is to be inserted, but the btree will already be in memory and
>>will have already traversed to the last entry, so this will consume CPU
>>only, and very little at that.
>>
>>create table data
>>(
>>    key    text primary key,
>>    data   integer not null
>>)
>>without rowid;
>>
>>create trigger data_prohibit_oo_inserts before insert on data
>>  when julianday(new.key) <= (select julianday(max(key)) from data)
>>begin
>>    select raise(ABORT, 'Out of order insert');
>>end;
>>
>>
>>-- insert into data select ?, ? as value where value != (select value
>>from (select max(key), value from data));
>>
>>insert into data select '10:32', 12 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>insert into data select '10:35', 15 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>insert into data select '10:37', 15 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>insert into data select '10:39', 13 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>insert into data select '10:43', 13 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>insert into data select '10:46', 18 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>
>>select * from data;
>>10:32|12
>>10:35|15
>>10:39|13
>>10:46|18
>>
>>insert into data select '10:32', 12 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>Error: near line 27: Out of order insert
>>insert into data select '10:35', 15 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>Error: near line 28: Out of order insert
>>insert into data select '10:37', 15 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>Error: near line 29: Out of order insert
>>insert into data select '10:39', 13 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>Error: near line 30: Out of order insert
>>insert into data select '10:43', 13 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>Error: near line 31: Out of order insert
>>insert into data select '10:46', 18 as value where value IS NOT (select
>>data from (select max(key), data from data));
>>
>>select * from data;
>>10:32|12
>>10:35|15
>>10:39|13
>>10:46|18
>>
>>You could also change the insert like this:
>>
>>insert into data
>>     select ? as key, ? as value
>>      where value IS NOT (select data from (select max(key), data from
>>data))
>>        and (julianday(key) > coalesce((select julianday(max(key)) from
>>data),-1));
>>
>>but that would just silently ignore the error rather than raising an
>>error.  (You need the coalesce because the "select julianday(max(key))
>>from data" could be null and it has to be non-null for the > expression
>>to return a not null result (anything > null returns null which is FALSE
>>in a where clause and NOT would not help since "NOT (anything > null)"
>is
>>still null or false.  The trigger does not have this problem because a
>>NULL result means the trigger does not fire.  Of course, changing the
>>raise(ABORT ...) in the trigger to raise (IGNORE) achieves the same
>>result.
>>
>>create table data
>>(
>>    key    text primary key,
>>    data   integer not null
>>)
>>without rowid;
>>
>>insert into data select '10:32' as key, 12 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:35' as key, 15 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:37' as key, 15 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:39' as key, 13 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:43' as key, 13 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:46' as key, 18 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>
>>select * from data;
>>10:32|12
>>10:35|15
>>10:39|13
>>10:46|18
>>
>>insert into data select '10:32' as key, 12 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:35' as key, 15 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:37' as key, 15 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:39' as key, 13 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:43' as key, 13 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>insert into data select '10:46' as key, 18 as value where value IS NOT
>>(select data from (select max(key), data from data)) and (julianday(key)
>>> coalesce((select julianday(max(key)) from data),-1));
>>
>>select * from data;
>>10:32|12
>>10:35|15
>>10:39|13
>>10:46|18
>>
>>--
>>The fact that there's a Highway to Hell but only a Stairway to Heaven
>>says a lot about anticipated traffic volume.
>>
>>
>>
>>_______________________________________________
>>sqlite-users mailing list
>>[hidden email]
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

James K. Lowden
In reply to this post by Przemek Klosowski
On Tue, 25 Feb 2020 12:02:24 -0500
Przemek Klosowski <[hidden email]> wrote:

> and I want to avoid storing repetitive data, so that the database
> should contain
> 10:32  12
> 10:35  15
> 10:39  13
> 10:46  18
> where only the earliest time with the unchanging value is stored.

Be careful what you wish for.  Usually "avoid storing" is a proxy for
some kind of intended use.  Unless it's infeasible, it's usually better
to store everything, verbatim, as it arrives.  Then you can present it
however you like, with nothing lost.  

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: inserting new data only

Przemek Klosowski
On Wed, Feb 26, 2020 at 11:56 AM James K. Lowden
<[hidden email]> wrote:
> > and I want to avoid storing repetitive data, so that the database
> > should contain
> > [...]
> > only the earliest time with the unchanging value is stored.
>
> Be careful what you wish for.  Usually "avoid storing" is a proxy for
> some kind of intended use.  Unless it's infeasible, it's usually better
> to store everything, verbatim, as it arrives.  Then you can present it
> however you like, with nothing lost.

A lot of process measurement instrumentation returns some nominal
values, for instance firmware version. Do we store it or not? Storing
it every second is silly, but if we drop it, we wouldn't be able to
for instance audit if the statistics changed in some subtle way
because of a firmware bug. This 'store only changed values' is
intended for situations like that.
Another suitable candidate for that treatment might be a status, for
instance 'on battery' value for a UPS monitoring system. I can't think
of a scenario where storing it every time would be better.
The only disadvantage I can think of would be incidental, like
monitoring the monitoring system itself: a long period of 'no data'
could be caused by a monitoring failure as well as by the value not
changing; but if that is a concern, I think it would be better to
store a single heartbeat rather than possibly multiple unchanging
values.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: inserting new data only

Simon Slavin-3
On 26 Feb 2020, at 5:18pm, Przemek Klosowski <[hidden email]> wrote:

> This 'store only changed values' is
> intended for situations like that.
> Another suitable candidate for that treatment might be a status, for
> instance 'on battery' value for a UPS monitoring system. I can't think of a scenario where storing it every time would be better.

Suppose some horrible disaster happens to your chemical plant at 4pm.  Your log for a particular sensor shows an acceptable value, but the timestamp on that reading is 3:15pm.  But only changed readings are logged.

Perhaps a fault occurred in the sampling circuit at 3:16pm, so no further readings were taken, so an alarm was not raised.  Had every reading been logged you could have proved that another reading was taken at 3:58pm still showing the same acceptable value.

So it depends why you're doing what you're doing.  Different circumstances suggest different tactics.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12