Atomic DELETE index optimisation?

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Atomic DELETE index optimisation?

Eduardo
On Mon, 18 Dec 2017 07:21:50 -0700 (MST)
Dinu <[hidden email]> escribió:

> Hick Gunter wrote
> > If you are running an SQLite version 3.11.0 or newer, only pages that are
> > changed for the first time in the transaction are copied to the WAL file,
> > thus extending it.
> >
> > While progressing the transaction, the rate of "newly changed" to "changed
> > again" pages will shift towards re-reading and rewriting the WAL file copy
> > of a page; this will tend to slow down the rate at which the WAL file is
> > growing, even at a constant delete rate, until every page has been updated
> > at least once, and then stop growing until all the remaining deletes have
> > been processed.
>
> Running 3.19; thanks for the explanation, this never occured to me and makes
> perfect sense.
>

Is it compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT option? If yes you can
delete, let's say 100000 (10E5) rows each round. Index and metadata would fit
in sqlite cache and clean wal file, making it faster than a on big delete.

--
Eduardo <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

sub sk79
In reply to this post by Dinu
On Dec 17, 2017, at 3:53 PM, Dinu Marina
 Is there any way to fix/improve this in userland?

I think the thread already has a viable solution but still if userland fix is an option: would a soft-delete based design work for you?

https://dba.stackexchange.com/questions/125431/hard-delete-vs-soft-delete-performance

-SK

_______________________________
*StepSqlite* enhanced-PL/SQL
on SQLite and BerkeleyDB.
**Coming Soon**
---- Be Done at the Speed of Lite! --->
 https://metatranz.com/stepsqlite/ShowSignUp?guestlogin=ON




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

Dinu
Hi sub sk79,
I have so far from this thread the following suggestions:
1) Copy table -> TRUNCATE -> copy back; this doesn't work, what was supposed
to be TRUNCATE semantics (DELETE FROM without WHERE) has the same
performance as with WHERE.
2) Structure alterations; either drop table, or drop indexes : I am
reluctant to do this; my evangelical instinct tells me hacking the semantics
of life might lead to implosion of Earth :)
3) "Deleted" bit field - presumably the "soft delete" as you call it; I am
analyzing this, but here the question is whether we include the bit in the
indexes. If so, performing a heap of UPDATEs should be even more
inefficient; if we don't include it in the index, the problem of the cost of
filtering the row needs some analysis which I will probably do; the problem
with this solution is that is has residual effects: we run some pretty
complex queries against this table, with complicated joins and we already
got some surprising execution plans that needed query rewriting. So with
this "deleted" bit out of the index pool, we need to check various other
queries to make sure they are still optimized to what we need.

All this said and done, 3 hours to delete 15G of data seems atrocious even
if you do it by standards resulted from generations of DOD and NSA
inbreeding... so I'm still hopeful for some DB-related solution.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

Simon Slavin-3


On 19 Dec 2017, at 4:15pm, Dinu <[hidden email]> wrote:

> 3) "Deleted" bit field - presumably the "soft delete" as you call it;

If you do try this, the 'bit' column should be declared as INTEGER and the values stored should be 0 and 1.  SQLite is extremely efficient at storing/sorting these values.

Including this column in indexes should not slow things down much.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

Paul Sanderson
In reply to this post by Dinu
Dinu

Option 2, dropping and recreating the index with the transaction, seems to
be the way forward - I would suggest that if the author of SQlite (Dr Hipp)
has put this forward as a solution, as he did earlier in this thread,  then
it is probably a safe option and will not lead to an implosion of anything.

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 19 December 2017 at 16:15, Dinu <[hidden email]> wrote:

> Hi sub sk79,
> I have so far from this thread the following suggestions:
> 1) Copy table -> TRUNCATE -> copy back; this doesn't work, what was
> supposed
> to be TRUNCATE semantics (DELETE FROM without WHERE) has the same
> performance as with WHERE.
> 2) Structure alterations; either drop table, or drop indexes : I am
> reluctant to do this; my evangelical instinct tells me hacking the
> semantics
> of life might lead to implosion of Earth :)
> 3) "Deleted" bit field - presumably the "soft delete" as you call it; I am
> analyzing this, but here the question is whether we include the bit in the
> indexes. If so, performing a heap of UPDATEs should be even more
> inefficient; if we don't include it in the index, the problem of the cost
> of
> filtering the row needs some analysis which I will probably do; the problem
> with this solution is that is has residual effects: we run some pretty
> complex queries against this table, with complicated joins and we already
> got some surprising execution plans that needed query rewriting. So with
> this "deleted" bit out of the index pool, we need to check various other
> queries to make sure they are still optimized to what we need.
>
> All this said and done, 3 hours to delete 15G of data seems atrocious even
> if you do it by standards resulted from generations of DOD and NSA
> inbreeding... so I'm still hopeful for some DB-related solution.
>
>
>
> --
> Sent from: http://sqlite.1065341.n5.nabble.com/
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

R Smith-2
In reply to this post by Dinu

On 2017/12/19 6:15 PM, Dinu wrote:
> 2) Structure alterations; either drop table, or drop indexes : I am
> reluctant to do this; my evangelical instinct tells me hacking the semantics
> of life might lead to implosion of Earth :)

Oh the irony....

Your instinct (like most such instincts) is simply wrong.

But don't take our word for it. Take a copy of the data somewhere else,
construct some readers mimicking the in-use case (or whatever it takes
to adequately convince yourself) and try the suggested fix.

Not only will it work, it will do so really fast and prove to be not
only the best, but also the absolutely correct fix.
The good people here have even tried to build suggested scripts around
your irrational fears, but in truth the simplest version would work the
best, which is:

In a Transaction -
- Rename the Table,
- Recreate a new empty version of it,
- Copy the rows you want to keep.
- Drop the renamed table.
- Recreate any Indexes/Triggers.

A solution that might be slightly slower but much less complicated and
still very fast is simply:
In a Transaction -
- Drop the Indexes/Triggers,
- Do the deletes,
- Recreate the Indexes/Triggers.

This will work 100% even while other readers are active (thanks to the
WAL [1]) and without making them implode (perhaps pause a little bit[2],
but not implode).

Why am I (and others here) so confident this will work? Because this is
a fundamental design principle and indeed a requirement of a
Transactional and Relational Database (Which SQLite is) - Plus we do it
all the time very much relying on exactly those characteristics. There
is no "hacking" going on in any of our suggestions, this is quite the
mainstream way in which it works.

It's just like learning to ride a bicycle. Initially your fears feel
justified and the physics seem impossible magic from where you watch the
others... until that first day you find your balance and soar....  Then
pretty soon, you do it without hands on the steering wheel, just like
the other cool kids.  :)

Cheers,
Ryan

[1] - It will work even with other Journal modes than WAL, it's just
that the readers then will wait quite a bit more on the transaction to
finish, whereas the WAL allows updates to not affect concurrent readers
until a checkpoint/commit.

[2] - The file is several Gigabytes in size, it's never going to be
instant, there WILL be some waiting, but it won't take very long - try
do it at a quiet time though. You will get a feeling for the time-frame
if you do the test-case thing.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

Dominique Devienne
In reply to this post by Simon Slavin-3
On Tue, Dec 19, 2017 at 6:05 PM, Simon Slavin <[hidden email]> wrote:

> On 19 Dec 2017, at 4:15pm, Dinu <[hidden email]> wrote:
> > 3) "Deleted" bit field - presumably the "soft delete" as you call it;
> If you do try this, the 'bit' column should be declared as INTEGER and the
> values stored should be 0 and 1.  SQLite is extremely efficient at
> storing/sorting these values.
>

"extremely efficient" is a bit exaggerated IMHO.

More space efficient definitely, see serial types 8 and 8 in [1], so only
the record's header section need to be consulted,
w/o the need to decode the whole record or decode the varint [2], thus more
cpu efficient too (since less to do), but the
record's page still needs to be paged in/out on reads/writes, and that's
like to dominate anyway. Just my guess though,
no hard data to back that up. Perhaps Ryan will measure the difference he
often does :) --DD

[1] https://www.sqlite.org/fileformat.html#record_format
[2] https://www.sqlite.org/fileformat2.html#varint
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

R Smith-2

On 2017/12/20 11:01 AM, Dominique Devienne wrote:
> "extremely efficient" is a bit exaggerated IMHO.
>
> More space efficient definitely, see serial types 8 and 8 in [1], so only
> the record's header section need to be consulted,
> w/o the need to decode the whole record or decode the varint [2], thus more
> cpu efficient too (since less to do), but the
> record's page still needs to be paged in/out on reads/writes, and that's
> like to dominate anyway. Just my guess though,

Not to mention that unless you remake the table (which the OP was
insistent to avoid), that INT column will be at the very end of a list
of much bigger columns, which would make reading it the single most
inefficient column to read in the whole table, although the subsequent
index that you are surely to add on that Column will undoubtedly be one
of the fastest, second only to the rowid.

> no hard data to back that up. Perhaps Ryan will measure the difference he
> often does :) --DD

LOL, I would, but a quick glance at the importantometer for this result
revealed that the exercise will be scheduled right between "Some other
time" and "Never".    :)

Cheers!
Ryan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Atomic DELETE index optimisation?

Dinu
Thank you all for the replies,
I will hack this problem one way or another after the hoildays and let you
know how it went.
In the mean time, I wish you all happy peaceful holidays, and a great New
Year!

Dinu



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
123