Quantcast

Deleting records from a large table

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Deleting records from a large table

Dave Blake
Say table1 has more then 500000 records, and there is a second table
tmp_keep with the ids of the records in table1 to be kept, the rest need to
be deleted. The number of records in tmp_keep can vary from 0 to all the
records in table1, with any values in between.

What is the best strategy for doing the deletion?

For deleting a large number of records (tmp_keep is small), this works
fine:
DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);

But this becomes inefficient when tmp_keep is large.

Any suggestions?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

R Smith

On 2017/03/03 12:53 PM, Dave Blake wrote:

> Say table1 has more then 500000 records, and there is a second table
> tmp_keep with the ids of the records in table1 to be kept, the rest need to
> be deleted. The number of records in tmp_keep can vary from 0 to all the
> records in table1, with any values in between.
>
> What is the best strategy for doing the deletion?
>
> For deleting a large number of records (tmp_keep is small), this works
> fine:
> DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);
>
> But this becomes inefficient when tmp_keep is large.

BEGIN TRANSACTION;
ALTER TABLE table1 RENAME TO tmp1;
CREATE TABLE table1 (
    -- Your standard Table creation code here for table1...
);
INSERT INTO table1 SELECT tmp1.* FROM tmp1 JOIN tmp_keep ON tmp_keep.id1
= tmp1.id1;
DROP TABLE tmp1;
COMMIT;

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

Clemens Ladisch
In reply to this post by Dave Blake
Dave Blake wrote:
> For deleting a large number of records (tmp_keep is small), this works
> fine:
> DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);
>
> But this becomes inefficient when tmp_keep is large.

SQLite usually creates a temporary index for the values in the IN clause.
How large is "large"?

Are the id1 values integers?  Then you can make tmp_keep.id1 the INTEGER
PRIMARY KEY.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

Dave Blake
Could be keep almost all the records so ~500000, but it varies greatly so
sometimes will be just keep 10. I can adjust approach depending on size if
necessary.

Yes the id1 are integer primary keys.

Table1 has a number of indexes and views, so the create new table approach
is less attractive
‚Äč
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

Keith Medcalf
In reply to this post by Dave Blake

DELETE FROM table1 WHERE NOT EXISTS (SELECT 1 from tmp_keep WHERE id1 = table1.id1);

Does it in a single pass by doing a correlated subquery on each row in table1 to see if the id is in tmp_keep.

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Dave Blake
> Sent: Friday, 3 March, 2017 03:53
> To: SQLite mailing list
> Subject: [sqlite] Deleting records from a large table
>
> Say table1 has more then 500000 records, and there is a second table
> tmp_keep with the ids of the records in table1 to be kept, the rest need
> to
> be deleted. The number of records in tmp_keep can vary from 0 to all the
> records in table1, with any values in between.
>
> What is the best strategy for doing the deletion?
>
> For deleting a large number of records (tmp_keep is small), this works
> fine:
> DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);
>
> But this becomes inefficient when tmp_keep is large.
>
> Any suggestions?
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

Simon Slavin-3
In reply to this post by Dave Blake

On 3 Mar 2017, at 10:53am, Dave Blake <[hidden email]> wrote:

> DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);
>
> But this becomes inefficient when tmp_keep is large.

Do you have an appropriate index on tmp_keep ?

CREATE INDEX tk_id1 ON tmp_keep (id1)

then try it again.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

Clemens Ladisch
In reply to this post by Dave Blake
Dave Blake wrote:
> Yes the id1 are integer primary keys.

In both tables?  If yes, then there is not much you could do, and the
problem probably are all the modifications done to the actual table
and its indexes.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Deleting records from a large table

Dave Blake
Thanks all for your input, it has really helped.

In my real world application tmp_keep is a temporary table populated by
examinining a number of other tables etc., and I suddenly realsied that it
could even contain duplicate ids.  Sloppy thinking on my part.

I get the best results by creating another table:
CREATE TEMPORARY TABLE tmp_keep_unique (id1 integer primary key);
INSERT INTO tmp_keep_unique SELECT DISTINCT id1 from tmp_keep;

It takes far longer to create an index on tmp_keep, than it save times on
the above query with one.

Then
*with a primary key on both table1 and tmp_keep_unique*DELETE FROM table1
WHERE id1 NOT IN (SELECT id1 FROM tmp_keep_unique);

is acceptably efficient.

On SQLite NOT EXISTS is quicker than NOT IN, but I also need to use same
SQL on a MySQL implementation, and it behaves the oppoiste.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...