Performance problem with DELETE FROM/correlated subqueries

Jürgen Baier
Hi,

I have a question about the performance of DELETE FROM (or, more
precisely, of correlated subqueries).

I have a table "main" and a table "staging". "staging" contains a
subset of the rows in "main". I want to delete all rows from "main"
that are also in "staging".

   CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
   CREATE TABLE staging ( ATT1 INT, ATT2 INT );

Then I execute

   DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging
     WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)

which takes a very long time. As far as I understand the query plan,
SQLite scans the full "staging" table for each row in "main":

   sqlite> EXPLAIN QUERY PLAN DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)
      ...> ;
   QUERY PLAN
   |--SCAN TABLE main
   `--CORRELATED SCALAR SUBQUERY
      `--SCAN TABLE staging

How do I speed this up? Ideally, the database would scan "staging" and
look up each row in "main", since the primary key on "main" provides a
suitable index.

But I'm open to any alternative approach. My situation is simply that
"main" is very large and "staging" also contains a large number of
tuples that should be deleted from "main".

Any ideas?

Thanks,

Jürgen

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Performance problem with DELETE FROM/correlated subqueries

Clemens Ladisch
Jürgen Baier wrote:
>   CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
>   CREATE TABLE staging ( ATT1 INT, ATT2 INT );
>
> Then I execute
>
>   DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)
>
> which takes a very long time.

DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging);
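
For anyone who wants to try this on a toy data set, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the helper name delete_staged are made up for illustration, and the sketch assumes the SQLite library linked into Python is recent enough to accept row values.

```python
import sqlite3


def delete_staged(conn):
    """Delete every row of main that also appears in staging (row-value IN)."""
    cur = conn.execute(
        "DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging)"
    )
    return cur.rowcount  # number of rows removed from main


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
    CREATE TABLE staging ( ATT1 INT, ATT2 INT );
    """
)
# Illustrative sample data (not from the thread).
conn.executemany("INSERT INTO main VALUES (?, ?)", [(1, 1), (1, 2), (2, 1), (2, 2)])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 2), (2, 1)])

deleted = delete_staged(conn)
remaining = conn.execute(
    "SELECT att1, att2 FROM main ORDER BY att1, att2"
).fetchall()
print(deleted, remaining)
```

Only the pairs present in both tables are removed; rows that match on just one column stay in "main".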


Regards,
Clemens

Re: Performance problem with DELETE FROM/correlated subqueries

Rowan Worth-2
On Fri, 7 Feb 2020 at 16:25, Clemens Ladisch <[hidden email]> wrote:

> Jürgen Baier wrote:
> >   CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
> >   CREATE TABLE staging ( ATT1 INT, ATT2 INT );
> >
> > Then I execute
> >
> >   DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)
> >
> > which takes a very long time.
>
> DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging);
>

Note that using row values requires SQLite 3.15.0 or later. That release
is three years old at this point, but every version I have on hand still
says 'Error: near ",": syntax error', so I thought I'd track down the
details :)
-Rowan
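
If the code has to run against SQLite builds on either side of that version, one option is to check the library version at run time. The sketch below uses Python's sqlite3 module; the helper names and the index name staging_att are hypothetical, and the fallback (indexing "staging" so that each EXISTS probe can use an index instead of a full scan) is an assumption on my part, not something proposed in this thread.

```python
import sqlite3

ROW_VALUE_DELETE = (
    "DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging)"
)
EXISTS_DELETE = (
    "DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging "
    "WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)"
)


def supports_row_values(version_info=sqlite3.sqlite_version_info):
    """Row values are available from SQLite 3.15.0 onwards."""
    return tuple(version_info) >= (3, 15, 0)


def delete_staged_rows(conn, version_info=sqlite3.sqlite_version_info):
    """Delete rows of main that are in staging; return the number removed."""
    if supports_row_values(version_info):
        return conn.execute(ROW_VALUE_DELETE).rowcount
    # Assumed fallback (not from the thread): give the correlated subquery
    # an index on staging to search, then use the original EXISTS form.
    conn.execute("CREATE INDEX IF NOT EXISTS staging_att ON staging (att1, att2)")
    return conn.execute(EXISTS_DELETE).rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
    CREATE TABLE staging ( ATT1 INT, ATT2 INT );
    """
)
# Illustrative sample data.
conn.executemany("INSERT INTO main VALUES (?, ?)", [(1, 1), (1, 2), (2, 1), (2, 2)])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 2), (2, 1)])

# Pretend to be an old library so the fallback branch is the one exercised.
removed = delete_staged_rows(conn, version_info=(3, 8, 0))
remaining = conn.execute(
    "SELECT att1, att2 FROM main ORDER BY att1, att2"
).fetchall()
print(removed, remaining)
```

Whether the fallback is fast enough on large tables is worth confirming with EXPLAIN QUERY PLAN on the actual SQLite version in use.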

Re: Performance problem with DELETE FROM/correlated subqueries

Jürgen Baier
Hi,

On 07.02.20 09:25, Clemens Ladisch wrote:

> Jürgen Baier wrote:
>>    CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
>>    CREATE TABLE staging ( ATT1 INT, ATT2 INT );
>>
>> Then I execute
>>
>>    DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)
>>
>> which takes a very long time.
>
> DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging);

Thank you very much.

I can confirm that this solves my problem and indeed scans the staging
table and looks up the main table:

sqlite> EXPLAIN QUERY PLAN DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging);
QUERY PLAN
|--SEARCH TABLE main USING INDEX sqlite_autoindex_main_1 (ATT1=? AND ATT2=?)
`--LIST SUBQUERY
    `--SCAN TABLE staging
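
The same comparison can be scripted. This is a rough sketch with Python's sqlite3 module; the helper name query_plan is made up, and the exact wording of the plan steps differs between SQLite versions, so the output will not necessarily match the listing above character for character.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN step for sql."""
    # Each result row has four columns; the fourth is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main ( ATT1 INT, ATT2 INT, PRIMARY KEY (ATT1,ATT2) );
    CREATE TABLE staging ( ATT1 INT, ATT2 INT );
    """
)

# EXPLAIN QUERY PLAN only reports the plan; it does not run the DELETE.
exists_plan = query_plan(
    conn,
    "DELETE FROM main WHERE EXISTS (SELECT 1 FROM staging "
    "WHERE main.att1 = staging.att1 AND main.att2 = staging.att2)",
)
in_plan = query_plan(
    conn,
    "DELETE FROM main WHERE (att1, att2) IN (SELECT att1, att2 FROM staging)",
)

print("EXISTS form:")
for step in exists_plan:
    print("  ", step)
print("row-value IN form:")
for step in in_plan:
    print("  ", step)
```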

For reference: Microsoft SQL Server (2017) does not support this
syntax, but it is relatively fast with the original DELETE FROM query.

Thanks,

Jürgen
