How to find records in a table which are not in a second table?

classic Classic list List threaded Threaded
7 messages Options
Joe
Reply | Threaded
Open this post in threaded view
|

How to find records in a table which are not in a second table?

Joe
A (perhaps silly ) beginners question:
My sqlite database contains several tables, two of them, table A and
table B,  have text colums called 'nam'. The tables have about 2
millions lines.
What's the most efficient way to select all lines from table A with nam
values, which are not present in the nam values of table B?
Thanks --  Joe

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to find records in a table which are not in a second table?

Richard Hipp-3
On 8/28/18, Joe <[hidden email]> wrote:
> A (perhaps silly ) beginners question:
> My sqlite database contains several tables, two of them, table A and
> table B,  have text colums called 'nam'. The tables have about 2
> millions lines.
> What's the most efficient way to select all lines from table A with nam
> values, which are not present in the nam values of table B?

I suppose:

   SELECT * FROM A WHERE nam NOT IN (SELECT nam FROM B);
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to find records in a table which are not in a second table?

Simon Slavin-3
On 28 Aug 2018, at 5:32pm, Richard Hipp <[hidden email]> wrote:

> I suppose:
>
>   SELECT * FROM A WHERE nam NOT IN (SELECT nam FROM B);

Depending on how many names the tables have in column, a possible alternative might be to use the EXCEPT compound operator here.  Something like

    SELECT nam FROM A
        EXCEPT
        SELECT nam FROM B

This will give you just the 'nam' values.  If that's all you need it might be faster than the "NOT IN" version.  If you need more columns than just "nam" then use something like

    SELECT * FROM A WHERE nam IN (
        SELECT nam FROM A
            EXCEPT
            SELECT nam FROM B
    )

But this might be slower than the version Dr Hipp gave.

Make sure you have an index on both A.nam and B.nam to test the fastest these queries can return a result.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to find records in a table which are not in a second table?

Jay Kreibich
In reply to this post by Joe

> On Aug 28, 2018, at 11:30 AM, Joe <[hidden email]> wrote:
>
> A (perhaps silly ) beginners question:
> My sqlite database contains several tables, two of them, table A and table B,  have text colums called 'nam'. The tables have about 2 millions lines.
> What's the most efficient way to select all lines from table A with nam values, which are not present in the nam values of table B?
> Thanks —  Joe

Not sure about performance vs other suggestions, but this is a common way of doing that:

SELECT a.* FROM a LEFT JOIN b USING (nam) WHERE b.nam IS NULL


  -j


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to find records in a table which are not in a second table?

David Raymond
Embarrassing confession time: I didn't think you could use "using" to do this while selecting "a.*"

https://www.sqlite.org/lang_select.html
"For each pair of columns identified by a USING clause, the column from the right-hand dataset is omitted from the joined dataset. This is the only difference between a USING clause and its equivalent ON constraint."

I thought it literally took that column out of the result set. So "a SOME SORT OF JOIN b USING (foo)" literally got rid of a.foo and b.foo and put the value of a.foo into a "foo" column, or literally got rid of b.foo. And that trying to select a.foo, or especially b.foo would raise an error of it not being an existing column.

I didn't realize that the column omission <only> happens with you "select * from" and not for anything else.

The more you know.


SQLite version 3.24.0 2018-06-04 19:24:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table a (foo, bar);

sqlite> create table b (foo, baz);

sqlite> insert into a values (1, 2), (2, 3), (3, 4);

sqlite> insert into b values (1, 7), (4, 5);

sqlite> select * from a inner join b using (foo);--knew this would only have 3 result columns
foo|bar|baz
1|2|7

sqlite> select a.*, b.* from a inner join b using (foo);--kinda thought this would be an error, or omit foo
foo|bar|foo|baz
1|2|1|7

sqlite> select b.foo from a inner join b using (foo);--definitely thought this was an error
foo
1

sqlite> delete from b;

sqlite> insert into b values (1.0, 7), (4, 5);

sqlite> select * from a inner join b using (foo);
foo|bar|baz
1|2|7

sqlite> select a.*, b.* from a inner join b using (foo);
foo|bar|foo|baz
1|2|1.0|7

sqlite> select foo, a.foo, b.foo, a.bar, b.baz from a inner join b using (foo);
foo|foo|foo|bar|baz
1|1|1.0|2|7

sqlite>


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Jay Kreibich
Sent: Tuesday, August 28, 2018 1:18 PM
To: SQLite mailing list
Subject: Re: [sqlite] How to find records in a table which are not in a second table?


> On Aug 28, 2018, at 11:30 AM, Joe <[hidden email]> wrote:
>
> A (perhaps silly ) beginners question:
> My sqlite database contains several tables, two of them, table A and table B,  have text colums called 'nam'. The tables have about 2 millions lines.
> What's the most efficient way to select all lines from table A with nam values, which are not present in the nam values of table B?
> Thanks —  Joe

Not sure about performance vs other suggestions, but this is a common way of doing that:

SELECT a.* FROM a LEFT JOIN b USING (nam) WHERE b.nam IS NULL


  -j


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to find records in a table which are not in a second table?

R Smith-2
In reply to this post by Jay Kreibich
On 2018/08/28 7:18 PM, Jay Kreibich wrote:
>> On Aug 28, 2018, at 11:30 AM, Joe <[hidden email]> wrote:
>>
>> A (perhaps silly ) beginners question:
>> My sqlite database contains several tables, two of them, table A and table B,  have text colums called 'nam'. The tables have about 2 millions lines.
>> What's the most efficient way to select all lines from table A with nam values, which are not present in the nam values of table B?
>> Thanks —  Joe
> Not sure about performance vs other suggestions, but this is a common way of doing that:
>
> SELECT a.* FROM a LEFT JOIN b USING (nam) WHERE b.nam IS NULL

I have nothing to add in terms of the query - personally I prefer this
version Jay offered (from a point of my internal clarity only).  Simon
or Richard's versions will all work equally well in semantic terms
(perhaps one is slightly better due to specific working of the sqlite
internals - test it!), but in all cases one thing is paramount:  Ensure
you have an Index on nam in at least the b table (but preferably in both).

The efficiency/speed gain in that will vastly overshadow any choice in
lookup method.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to find records in a table which are not in a second table?

Jay Kreibich
In reply to this post by David Raymond

> On Aug 28, 2018, at 1:22 PM, David Raymond <[hidden email]> wrote:
>
> Embarrassing confession time: I didn't think you could use "using" to do this while selecting "a.*"
>
> https://www.sqlite.org/lang_select.html
> "For each pair of columns identified by a USING clause, the column from the right-hand dataset is omitted from the joined dataset. This is the only difference between a USING clause and its equivalent ON constraint."
>
> I thought it literally took that column out of the result set. So "a SOME SORT OF JOIN b USING (foo)" literally got rid of a.foo and b.foo and put the value of a.foo into a "foo" column, or literally got rid of b.foo. And that trying to select a.foo, or especially b.foo would raise an error of it not being an existing column.
>
> I didn't realize that the column omission <only> happens with you "select * from" and not for anything else.
>
> The more you know.


Yeah, this relates to Relational Theory, SQL, and how the two are almost, sorta, kinda, but not-really the same.

One thing to keep in mind is that the source columns and the “working set” of columns (SELECT *) are not the same.  You can always explicitly name a source column.  While USING “collapses” the two columns in the working set (as does a NATURAL JOIN), it does not eliminate the ability to explicitly reference a specific source column using the table notation… that’s why the WHERE clause in my example works as well.  And that’s kind of what the <table>.* notation does, just brings in all the columns from a specific source table.  In a NATURAL JOIN (or a standard [CROSS] JOIN...USING) the two columns will always be the same, so there is little need to explicitly reference a source column.  That’s not true with OUTER JOINs, however, but as shown that can be really useful.

  -j


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users