unique values from a subset of data based on two fields

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

unique values from a subset of data based on two fields

Paul Sanderson
I have a table

Create table names (id int, status int, name text)



1, 1, 'paul'

2, 1,'helen'

3, 0, 'steve'

4, 0, 'steve'

5, 0, 'pete'

6, 0, 'paul'



I want a query that returns all of the records with status = 1 and unique
records, based on name, where the status =0 and the name is not in the list
status=1



So from the above I would want to see



1, 1, paul

2, 1, helen

3, 0, steve (or 4, 0, steve)

5, 0, pete



I could do something like



Select * from names where status = 1 or name not in (select name from names
where status = 1)



But this gets both rows for steve, e.g.



1, 1, paul

2, 1, helen

3, 0, steve

4, 0, steve

5, 0, pete

while I am not bothered about which of the two steves I get back, I must
have all occurences of names with status = 1

I am probably missing somethng obvious

Paul
www.sandersonforensics.com
SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

Michele Pradella
Select DISTINCT name,id,status from names where status = 1



*Michele Pradella*
/R&D Software Engineer
/ [hidden email] <mailto:[hidden email]>
Office: +39 0375 889091

------------------------------------------------------------------------
       
------------------------------------------------------------------------
Selea s.r.l

V. Aldo Moro, 69
46019 Cicognara MN - Italy
Phone: +390375889091
Fax: +390375889080
http://www.selea.com

       

Note: The information contained in this message may be privileged and
confidential and protected from disclosure. If the reader of this
message is not the intended recipient, or an employee or agent
responsible for delivering this message to the intended recipient, you
are hereby notified that any dissemination, distribution or copying of
this communication is strictly prohibited. If you have received this
communication in error, please notify us immediately by replying to the
message and deleting it from your computer. Thank you.

Il 29/06/2018 17.50, Paul Sanderson ha scritto:

> I have a table
>
> Create table names (id int, status int, name text)
>
>
>
> 1, 1, 'paul'
>
> 2, 1,'helen'
>
> 3, 0, 'steve'
>
> 4, 0, 'steve'
>
> 5, 0, 'pete'
>
> 6, 0, 'paul'
>
>
>
> I want a query that returns all of the records with status = 1 and unique
> records, based on name, where the status =0 and the name is not in the list
> status=1
>
>
>
> So from the above I would want to see
>
>
>
> 1, 1, paul
>
> 2, 1, helen
>
> 3, 0, steve (or 4, 0, steve)
>
> 5, 0, pete
>
>
>
> I could do something like
>
>
>
> Select * from names where status = 1 or name not in (select name from names
> where status = 1)
>
>
>
> But this gets both rows for steve, e.g.
>
>
>
> 1, 1, paul
>
> 2, 1, helen
>
> 3, 0, steve
>
> 4, 0, steve
>
> 5, 0, pete
>
> while I am not bothered about which of the two steves I get back, I must
> have all occurences of names with status = 1
>
> I am probably missing somethng obvious
>
> Paul
> www.sandersonforensics.com
> SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

David Raymond
In reply to this post by Paul Sanderson
with status_one as (
  select *
  from names
  where status = 1
),
one_names as (
  select distinct name
  from status_one
)
select min(id), status, name
from names
where status = 0
  and name not in one_names
group by status, name

union all

select * from status_one;


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Paul Sanderson
Sent: Friday, June 29, 2018 11:50 AM
To: General Discussion of SQLite Database
Subject: [sqlite] unique values from a subset of data based on two fields

I have a table

Create table names (id int, status int, name text)



1, 1, 'paul'

2, 1,'helen'

3, 0, 'steve'

4, 0, 'steve'

5, 0, 'pete'

6, 0, 'paul'



I want a query that returns all of the records with status = 1 and unique
records, based on name, where the status =0 and the name is not in the list
status=1



So from the above I would want to see



1, 1, paul

2, 1, helen

3, 0, steve (or 4, 0, steve)

5, 0, pete



I could do something like



Select * from names where status = 1 or name not in (select name from names
where status = 1)



But this gets both rows for steve, e.g.



1, 1, paul

2, 1, helen

3, 0, steve

4, 0, steve

5, 0, pete

while I am not bothered about which of the two steves I get back, I must
have all occurences of names with status = 1

I am probably missing somethng obvious

Paul
www.sandersonforensics.com
SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

Paul Sanderson
Thank You David - I was just starting to play with CTEs



Paul
www.sandersonforensics.com
SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>

On 29 June 2018 at 17:45, David Raymond <[hidden email]> wrote:

> with status_one as (
>   select *
>   from names
>   where status = 1
> ),
> one_names as (
>   select distinct name
>   from status_one
> )
> select min(id), status, name
> from names
> where status = 0
>   and name not in one_names
> group by status, name
>
> union all
>
> select * from status_one;
>
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Paul Sanderson
> Sent: Friday, June 29, 2018 11:50 AM
> To: General Discussion of SQLite Database
> Subject: [sqlite] unique values from a subset of data based on two fields
>
> I have a table
>
> Create table names (id int, status int, name text)
>
>
>
> 1, 1, 'paul'
>
> 2, 1,'helen'
>
> 3, 0, 'steve'
>
> 4, 0, 'steve'
>
> 5, 0, 'pete'
>
> 6, 0, 'paul'
>
>
>
> I want a query that returns all of the records with status = 1 and unique
> records, based on name, where the status =0 and the name is not in the list
> status=1
>
>
>
> So from the above I would want to see
>
>
>
> 1, 1, paul
>
> 2, 1, helen
>
> 3, 0, steve (or 4, 0, steve)
>
> 5, 0, pete
>
>
>
> I could do something like
>
>
>
> Select * from names where status = 1 or name not in (select name from names
> where status = 1)
>
>
>
> But this gets both rows for steve, e.g.
>
>
>
> 1, 1, paul
>
> 2, 1, helen
>
> 3, 0, steve
>
> 4, 0, steve
>
> 5, 0, pete
>
> while I am not bothered about which of the two steves I get back, I must
> have all occurences of names with status = 1
>
> I am probably missing somethng obvious
>
> Paul
> www.sandersonforensics.com
> SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

R Smith-2
In reply to this post by David Raymond
The solution from David works perfectly, just want to point out some CTE
things since you mention getting into it.
1st - A nice thing about CTE is that, in the case of a non-recursive CTE
(like this one), it can well be replaced by a simple sub-query, however,
the CTE can be referenced more than once, unlike a sub-query.
2nd - One can even use the CTE in other sub-queries.

With this in mind, here is another option for the query which is
functionally equivalent (i.e. it's not better, simply showing alternate
CTE use):

with status_one as (
   select *
     from names
    where status = 1
)
select min(id), status, name
   from names
  where status = 0 and name not in (select name from status_one)
  group by status, name
union all
select * from status_one
;


CTE capability is one of my favourite additions ever to SQLite (I may
have mentioned this before), so I hope you too find them useful and joyful.

Cheers,
Ryan

On 2018/06/29 6:45 PM, David Raymond wrote:

> with status_one as (
>    select *
>    from names
>    where status = 1
> ),
> one_names as (
>    select distinct name
>    from status_one
> )
> select min(id), status, name
> from names
> where status = 0
>    and name not in one_names
> group by status, name
>
> union all
>
> select * from status_one;
>
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]] On Behalf Of Paul Sanderson
> Sent: Friday, June 29, 2018 11:50 AM
> To: General Discussion of SQLite Database
> Subject: [sqlite] unique values from a subset of data based on two fields
>
> I have a table
>
> Create table names (id int, status int, name text)
>
>
>
> 1, 1, 'paul'
>
> 2, 1,'helen'
>
> 3, 0, 'steve'
>
> 4, 0, 'steve'
>
> 5, 0, 'pete'
>
> 6, 0, 'paul'
>
>
>
> I want a query that returns all of the records with status = 1 and unique
> records, based on name, where the status =0 and the name is not in the list
> status=1
>
>
>
> So from the above I would want to see
>
>
>
> 1, 1, paul
>
> 2, 1, helen
>
> 3, 0, steve (or 4, 0, steve)
>
> 5, 0, pete
>
>
>
> I could do something like
>
>
>
> Select * from names where status = 1 or name not in (select name from names
> where status = 1)
>
>
>
> But this gets both rows for steve, e.g.
>
>
>
> 1, 1, paul
>
> 2, 1, helen
>
> 3, 0, steve
>
> 4, 0, steve
>
> 5, 0, pete
>
> while I am not bothered about which of the two steves I get back, I must
> have all occurences of names with status = 1
>
> I am probably missing somethng obvious
>
> Paul
> www.sandersonforensics.com
> SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

Paul Sanderson
Thanks Ryan

As often is the case the the actual problem is more complex than my example
- sometimes we over simplify to, well, simplify - but you have both given
me some ideas and I'll go away and play.

Paul

Paul
www.sandersonforensics.com
SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>

On 29 June 2018 at 20:24, R Smith <[hidden email]> wrote:

> The solution from David works perfectly, just want to point out some CTE
> things since you mention getting into it.
> 1st - A nice thing about CTE is that, in the case of a non-recursive CTE
> (like this one), it can well be replaced by a simple sub-query, however,
> the CTE can be referenced more than once, unlike a sub-query.
> 2nd - One can even use the CTE in other sub-queries.
>
> With this in mind, here is another option for the query which is
> functionally equivalent (i.e. it's not better, simply showing alternate CTE
> use):
>
> with status_one as (
>   select *
>     from names
>    where status = 1
> )
> select min(id), status, name
>   from names
>  where status = 0 and name not in (select name from status_one)
>  group by status, name
> union all
> select * from status_one
> ;
>
>
> CTE capability is one of my favourite additions ever to SQLite (I may have
> mentioned this before), so I hope you too find them useful and joyful.
>
> Cheers,
> Ryan
>
>
> On 2018/06/29 6:45 PM, David Raymond wrote:
>
>> with status_one as (
>>    select *
>>    from names
>>    where status = 1
>> ),
>> one_names as (
>>    select distinct name
>>    from status_one
>> )
>> select min(id), status, name
>> from names
>> where status = 0
>>    and name not in one_names
>> group by status, name
>>
>> union all
>>
>> select * from status_one;
>>
>>
>> -----Original Message-----
>> From: sqlite-users [mailto:[hidden email]]
>> On Behalf Of Paul Sanderson
>> Sent: Friday, June 29, 2018 11:50 AM
>> To: General Discussion of SQLite Database
>> Subject: [sqlite] unique values from a subset of data based on two fields
>>
>> I have a table
>>
>> Create table names (id int, status int, name text)
>>
>>
>>
>> 1, 1, 'paul'
>>
>> 2, 1,'helen'
>>
>> 3, 0, 'steve'
>>
>> 4, 0, 'steve'
>>
>> 5, 0, 'pete'
>>
>> 6, 0, 'paul'
>>
>>
>>
>> I want a query that returns all of the records with status = 1 and unique
>> records, based on name, where the status =0 and the name is not in the
>> list
>> status=1
>>
>>
>>
>> So from the above I would want to see
>>
>>
>>
>> 1, 1, paul
>>
>> 2, 1, helen
>>
>> 3, 0, steve (or 4, 0, steve)
>>
>> 5, 0, pete
>>
>>
>>
>> I could do something like
>>
>>
>>
>> Select * from names where status = 1 or name not in (select name from
>> names
>> where status = 1)
>>
>>
>>
>> But this gets both rows for steve, e.g.
>>
>>
>>
>> 1, 1, paul
>>
>> 2, 1, helen
>>
>> 3, 0, steve
>>
>> 4, 0, steve
>>
>> 5, 0, pete
>>
>> while I am not bothered about which of the two steves I get back, I must
>> have all occurences of names with status = 1
>>
>> I am probably missing somethng obvious
>>
>> Paul
>> www.sandersonforensics.com
>> SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

Keith Medcalf
In reply to this post by Paul Sanderson
>I want a query that returns all of the records with status = 1 and
>unique records, based on name, where the status =0 and the name is
>not in the list status=1

Translation into SQL using English to SQL Translator, using the most direct translation on the "problem statement" above directly into SQL:

create table names (id int, status int, name text);
insert into names values (1, 1, 'paul');
insert into names values (2, 1, 'helen');
insert into names values (3, 0, 'steve');
insert into names values (4, 0, 'steve');
insert into names values (5, 0, 'pete');
insert into names values (6, 0, 'paul');

-- I want a query that returns all of the records with status = 1

SELECT id,
       status,
       name
  FROM names
 WHERE status == 1

-- and

UNION

-- unique records, based on name, where the status = 0 and the name is not in the list [of names where] status=1

SELECT id,
       status,
       name
  FROM names
 WHERE status == 0
   AND name NOT IN (SELECT name
                      FROM names
                     WHERE status == 1)
GROUP BY name;

Returns the rows:

1|1|paul
2|1|helen
3|0|steve
5|0|pete

If the table is bigger than trivial (ie, contains more than the number of rows you can count with your fingers) then you will need the appropriate indexes to achieve performant results.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Paul Sanderson
>Sent: Friday, 29 June, 2018 09:50
>To: General Discussion of SQLite Database
>Subject: [sqlite] unique values from a subset of data based on two
>fields
>
>I have a table
>
>Create table names (id int, status int, name text)
>
>
>
>1, 1, 'paul'
>
>2, 1,'helen'
>
>3, 0, 'steve'
>
>4, 0, 'steve'
>
>5, 0, 'pete'
>
>6, 0, 'paul'
>
>
>
>I want a query that returns all of the records with status = 1 and
>unique
>records, based on name, where the status =0 and the name is not in
>the list
>status=1
>
>
>
>So from the above I would want to see
>
>
>
>1, 1, paul
>
>2, 1, helen
>
>3, 0, steve (or 4, 0, steve)
>
>5, 0, pete
>
>
>
>I could do something like
>
>
>
>Select * from names where status = 1 or name not in (select name from
>names
>where status = 1)
>
>
>
>But this gets both rows for steve, e.g.
>
>
>
>1, 1, paul
>
>2, 1, helen
>
>3, 0, steve
>
>4, 0, steve
>
>5, 0, pete
>
>while I am not bothered about which of the two steves I get back, I
>must
>have all occurences of names with status = 1
>
>I am probably missing somethng obvious
>
>Paul
>www.sandersonforensics.com
>SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

Paul Sanderson
Easier and pretty obvious :) Thanks Keith



Paul
www.sandersonforensics.com
SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>

On 29 June 2018 at 23:20, Keith Medcalf <[hidden email]> wrote:

> >I want a query that returns all of the records with status = 1 and
> >unique records, based on name, where the status =0 and the name is
> >not in the list status=1
>
> Translation into SQL using English to SQL Translator, using the most
> direct translation on the "problem statement" above directly into SQL:
>
> create table names (id int, status int, name text);
> insert into names values (1, 1, 'paul');
> insert into names values (2, 1, 'helen');
> insert into names values (3, 0, 'steve');
> insert into names values (4, 0, 'steve');
> insert into names values (5, 0, 'pete');
> insert into names values (6, 0, 'paul');
>
> -- I want a query that returns all of the records with status = 1
>
> SELECT id,
>        status,
>        name
>   FROM names
>  WHERE status == 1
>
> -- and
>
> UNION
>
> -- unique records, based on name, where the status = 0 and the name is not
> in the list [of names where] status=1
>
> SELECT id,
>        status,
>        name
>   FROM names
>  WHERE status == 0
>    AND name NOT IN (SELECT name
>                       FROM names
>                      WHERE status == 1)
> GROUP BY name;
>
> Returns the rows:
>
> 1|1|paul
> 2|1|helen
> 3|0|steve
> 5|0|pete
>
> If the table is bigger than trivial (ie, contains more than the number of
> rows you can count with your fingers) then you will need the appropriate
> indexes to achieve performant results.
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >[hidden email]] On Behalf Of Paul Sanderson
> >Sent: Friday, 29 June, 2018 09:50
> >To: General Discussion of SQLite Database
> >Subject: [sqlite] unique values from a subset of data based on two
> >fields
> >
> >I have a table
> >
> >Create table names (id int, status int, name text)
> >
> >
> >
> >1, 1, 'paul'
> >
> >2, 1,'helen'
> >
> >3, 0, 'steve'
> >
> >4, 0, 'steve'
> >
> >5, 0, 'pete'
> >
> >6, 0, 'paul'
> >
> >
> >
> >I want a query that returns all of the records with status = 1 and
> >unique
> >records, based on name, where the status =0 and the name is not in
> >the list
> >status=1
> >
> >
> >
> >So from the above I would want to see
> >
> >
> >
> >1, 1, paul
> >
> >2, 1, helen
> >
> >3, 0, steve (or 4, 0, steve)
> >
> >5, 0, pete
> >
> >
> >
> >I could do something like
> >
> >
> >
> >Select * from names where status = 1 or name not in (select name from
> >names
> >where status = 1)
> >
> >
> >
> >But this gets both rows for steve, e.g.
> >
> >
> >
> >1, 1, paul
> >
> >2, 1, helen
> >
> >3, 0, steve
> >
> >4, 0, steve
> >
> >5, 0, pete
> >
> >while I am not bothered about which of the two steves I get back, I
> >must
> >have all occurences of names with status = 1
> >
> >I am probably missing somethng obvious
> >
> >Paul
> >www.sandersonforensics.com
> >SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
> >_______________________________________________
> >sqlite-users mailing list
> >[hidden email]
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: unique values from a subset of data based on two fields

Keith Medcalf

Note that this is SQLite3 specific (and specific to Sybase of the era where Microsoft SQL Server was actually just a rebranded Sybase, and Microsoft re-writes of SQL Server up to about 2000).  Technically you cannot do a query of the form:

SELECT c1, c2
  FROM t1
GROUP BY c2;

because each column in the select list must be either an aggregate or listed in the GROUP BY clause.  SQLite3 allows c1 to be a bare column however and the value returned is taken from "some random row" of the group.  If there are multiple such columns, they all come from the same row in the group.  Although documented as a "random" row of the group, it is the first (or last) row visited in the group while solving the query (and this is of course subject to change but within the same version of SQLite3 will deterministically be the row either first or last in the visitation order -- the actual row may of course change depending on use of indexes, etc).  You can re-write this part so it will work in other SQL dialects that strictly enforce the requirement for c1 to be either an aggregate or listed in the group by clause.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Paul Sanderson
>Sent: Saturday, 30 June, 2018 04:18
>To: SQLite mailing list
>Subject: Re: [sqlite] unique values from a subset of data based on
>two fields
>
>Easier and pretty obvious :) Thanks Keith
>
>
>
>Paul
>www.sandersonforensics.com
>SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
>
>On 29 June 2018 at 23:20, Keith Medcalf <[hidden email]> wrote:
>
>> >I want a query that returns all of the records with status = 1 and
>> >unique records, based on name, where the status =0 and the name is
>> >not in the list status=1
>>
>> Translation into SQL using English to SQL Translator, using the
>most
>> direct translation on the "problem statement" above directly into
>SQL:
>>
>> create table names (id int, status int, name text);
>> insert into names values (1, 1, 'paul');
>> insert into names values (2, 1, 'helen');
>> insert into names values (3, 0, 'steve');
>> insert into names values (4, 0, 'steve');
>> insert into names values (5, 0, 'pete');
>> insert into names values (6, 0, 'paul');
>>
>> -- I want a query that returns all of the records with status = 1
>>
>> SELECT id,
>>        status,
>>        name
>>   FROM names
>>  WHERE status == 1
>>
>> -- and
>>
>> UNION
>>
>> -- unique records, based on name, where the status = 0 and the name
>is not
>> in the list [of names where] status=1
>>
>> SELECT id,
>>        status,
>>        name
>>   FROM names
>>  WHERE status == 0
>>    AND name NOT IN (SELECT name
>>                       FROM names
>>                      WHERE status == 1)
>> GROUP BY name;
>>
>> Returns the rows:
>>
>> 1|1|paul
>> 2|1|helen
>> 3|0|steve
>> 5|0|pete
>>
>> If the table is bigger than trivial (ie, contains more than the
>number of
>> rows you can count with your fingers) then you will need the
>appropriate
>> indexes to achieve performant results.
>>
>> ---
>> The fact that there's a Highway to Hell but only a Stairway to
>Heaven says
>> a lot about anticipated traffic volume.
>>
>>
>> >-----Original Message-----
>> >From: sqlite-users [mailto:sqlite-users-
>> >[hidden email]] On Behalf Of Paul Sanderson
>> >Sent: Friday, 29 June, 2018 09:50
>> >To: General Discussion of SQLite Database
>> >Subject: [sqlite] unique values from a subset of data based on two
>> >fields
>> >
>> >I have a table
>> >
>> >Create table names (id int, status int, name text)
>> >
>> >
>> >
>> >1, 1, 'paul'
>> >
>> >2, 1,'helen'
>> >
>> >3, 0, 'steve'
>> >
>> >4, 0, 'steve'
>> >
>> >5, 0, 'pete'
>> >
>> >6, 0, 'paul'
>> >
>> >
>> >
>> >I want a query that returns all of the records with status = 1 and
>> >unique
>> >records, based on name, where the status =0 and the name is not in
>> >the list
>> >status=1
>> >
>> >
>> >
>> >So from the above I would want to see
>> >
>> >
>> >
>> >1, 1, paul
>> >
>> >2, 1, helen
>> >
>> >3, 0, steve (or 4, 0, steve)
>> >
>> >5, 0, pete
>> >
>> >
>> >
>> >I could do something like
>> >
>> >
>> >
>> >Select * from names where status = 1 or name not in (select name
>from
>> >names
>> >where status = 1)
>> >
>> >
>> >
>> >But this gets both rows for steve, e.g.
>> >
>> >
>> >
>> >1, 1, paul
>> >
>> >2, 1, helen
>> >
>> >3, 0, steve
>> >
>> >4, 0, steve
>> >
>> >5, 0, pete
>> >
>> >while I am not bothered about which of the two steves I get back,
>I
>> >must
>> >have all occurences of names with status = 1
>> >
>> >I am probably missing somethng obvious
>> >
>> >Paul
>> >www.sandersonforensics.com
>> >SQLite Forensics Book
><https://www.amazon.co.uk/dp/ASIN/1980293074>
>> >_______________________________________________
>> >sqlite-users mailing list
>> >[hidden email]
>> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>>
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

GROUP BY [Re: unique values from a subset of data based on two fields]

Luuk

On 30-6-2018 14:55, Keith Medcalf wrote:

> Note that this is SQLite3 specific (and specific to Sybase of the era where Microsoft SQL Server was actually just a rebranded Sybase, and Microsoft re-writes of SQL Server up to about 2000).  Technically you cannot do a query of the form:
>
> SELECT c1, c2
>   FROM t1
> GROUP BY c2;
>
> because each column in the select list must be either an aggregate or listed in the GROUP BY clause.  SQLite3 allows c1 to be a bare column however and the value returned is taken from "some random row" of the group.  If there are multiple such columns, they all come from the same row in the group.  Although documented as a "random" row of the group, it is the first (or last) row visited in the group while solving the query (and this is of course subject to change but within the same version of SQLite3 will deterministically be the row either first or last in the visitation order -- the actual row may of course change depending on use of indexes, etc).  You can re-write this part so it will work in other SQL dialects that strictly enforce the requirement for c1 to be either an aggregate or listed in the group by clause.
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
>
Ok ,my highway to hell start here (regargind the use of SQL)

In SQLite3 you are allowed to do this:
SELECT a,b,c
FROM t1
GROUP BY a

The values of 'b' and 'c' will be taken from a 'random' row...

But if we rewrite this in SQL, i am getting something like this:
SELECT
   a,
   (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
   (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
FROM t1 t
GROUP BY a

QUESTION: How does one get the proper relationship between 'B' and 'C'?,
i mean how can one be use that both values are from the same row?
This is not a problem to SQLite, because in SQLite the values of b and c
seems to be originating from the same row, but what about *SQL* (if that
exists...?)

--
some test results:
sqlite> insert into t1 values (1,1,2);
sqlite> insert into t1 values (1,2,1);
sqlite> insert into t1 values (2,2,1);
sqlite> insert into t1 values (2,1,2);
sqlite> select a,b,c from t1 group by a;
1|2|1
2|1|2
sqlite> SELECT
   ...>    a,
   ...>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
   ...>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
   ...> FROM t1 t
   ...> GROUP BY a;
1|1|1
2|1|1
sqlite>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY [Re: unique values from a subset of data based on two fields]

Abroży Nieprzełoży
> SELECT
>    a,
>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> FROM t1 t
> GROUP BY a

Why not
    select a, min(b) as b, min(c) as c from t1 group by a;
?


2018-06-30 15:12 GMT+02:00, Luuk <[hidden email]>:

>
> On 30-6-2018 14:55, Keith Medcalf wrote:
>> Note that this is SQLite3 specific (and specific to Sybase of the era
>> where Microsoft SQL Server was actually just a rebranded Sybase, and
>> Microsoft re-writes of SQL Server up to about 2000).  Technically you
>> cannot do a query of the form:
>>
>> SELECT c1, c2
>>   FROM t1
>> GROUP BY c2;
>>
>> because each column in the select list must be either an aggregate or
>> listed in the GROUP BY clause.  SQLite3 allows c1 to be a bare column
>> however and the value returned is taken from "some random row" of the
>> group.  If there are multiple such columns, they all come from the same
>> row in the group.  Although documented as a "random" row of the group, it
>> is the first (or last) row visited in the group while solving the query
>> (and this is of course subject to change but within the same version of
>> SQLite3 will deterministically be the row either first or last in the
>> visitation order -- the actual row may of course change depending on use
>> of indexes, etc).  You can re-write this part so it will work in other SQL
>> dialects that strictly enforce the requirement for c1 to be either an
>> aggregate or listed in the group by clause.
>>
>> ---
>> The fact that there's a Highway to Hell but only a Stairway to Heaven says
>> a lot about anticipated traffic volume.
>>
> Ok ,my highway to hell start here (regargind the use of SQL)
>
> In SQLite3 you are allowed to do this:
> SELECT a,b,c
> FROM t1
> GROUP BY a
>
> The values of 'b' and 'c' will be taken from a 'random' row...
>
> But if we rewrite this in SQL, i am getting something like this:
> SELECT
>    a,
>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> FROM t1 t
> GROUP BY a
>
> QUESTION: How does one get the proper relationship between 'B' and 'C'?,
> i mean how can one be use that both values are from the same row?
> This is not a problem to SQLite, because in SQLite the values of b and c
> seems to be originating from the same row, but what about *SQL* (if that
> exists...?)
>
> --
> some test results:
> sqlite> insert into t1 values (1,1,2);
> sqlite> insert into t1 values (1,2,1);
> sqlite> insert into t1 values (2,2,1);
> sqlite> insert into t1 values (2,1,2);
> sqlite> select a,b,c from t1 group by a;
> 1|2|1
> 2|1|2
> sqlite> SELECT
>    ...>    a,
>    ...>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>    ...>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>    ...> FROM t1 t
>    ...> GROUP BY a;
> 1|1|1
> 2|1|1
> sqlite>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY [Re: unique values from a subset of data based on two fields]

Luuk


On 30-6-2018 15:39, Abroży Nieprzełoży wrote:
>> SELECT
>>    a,
>>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>> FROM t1 t
>> GROUP BY a
> Why not
>     select a, min(b) as b, min(c) as c from t1 group by a;
> ?

It still does not quarantee that the valuse show for b and c are comming
from the same row...


>
> 2018-06-30 15:12 GMT+02:00, Luuk <[hidden email]>:
>> On 30-6-2018 14:55, Keith Medcalf wrote:
>>> Note that this is SQLite3 specific (and specific to Sybase of the era
>>> where Microsoft SQL Server was actually just a rebranded Sybase, and
>>> Microsoft re-writes of SQL Server up to about 2000).  Technically you
>>> cannot do a query of the form:
>>>
>>> SELECT c1, c2
>>>   FROM t1
>>> GROUP BY c2;
>>>
>>> because each column in the select list must be either an aggregate or
>>> listed in the GROUP BY clause.  SQLite3 allows c1 to be a bare column
>>> however and the value returned is taken from "some random row" of the
>>> group.  If there are multiple such columns, they all come from the same
>>> row in the group.  Although documented as a "random" row of the group, it
>>> is the first (or last) row visited in the group while solving the query
>>> (and this is of course subject to change but within the same version of
>>> SQLite3 will deterministically be the row either first or last in the
>>> visitation order -- the actual row may of course change depending on use
>>> of indexes, etc).  You can re-write this part so it will work in other SQL
>>> dialects that strictly enforce the requirement for c1 to be either an
>>> aggregate or listed in the group by clause.
>>>
>>> ---
>>> The fact that there's a Highway to Hell but only a Stairway to Heaven says
>>> a lot about anticipated traffic volume.
>>>
>> Ok ,my highway to hell start here (regargind the use of SQL)
>>
>> In SQLite3 you are allowed to do this:
>> SELECT a,b,c
>> FROM t1
>> GROUP BY a
>>
>> The values of 'b' and 'c' will be taken from a 'random' row...
>>
>> But if we rewrite this in SQL, i am getting something like this:
>> SELECT
>>    a,
>>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>> FROM t1 t
>> GROUP BY a
>>
>> QUESTION: How does one get the proper relationship between 'B' and 'C'?,
>> i mean how can one be use that both values are from the same row?
>> This is not a problem to SQLite, because in SQLite the values of b and c
>> seems to be originating from the same row, but what about *SQL* (if that
>> exists...?)
>>
>> --
>> some test results:
>> sqlite> insert into t1 values (1,1,2);
>> sqlite> insert into t1 values (1,2,1);
>> sqlite> insert into t1 values (2,2,1);
>> sqlite> insert into t1 values (2,1,2);
>> sqlite> select a,b,c from t1 group by a;
>> 1|2|1
>> 2|1|2
>> sqlite> SELECT
>>    ...>    a,
>>    ...>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>    ...>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>>    ...> FROM t1 t
>>    ...> GROUP BY a;
>> 1|1|1
>> 2|1|1
>> sqlite>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY [Re: unique values from a subset of data based on two fields]

Abroży Nieprzełoży
>>> SELECT
>>>    a,
>>>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>>> FROM t1 t
>>> GROUP BY a
>> Why not
>>     select a, min(b) as b, min(c) as c from t1 group by a;
>> ?
>
> It still does not quarantee that the valuse show for b and c are comming
> from the same row...

select a, b, min(c) as c from (select a, min(b) as b from t1 group by
a) join t1 using(a, b) group by a, b;

?

2018-06-30 15:45 GMT+02:00, Luuk <[hidden email]>:

>
>
> On 30-6-2018 15:39, Abroży Nieprzełoży wrote:
>>> SELECT
>>>    a,
>>>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>>> FROM t1 t
>>> GROUP BY a
>> Why not
>>     select a, min(b) as b, min(c) as c from t1 group by a;
>> ?
>
> It still does not quarantee that the valuse show for b and c are comming
> from the same row...
>
>
>>
>> 2018-06-30 15:12 GMT+02:00, Luuk <[hidden email]>:
>>> On 30-6-2018 14:55, Keith Medcalf wrote:
>>>> Note that this is SQLite3 specific (and specific to Sybase of the era
>>>> where Microsoft SQL Server was actually just a rebranded Sybase, and
>>>> Microsoft re-writes of SQL Server up to about 2000).  Technically you
>>>> cannot do a query of the form:
>>>>
>>>> SELECT c1, c2
>>>>   FROM t1
>>>> GROUP BY c2;
>>>>
>>>> because each column in the select list must be either an aggregate or
>>>> listed in the GROUP BY clause.  SQLite3 allows c1 to be a bare column
>>>> however and the value returned is taken from "some random row" of the
>>>> group.  If there are multiple such columns, they all come from the same
>>>> row in the group.  Although documented as a "random" row of the group,
>>>> it
>>>> is the first (or last) row visited in the group while solving the query
>>>> (and this is of course subject to change but within the same version of
>>>> SQLite3 will deterministically be the row either first or last in the
>>>> visitation order -- the actual row may of course change depending on use
>>>> of indexes, etc).  You can re-write this part so it will work in other
>>>> SQL
>>>> dialects that strictly enforce the requirement for c1 to be either an
>>>> aggregate or listed in the group by clause.
>>>>
>>>> ---
>>>> The fact that there's a Highway to Hell but only a Stairway to Heaven
>>>> says
>>>> a lot about anticipated traffic volume.
>>>>
>>> Ok ,my highway to hell start here (regargind the use of SQL)
>>>
>>> In SQLite3 you are allowed to do this:
>>> SELECT a,b,c
>>> FROM t1
>>> GROUP BY a
>>>
>>> The values of 'b' and 'c' will be taken from a 'random' row...
>>>
>>> But if we rewrite this in SQL, i am getting something like this:
>>> SELECT
>>>    a,
>>>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>>> FROM t1 t
>>> GROUP BY a
>>>
>>> QUESTION: How does one get the proper relationship between 'B' and 'C'?,
>>> i mean how can one be use that both values are from the same row?
>>> This is not a problem to SQLite, because in SQLite the values of b and c
>>> seems to be originating from the same row, but what about *SQL* (if that
>>> exists...?)
>>>
>>> --
>>> some test results:
>>> sqlite> insert into t1 values (1,1,2);
>>> sqlite> insert into t1 values (1,2,1);
>>> sqlite> insert into t1 values (2,2,1);
>>> sqlite> insert into t1 values (2,1,2);
>>> sqlite> select a,b,c from t1 group by a;
>>> 1|2|1
>>> 2|1|2
>>> sqlite> SELECT
>>>    ...>    a,
>>>    ...>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>>>    ...>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
>>>    ...> FROM t1 t
>>>    ...> GROUP BY a;
>>> 1|1|1
>>> 2|1|1
>>> sqlite>
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY [Re: unique values from a subset of data based on two fields]

Gerry Snyder-4
In reply to this post by Luuk
If you want the row with the minimum B, and the row with a minimum C, then
the union of two queries would seem to be appropriate.

Gerry Snyder

On Sat, Jun 30, 2018, 6:45 AM Luuk <[hidden email]> wrote:

>
>
> On 30-6-2018 15:39, Abroży Nieprzełoży wrote:
> >> SELECT
> >>    a,
> >>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
> >>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> >> FROM t1 t
> >> GROUP BY a
> > Why not
> >     select a, min(b) as b, min(c) as c from t1 group by a;
> > ?
>
> It still does not quarantee that the valuse show for b and c are comming
> from the same row...
>
>
> >
> > 2018-06-30 15:12 GMT+02:00, Luuk <[hidden email]>:
> >> On 30-6-2018 14:55, Keith Medcalf wrote:
> >>> Note that this is SQLite3 specific (and specific to Sybase of the era
> >>> where Microsoft SQL Server was actually just a rebranded Sybase, and
> >>> Microsoft re-writes of SQL Server up to about 2000).  Technically you
> >>> cannot do a query of the form:
> >>>
> >>> SELECT c1, c2
> >>>   FROM t1
> >>> GROUP BY c2;
> >>>
> >>> because each column in the select list must be either an aggregate or
> >>> listed in the GROUP BY clause.  SQLite3 allows c1 to be a bare column
> >>> however and the value returned is taken from "some random row" of the
> >>> group.  If there are multiple such columns, they all come from the same
> >>> row in the group.  Although documented as a "random" row of the group,
> it
> >>> is the first (or last) row visited in the group while solving the query
> >>> (and this is of course subject to change but within the same version of
> >>> SQLite3 will deterministically be the row either first or last in the
> >>> visitation order -- the actual row may of course change depending on
> use
> >>> of indexes, etc).  You can re-write this part so it will work in other
> SQL
> >>> dialects that strictly enforce the requirement for c1 to be either an
> >>> aggregate or listed in the group by clause.
> >>>
> >>> ---
> >>> The fact that there's a Highway to Hell but only a Stairway to Heaven
> says
> >>> a lot about anticipated traffic volume.
> >>>
> >> Ok ,my highway to hell start here (regargind the use of SQL)
> >>
> >> In SQLite3 you are allowed to do this:
> >> SELECT a,b,c
> >> FROM t1
> >> GROUP BY a
> >>
> >> The values of 'b' and 'c' will be taken from a 'random' row...
> >>
> >> But if we rewrite this in SQL, i am getting something like this:
> >> SELECT
> >>    a,
> >>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
> >>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> >> FROM t1 t
> >> GROUP BY a
> >>
> >> QUESTION: How does one get the proper relationship between 'B' and 'C'?,
> >> i mean how can one be use that both values are from the same row?
> >> This is not a problem to SQLite, because in SQLite the values of b and c
> >> seems to be originating from the same row, but what about *SQL* (if that
> >> exists...?)
> >>
> >> --
> >> some test results:
> >> sqlite> insert into t1 values (1,1,2);
> >> sqlite> insert into t1 values (1,2,1);
> >> sqlite> insert into t1 values (2,2,1);
> >> sqlite> insert into t1 values (2,1,2);
> >> sqlite> select a,b,c from t1 group by a;
> >> 1|2|1
> >> 2|1|2
> >> sqlite> SELECT
> >>    ...>    a,
> >>    ...>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
> >>    ...>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> >>    ...> FROM t1 t
> >>    ...> GROUP BY a;
> >> 1|1|1
> >> 2|1|1
> >> sqlite>
> >>
> >> _______________________________________________
> >> sqlite-users mailing list
> >> [hidden email]
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY [Re: unique values from a subset of data based on two fields]

Luuk
In reply to this post by Luuk


On 30-6-2018 15:45, Luuk wrote:

>
> In SQLite3 you are allowed to do this:
> SELECT a,b,c
> FROM t1
> GROUP BY a
>
> The values of 'b' and 'c' will be taken from a 'random' row...
>
> But if we rewrite this in SQL, i am getting something like this:
> SELECT
>    a,
>    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> FROM t1 t
> GROUP BY a
>
> QUESTION: How does one get the proper relationship between 'B' and 'C'?,
> i mean how can one be use that both values are from the same row?
> This is not a problem to SQLite, because in SQLite the values of b and c
> seems to be originating from the same row, but what about *SQL* (if that
> exists...?)
>

I think i'll do this:

select x.a, t1.b, t1.c
from (select t1.a, min(t1.rowid)
    from t1
    group by t1.a) x
inner join t1 on x.rowid=t1.rowid;

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY [Re: unique values from a subset of data based on two fields]

R Smith-2
In reply to this post by Luuk
On 2018/06/30 3:12 PM, Luuk wrote:

> Ok ,my highway to hell start here (regargind the use of SQL)
>
> In SQLite3 you are allowed to do this:
> SELECT a,b,c
> FROM t1
> GROUP BY a
>
> The values of 'b' and 'c' will be taken from a 'random' row...
>
> But if we rewrite this in SQL, i am getting something like this:
> SELECT
>     a,
>     (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
>     (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
> FROM t1 t
> GROUP BY a
>
> QUESTION: How does one get the proper relationship between 'B' and 'C'?,

This question comes up from time to time, and is sometimes misunderstood
by SQL users everywhere (or from every Engine I should say).

There is no easy solution and it isn't really an aggregate Set-Algebra
problem, so mostly not handled specifically by Engines for aggregate
queries.

To just state the problem more fully:

Consider this Table "Fruit" in a Grocer's shop:
SELECT * FROM Fruit;
   --         |              |     |  DaysOn-
   -- Name    |   BasketNo   | Qty |   Shelf
   -- ------- | ------------ | --- | ---------
   -- Apples  |       1      |  40 |     12
   -- Apples  |       2      |  40 |     14
   -- Apples  |       3      |  23 |     16
   -- Oranges |       4      |  40 |     12
   -- Oranges |       5      |  40 |     14
   -- Oranges |       6      |  11 |     16
   -- Apples  |       7      |  7  |     22

An unopened basket contains 40 fruit. We try to use the oldest first, etc.
Now suppose we want to know what is the oldest unopened basket for every
kind of fruit. The simplest query in *SQLite* would be:

SELECT Name, MAX(DaysOnShelf) AS MaxAge, BasketNo
   FROM Fruit
  WHERE Qty = 40
  GROUP BY Name
;

   -- Name    | MaxAge |   BasketNo
   -- ------- | ------ | ------------
   -- Apples  |   14   |       2
   -- Oranges |   14   |       5

Note that this ONLY works due to a peculiarity in SQLite, and it is
forewarned in the SQLite documents that the non-aggregate field could
return ANY random value that qualifies to be in a selected row (i.e. any
row that is chosen for aggregation by the WHERE clause filter).

To say exactly what that means: In the query, for the Apples aggregate,
two rows "pass" the filter where Qty = 40, namely those with baskets 1
and 2. MAX(DaysOnShelf) for those two rows correctly returns 14, but
there is no guarantee that the returned BasketNo will be from THAT
specific row, it could have returned (still mathematically correct)
Basket 1 in stead of Basket 2.  And, in a next release, it might even do
so. You cannot assume what it would be.

In fact, if we change the query to enclose BasketNo too in an aggregate
function (MIN() in this case):
SELECT Name, MAX(DaysOnShelf) AS MaxAge, MIN(BasketNo) AS BasketNo
   FROM Fruit
  WHERE Qty = 40
  GROUP BY Name
;

   -- Name    | MaxAge |   BasketNo
   -- ------- | ------ | ------------
   -- Apples  |   14   |       1
   -- Oranges |   14   |       4

Now the 14 and 1 next to Apples are definitely NOT from the same row -
which is the entire point of aggregate functions, we want to know the
truth over the whole set for each function, it is not intended to pick
out values.

Thing is, most developers are precisely interested in the specific row
containing the qualifying aggregate.

There are two ways to solve this (using standard SQL in any SQL engine -
there might be more ways in specific Engines, such as SQLite).

1 - First find the target aggregate, then look that up against the
original (non aggregate) set of records.
2 - Construct a custom taxonomy and deconstruct after.

1 is by far the most used and the easiest to write. It's downside is
that it requires a double look-up loop, which in most Engines is fairly
efficient anyway.
An example of this:

SELECT X.Name, X.MaxAge, F.BasketNo
   FROM (
     SELECT Name, MAX(DaysOnShelf) AS MaxAge, MAX(Qty) AS Qty
       FROM Fruit
      WHERE Qty = 40
      GROUP BY Name
   ) AS X
   JOIN Fruit AS F ON F.Name = X.Name AND F.DaysOnShelf = X.MaxAge AND
F.Qty = X.Qty
;
   -- Name    | MaxAge |   BasketNo
   -- ------- | ------ | ------------
   -- Apples  |   14   |       2
   -- Oranges |   14   |       5


-- The above example shows ALL rows that match (as is the nature of a
JOIN).  To only show a single row, the following can work:

SELECT X.Name, X.MaxAge, (
        SELECT F.BasketNo FROM Fruit AS F WHERE F.Name = X.Name AND
F.DaysOnShelf = X.MaxAge AND F.Qty = X.Qty LIMIT 1
        ) AS BasketNo
   FROM (
     SELECT Name, MAX(DaysOnShelf) AS MaxAge, MAX(Qty) AS Qty
       FROM Fruit
      WHERE Qty = 40
      GROUP BY Name
   ) AS X
;
   -- Name    | MaxAge |   BasketNo
   -- ------- | ------ | ------------
   -- Apples  |   14   |       2
   -- Oranges |   14   |       5


-- And lastly, an example of achieving the first query with a CTE:

WITH X(Name, MaxAge, Qty) AS (
     SELECT Name, MAX(DaysOnShelf), MAX(Qty)
       FROM Fruit
      WHERE Qty = 40
      GROUP BY Name
)
SELECT X.Name, X.MaxAge, F.BasketNo
   FROM X
   JOIN Fruit AS F ON F.Name = X.Name AND F.DaysOnShelf = X.MaxAge AND
F.Qty = X.Qty
;
   -- Name    | MaxAge |   BasketNo
   -- ------- | ------ | ------------
   -- Apples  |   14   |       2
   -- Oranges |   14   |       5



The second way to do it is the custom taxonomy. This simply means we
fuse together some fields so they are treated as if they are 1 unit,
with the most significant field first. If the Engine sees them as 1
value, they must always be regarded together, and the aggregate function
typically acts upon the most significant part of the value[1]. After we
get the aggregate returned, we can take them apart again and have the
details we want.
This is rather easy with TEXT values, so I'm going to use the same
example as above to show how it can also be done with Numeric values.

The only real advantage of this method is that it requires only a single
lookup loop, though it does come with some string-handling overhead - as
always, it's best to test the speed with the target DB.

This first example shows only half the solution for clarity (I am not
taking it apart yet) so that you can see what the fused-together values
look like[2]. We simply make the Age value into a known-length number by
adding it (I chose 1000 since fruit older than 1000 days is, well,
unlikely, but you can use much larger values) and then find the MAX()
for it:

SELECT X.Name, X.MaxAge_Basket
   FROM (
     SELECT Name, MAX((1000+DaysOnShelf)||':'||BasketNo) AS MaxAge_Basket
       FROM Fruit
      WHERE Qty = 40
      GROUP BY Name
   ) AS X
;
   --         | MaxAge_B-
   -- Name    | asket
   -- ------- | ---------
   -- Apples  | 1014:2
   -- Oranges | 1014:5

[1] "Fusing-together"typically results in a string, though you can do a
value-based taxonomy. An example of it is referring to Months by their
(Year*100+Month) value, so that September 2017 is 201709 and 202012 is
December 2020. Aggregate functions on strings consider the most
significant part always to the left. i.e. "012" > "0099999999999" and if
you want to compare numbers when they are in strings, you have to make
sure the left leading length is similar: "00012" < "00014", etc.

[2] The colon is not technically needed, it's just added here for clarity.


And finally here taking that fused-together bit apart (and subtracting
the 1000 we added) to show the two actual values. This guarantees that
the shown Basket belongs to that specific MaxAge:

SELECT X.Name, CAST(substr(X.MaxAge_Basket,1,4) AS INT)-1000 AS MaxAge,
substr(X.MaxAge_Basket,6) AS Basket
   FROM (
     SELECT Name, MAX((1000+DaysOnShelf)||':'||BasketNo) AS MaxAge_Basket
       FROM Fruit
      WHERE Qty = 40
      GROUP BY Name
   ) AS X
;
   -- Name    | MaxAge | Basket
   -- ------- | ------ | ------
   -- Apples  |   14   |   2
   -- Oranges |   14   |   5


This last example achieves the same using an Integer taxonomy:

SELECT X.Name, X.MaxAge_Basket / 100 AS MaxAge, X.MaxAge_Basket % 100 AS
Basket
   FROM (
     SELECT Name, 100 * DaysOnShelf + BasketNo AS MaxAge_Basket
       FROM Fruit
      WHERE Qty = 40
      GROUP BY Name
   ) AS X
;
   -- Name    | MaxAge |    Basket
   -- ------- | ------ | ------------
   -- Apples  |   14   |       2
   -- Oranges |   14   |       5


These work in any Engine.

Hope that all makes sense!
Ryan







_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users