Confusion about DISTINCT keyword

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Confusion about DISTINCT keyword

Balaji Ramanathan
I have the query below with one column in the final output coming from the
main query and two columns in the final output coming from correlated
subqueries:

SELECT DISTINCT *modenumber*,


(SELECT tripid

FROM trip

WHERE modenumber = T.modenumber

ORDER BY distance DESC

LIMIT 3

) AS *tripid*,


(SELECT distance

FROM trip

WHERE modenumber = T.modenumber

ORDER BY distance DESC

LIMIT 3

) AS *distance*


FROM trip T

ORDER BY modenumber;


I expected this query to produce at least 3x the number of distinct
modenumbers because the results of the subqueries (especially the tripid)
are different for each trip.  However, when I run it, I get only one row
per modenumber.


It is as if the DISTINCT keyword is applying only to the modenumber, and
not to the other two columns in the select.  Is that because the other two
columns are coming from subqueries?  The documentation does not explicitly
mention subqueries, and only says distinct removes duplicate rows.  In this
case, it seems to think that the row consists of just the column from the
main query and does not include the columns from the subqueries.


Is this a bug or is this the expected behavior of the DISTINCT keyword?
Thank you.


Balaji Ramanathan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Confusion about DISTINCT keyword

Keith Medcalf

Neither.  It has nothing to do with the DISTINCT keyword, which causes only DISTINCT rows to be returned (duplicates are removed).

You misunderstanding is on the nature of a SCALAR.  A Scalar means ONE value.  A correlated SCALAR subquery (a correlated subquery embedded as a column in a select statement) can only return a SINGLE SCALAR result.

It matters not whether your subquery returns 1 or 1,000,000 rows.  Only the value from the first row is returned.  Once this first row has been determined the subquery is terminated.  (That is, it always has " LIMIT 1" no matter what you might specify).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Balaji Ramanathan
>Sent: Saturday, 18 November, 2017 09:05
>To: [hidden email]
>Subject: [sqlite] Confusion about DISTINCT keyword
>
>I have the query below with one column in the final output coming
>from the
>main query and two columns in the final output coming from correlated
>subqueries:
>
>SELECT DISTINCT *modenumber*,
>
>
>(SELECT tripid
>
>FROM trip
>
>WHERE modenumber = T.modenumber
>
>ORDER BY distance DESC
>
>LIMIT 3
>
>) AS *tripid*,
>
>
>(SELECT distance
>
>FROM trip
>
>WHERE modenumber = T.modenumber
>
>ORDER BY distance DESC
>
>LIMIT 3
>
>) AS *distance*
>
>
>FROM trip T
>
>ORDER BY modenumber;
>
>
>I expected this query to produce at least 3x the number of distinct
>modenumbers because the results of the subqueries (especially the
>tripid)
>are different for each trip.  However, when I run it, I get only one
>row
>per modenumber.
>
>
>It is as if the DISTINCT keyword is applying only to the modenumber,
>and
>not to the other two columns in the select.  Is that because the
>other two
>columns are coming from subqueries?  The documentation does not
>explicitly
>mention subqueries, and only says distinct removes duplicate rows.
>In this
>case, it seems to think that the row consists of just the column from
>the
>main query and does not include the columns from the subqueries.
>
>
>Is this a bug or is this the expected behavior of the DISTINCT
>keyword?
>Thank you.
>
>
>Balaji Ramanathan
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Confusion about DISTINCT keyword

Balaji Ramanathan
In reply to this post by Balaji Ramanathan
This is great information.  Thank you very much for a clear explanation,
Keith.  I guess I have to go back to using CTE's to get what I want in this
case.

Balaji Ramanathan

---------- Forwarded message ----------
From: Keith Medcalf <[hidden email]>
To: SQLite mailing list <[hidden email]>
Cc:
Bcc:
Date: Sat, 18 Nov 2017 09:16:57 -0700
Subject: Re: [sqlite] Confusion about DISTINCT keyword

Neither.  It has nothing to do with the DISTINCT keyword, which causes only
DISTINCT rows to be returned (duplicates are removed).

You misunderstanding is on the nature of a SCALAR.  A Scalar means ONE
value.  A correlated SCALAR subquery (a correlated subquery embedded as a
column in a select statement) can only return a SINGLE SCALAR result.

It matters not whether your subquery returns 1 or 1,000,000 rows.  Only the
value from the first row is returned.  Once this first row has been
determined the subquery is terminated.  (That is, it always has " LIMIT 1"
no matter what you might specify).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says
a lot about anticipated traffic volume.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users