Is it legal SQL to select a specific row of a group via a HAVING clause?

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Is it legal SQL to select a specific row of a group via a HAVING clause?

Rowan Worth
Hi guys,

Imagine a DB storing a history of currency exchange rate info.

CREATE TABLE Prices (
    id INTEGER PRIMARY KEY,
    day INTEGER,
    currency TEXT,
    price FLOAT);

Assume 'day' increases monotonically and there is at most one price
recorded per currency per day - if you want to know the latest prices, is
this legal?

SELECT currency, price FROM Prices GROUP BY currency HAVING time =
MAX(time);

I've seen this approach used and my reading of
https://www.sqlite.org/lang_select.html#resultset hasn't left me confident
that it is correct SQL, but in practice it seems to work fine... eg:

insert into Prices (day, currency, price) values (1, 'EUR', 1.86);
insert into Prices (day, currency, price) values (1, 'USD', 1.20);
insert into Prices (day, currency, price) values (2, 'EUR', 1.92);
insert into Prices (day, currency, price) values (2, 'USD', 1.18);
insert into Prices (day, currency, price) values (3, 'USD', 1.24);
/* price for EUR on day 3 intentionally left blank */

sqlite> SELECT currency, price FROM Prices GROUP BY currency HAVING day =
MAX(day);
EUR|1.92
USD|1.24

MIN also seems to work:

sqlite> SELECT currency, price FROM Prices GROUP BY currency HAVING day =
MIN(day);
EUR|1.86
USD|1.2

As an aside, the query definitely doesn't work to get prices for a specific
day:

sqlite> SELECT currency, price FROM Prices GROUP BY currency HAVING day = 2;
EUR|1.92

I think the following explains the last result:

> If a HAVING clause is a non-aggregate expression, it is evaluated with
> respect to an arbitrarily selected row from the group.

And the arbitrary row for EUR just happened to be day=2 here. But the
documentation doesn't mention non-aggregate queries with a group by
clause...


> If the HAVING clause is an aggregate expression, it is evaluated across
all
> rows in the group.

I'm not sure how to interpret this in the context of the "day = MAX(day)"
expression though - does "evaluated across all rows" mean that rows
excluded by the clause are discarded?

If not then I suspect this query is not legit... but I don't feel I've
understood everything so any clarification is appreciated!

Cheers,
-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is it legal SQL to select a specific row of a group via a HAVING clause?

Clemens Ladisch
Rowan Worth wrote:
> if you want to know the latest prices, is this legal?
>
> SELECT currency, price FROM Prices GROUP BY currency HAVING time = MAX(time);

It is not legal ANSI SQL, and most other databases will complain.

While the WHERE clause allows to filter out rows from the table, the
HAVING clause is evaluated _after_ the groups have already been created,
and allows to filter out entire groups (i.e., output rows).

In this particular query, "HAVING time = MAX(time)" throws away any
group where the "time" value does not happen to be the maximum value
in the group.  What makes this query actually work is that SQLite has
a (somewhat undocumented) extension where using a single MAX() or MIN()
determines the row in a group where unaggregated expressions come from
(instead of using some random row).  But this is just a side effect of
the expression in the HAVING clause; using HAVING like this is an
extremely misleading way to write this query.  Better write the query
like this:

  SELECT currency, price, MAX(time) FROM Prices GROUP BY currency;

(Same extension, so this is not legal ANSI SQL either.)

Neither your original query nor this query work if there is more than
one MIN()/MAX(), or if some other aggregation function is used.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is it legal SQL to select a specific row of a group via a HAVING clause?

Hick Gunter
In reply to this post by Rowan Worth
The main difference between HAVING and WHERE ist that WHERE operates on the input set and HAVING operates on the output set.

If your condition requires computing an aggregate, then HAVING is a viable method of avoiding a subquery that needs to repeatedly scan the input table:

SELECT currency,price FROM Prices p WHERE day = (SELECT max(day) from Prices where currency=p.currency);

explain query plan SELECT currency,price FROM Prices p WHERE day = (SELECT max(day) from Prices where currency=p.currency);
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE Prices AS p (~500000 rows)
0     0              0     EXECUTE CORRELATED SCALAR SUBQUERY 1
1     0              0     SEARCH TABLE Prices USING AUTOMATIC COVERING INDEX (currency=?) (~1 rows)

explain query plan SELECT currency, price FROM Prices GROUP BY currency HAVING day = max(day);
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE Prices (~1000000 rows)
0     0              0     USE TEMP B-TREE FOR GROUP BY

If your condition does not require computing an aggregate, just use WHERE.

SELECT currency,price from Prices WHERE day=2;

To find the latest price on or before day 3:

SELECT currency,price from Prices WHERE day <= 3 GROUP by currency HAVING day = max(day);

explain query plan SELECT currency,price from Prices WHERE day <= 3 GROUP by currency HAVING day = max(day);
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE Prices (~333333 rows)
0     0              0     USE TEMP B-TREE FOR GROUP BY

Note that this is without any defined indexes. Plans may change depending on indexes added, and the cardinality of constrained fields.




-----Urspr√ľngliche Nachricht-----
Von: Rowan Worth [mailto:[hidden email]]
Gesendet: Dienstag, 29. September 2015 09:06
An: General Discussion of SQLite Database
Betreff: [sqlite] Is it legal SQL to select a specific row of a group via a HAVING clause?

Hi guys,

Imagine a DB storing a history of currency exchange rate info.

CREATE TABLE Prices (
    id INTEGER PRIMARY KEY,
    day INTEGER,
    currency TEXT,
    price FLOAT);

Assume 'day' increases monotonically and there is at most one price recorded per currency per day - if you want to know the latest prices, is this legal?

SELECT currency, price FROM Prices GROUP BY currency HAVING time = MAX(time);

I've seen this approach used and my reading of https://www.sqlite.org/lang_select.html#resultset hasn't left me confident that it is correct SQL, but in practice it seems to work fine... eg:

insert into Prices (day, currency, price) values (1, 'EUR', 1.86); insert into Prices (day, currency, price) values (1, 'USD', 1.20); insert into Prices (day, currency, price) values (2, 'EUR', 1.92); insert into Prices (day, currency, price) values (2, 'USD', 1.18); insert into Prices (day, currency, price) values (3, 'USD', 1.24);
/* price for EUR on day 3 intentionally left blank */

sqlite> SELECT currency, price FROM Prices GROUP BY currency HAVING day
sqlite> =
MAX(day);
EUR|1.92
USD|1.24

MIN also seems to work:

sqlite> SELECT currency, price FROM Prices GROUP BY currency HAVING day
sqlite> =
MIN(day);
EUR|1.86
USD|1.2

As an aside, the query definitely doesn't work to get prices for a specific
day:

sqlite> SELECT currency, price FROM Prices GROUP BY currency HAVING day
sqlite> = 2;
EUR|1.92

I think the following explains the last result:

> If a HAVING clause is a non-aggregate expression, it is evaluated with
> respect to an arbitrarily selected row from the group.

And the arbitrary row for EUR just happened to be day=2 here. But the documentation doesn't mention non-aggregate queries with a group by clause...


> If the HAVING clause is an aggregate expression, it is evaluated
> across
all
> rows in the group.

I'm not sure how to interpret this in the context of the "day = MAX(day)"
expression though - does "evaluated across all rows" mean that rows excluded by the clause are discarded?

If not then I suspect this query is not legit... but I don't feel I've understood everything so any clarification is appreciated!

Cheers,
-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is it legal SQL to select a specific row of a group via a HAVING clause?

Simon Slavin-3
In reply to this post by Rowan Worth

On 29 Sep 2015, at 8:05am, Rowan Worth <[hidden email]> wrote:

> CREATE TABLE Prices (
>    id INTEGER PRIMARY KEY,
>    day INTEGER,
>    currency TEXT,
>    price FLOAT);
>
> Assume 'day' increases monotonically and there is at most one price
> recorded per currency per day - if you want to know the latest prices, is
> this legal?
>
> SELECT currency, price FROM Prices GROUP BY currency HAVING time =
> MAX(time);

I see others have already told you not to use this, but I'd like to work towards a better command.  To find the latest price for any one currency do this:

SELECT currency,price FROM Prices
        WHERE currency='EUR'
        ORDER BY day DESC
        LIMIT 1;

This finds all the rows for the right currency, then sorts them so that the latest is at the top, then returns just the top row.  If SQL has an index like the following:

CREATE INDEX Prices_cd ON Prices (currency, day);

then it will use this index to jump straight to the correct row to return, and it will return the result very quickly.  Even better, this index will speed up the other SELECT involved in this.

So to use the above command with every currency you need another SELECT which will pick out each currency you have data for.  You can do that like this:

SELECT DISTINCT currency FROM Prices;

So you should be able to combine the two:

SELECT currency AS thisCCy,price FROM Prices
        WHERE price = (SELECT price FROM Prices
                        WHERE currency = thisCCy
                        ORDER BY day DESC
                        LIMIT 1);

or, because we have an index which will make this faster,

SELECT currency AS thisCCy,price FROM Prices
        WHERE day = (SELECT day FROM Prices
                        WHERE currency = thisCCy
                        ORDER BY day DESC
                        LIMIT 1);

I don't like using sub-SELECT and I would love to see another reader rephrase this using 'WITH' or a VIEW.

As an incidental note, I advise you to store the time-of-day of each price in your Prices file even if you store only one price per day.  Any auditor of your system will find their job easier if they can easily figure out which price (from those throughout the day) you decided to use.  It may not make things easier in the day-to-day running of the system but it will make things easier if it's ever audited, or if you every have to discuss the data with a customer.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is it legal SQL to select a specific row of a group via a HAVING clause?

Gabor Grothendieck
On Tue, Sep 29, 2015 at 6:16 AM, Simon Slavin <[hidden email]> wrote:

> I don't like using sub-SELECT and I would love to see another reader
> rephrase this using 'WITH' or a VIEW.
>

with sub as (select currency, price, max(day) from prices group by currency)
 select currency, price from sub;
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is it legal SQL to select a specific row of a group via a HAVING clause?

James K. Lowden
In reply to this post by Rowan Worth
On Tue, 29 Sep 2015 15:05:42 +0800
Rowan Worth <[hidden email]> wrote:

> Imagine a DB storing a history of currency exchange rate info.
>
> CREATE TABLE Prices (
>     id INTEGER PRIMARY KEY,
>     day INTEGER,
>     currency TEXT,
>     price FLOAT);

Better for your purpose would be:

CREATE TABLE Prices (
    day INTEGER not NULL,
    currency TEXT not NULL,
    price FLOAT not NULL,
    PRIMARY KEY(day, currency)
);

You'll note that your queries don't use the id column, and nothing in
your table definition ensures there's only one price per currency per
day.  

If it were me, I'd make day TEXT and add a constraint that requires the
month portion be between 1 and 12.

> if you want to know the latest prices, is this legal?
>
> SELECT currency, price FROM Prices GROUP BY currency HAVING time =
> MAX(time);

What you want is called existential quantification.  The standard SQL
for it is a correlated subquery, viz:

        SELECT *
        FROM Prices as p
        where exists (
                select 1 from Prices
                where currency = p.currency
                GROUP BY currency
                HAVING p.time = MAX(time)
        );

That version might be slower in SQLite than LIMIT 1, although it
shouldn't be.  You'll have to test it.  The advantage to you is that
it's standard SQL.  It will work on any SQL DBMS, and will help you
think about the issue in terms of sets instead of having to rely on
a crutch like LIMIT.  

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is it legal SQL to select a specific row of a group via a HAVING clause?

Darren Duncan
In reply to this post by Rowan Worth
All of the responses I've seen to this question so far seem logically wrong, or
at the very least are different than I would do it, and my proposal is one that
should work reliably on any DBMS.  You use a subquery in the FROM clause.

select currency, price
from (
   select currency, day, max(time) as time
   from prices
   group by currency, day
) as filter
   inner join prices using (currency, day, time)

The issue here is you want to return other details, the price, associated with
the latest time per currency-day, and you can't do that in SQL without having a
select query nested in another one; the inner determines the latest time per
currency-day and the outer one looks up other info related to it.

The above example should also perform very efficiently, besides being reliably
correct rather than just accidentally correct.

-- Darren Duncan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

When was PRAGMA busy_timeout added to SQLite3

Jim Morris
In reply to this post by Simon Slavin-3
I'm using an archaic version, I know, but busy_timeout doesn't seem to
work.  It doesn't return the value I set.  When was it added?

sqlite3 temp.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> PRAGMA busy_timeout=30000;
sqlite> PRAGMA busy_timeout;
sqlite>


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

When was PRAGMA busy_timeout added to SQLite3

Jim Morris
In reply to this post by Simon Slavin-3
I'm using an archaic version, I know, but busy_timeout doesn't seem to
work.  It doesn't return the value I set.  When was it added?

sqlite3 temp.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> PRAGMA busy_timeout=30000;
sqlite> PRAGMA busy_timeout;
sqlite>



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: When was PRAGMA busy_timeout added to SQLite3

Richard Hipp-3
In reply to this post by Jim Morris
On 10/2/15, Jim Morris <[hidden email]> wrote:
> I'm using an archaic version, I know, but busy_timeout doesn't seem to
> work.  It doesn't return the value I set.  When was it added?

2012-12-12 (3.7.15)

>
> sqlite3 temp.db
> SQLite version 3.5.9
> Enter ".help" for instructions
> sqlite> PRAGMA busy_timeout=30000;
> sqlite> PRAGMA busy_timeout;
> sqlite>
>
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: When was PRAGMA busy_timeout added to SQLite3

Simon Slavin-3
In reply to this post by Jim Morris

On 3 Oct 2015, at 12:19am, Jim Morris <[hidden email]> wrote:

> I'm using an archaic version, I know, but busy_timeout doesn't seem to work.  It doesn't return the value I set.  When was it added?
>
> sqlite3 temp.db
> SQLite version 3.5.9
> Enter ".help" for instructions
> sqlite> PRAGMA busy_timeout=30000;
> sqlite> PRAGMA busy_timeout;
> sqlite>

Don't know when, but it seems to have been fixed since the version you had:

SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> PRAGMA busy_timeout;
0
sqlite> PRAGMA busy_timeout=30000;
30000
sqlite> PRAGMA busy_timeout;
30000
sqlite>

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users