Dear all,
I need to perform a kind of partial GROUP BY to determine the beginnings and ends of sets of identical data. I can't use a full GROUP BY because these sets can be repeated and their repetition must be conserved. Other database engines have solutions for this task (like windowing in postgre) but I wonder if there is an efficient recipe in SQLite. Example: ======= Table: mytable -------- date test ------ ------ 1 clim 3 clim 7 amb 10 amb 13 clim 15 clim 20 clim 22 amb 25 amb Desired result --------------------- date_from date_to test --------------- ------------ ------ 1 3 clim 7 10 amb 13 15 clim 22 25 amb (non optimal) solution found ===================== CREATE VIEW mytablebydate AS -- Pre-order table to avoid ordering it twice in sub-queries SELECT * FROM mytable ORDER BY date CREATE VIEW mytablenext AS SELECT date, test, ( -- first row > this row SELECT date -- NULL if not exists FROM mytablebydate WHERE date > MT.date LIMIT 1 ) as date_next, ( -- first row > this row SELECT test -- NULL if not exists FROM mytablebydate WHERE date > MT.date LIMIT 1 ) as test_next FROM mytable MT WHERE test != test_next -- Get desired results SELECT ( -- Date of the previous row SELECT MAX( date_next ) FROM mytablenext WHERE date_next < mt.date ) AS date_from, date AS date_to, -- this row test FROM mytablenext mt Comments ======== This method returns a Null for the first date_from and the last group is not returned. It is therefore incomplete. In addition, it involves quite a lot of subqueries. For completeness, it is inspired by http://stackoverflow.com/questions/30455227/date-range-for-set-of-same-data/30460263#30460263. So, is there a better / official way in SQLite ? Any help will be welcome, Jonathan ******************************************************************************* This e-mail message is intended only for the addressee(s) and contains information which may be confidential. If you are not the intended recipient please do not read, save, forward, disclose or copy the contents of this e-mail. If this e-mail has been sent to you in error, please delete this e-mail and any copies or links to this e-mail completely and immediately from your system. We also like to inform you that communication via e-mail over the Internet is insecure because third parties may have the possibility to access and manipulate e-mails. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of The Swatch Group Ltd. ******************************************************************************* _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Rossel, Jonathan wrote:
> Other database engines have solutions for this task (like windowing in > postgre) but I wonder if there is an efficient recipe in SQLite. SQLite does not have windowing functions. So the most efficient method would be to read the data with a simple ORDER BY, and do the grouping in your code. Regards, Clemens > This e-mail message is intended only ... This e-mail contains public information intended for any subscriber of this mailing list and for anybody else who bothers to read it; it will be copied, disclosed and distributed to the public. If you think you are not the intended recipient, please commit suicide immediately. These terms apply also to any e-mails quoted in, referenced from, or answering this e-mail, and supersede any confidentiality notices in those e-mails. Additionally, confidentiality notices in those e-mails will incur legal processing fees of $42 per line; you have agreed to this by reading this confidentiality notice. _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by Rossel, Jonathan
You could try this, inspired by classic algorithms of temporal databases: create table T(date integer,test char(12)); insert into T values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); create table TT(seq integer not null primary key autoincrement,date integer,test char(12)); insert into TT(date,test) select * from T order by date; select T1.date, T3.date, T1.test from TT T1, TT T3 -- More efficient than "where T1.date <= T3.date" where T1.seq <= T3.seq and T1.test = T3.test and not exists(select * from TT where seq = T1.seq-1 and test = T1.test) and not exists(select * from TT where seq = T3.seq+1 and test = T3.test) and not exists(select * from TT T2 -- More efficient than "where T2.date between T1.date and T3.date" where T2.seq between T1.seq and T3.seq and T2.test <> T1.test); Result: +------+------+------+ | date | date | test | +------+------+------+ | 1 | 3 | clim | | 7 | 10 | amb | | 12 | 12 | xxx | | 13 | 20 | clim | | 22 | 25 | amb | +------+------+------+ Working table TT is recommended to create an ordered sequence of rows in which "next" and "previous" rows are more easily described than in the source table. Avoid "order by" on views. It works in SQLite but it should not! The idea is to identify maximal sequences of identical "test" values as follow: - T1 denotes the first row of a sequence - T3 the last row - T2 any "disturbing" row lying between T1 and T3 but with a different value of "test" - first "not exists" condition states that T1 must be the very first of the sequence: it must not be immediately preceded by a row with same value of "test" - same for second "not exists" condition: T3 must be the last - the third "not exists" condition states that there is no "disturbing" row between T1 and T3. Valid if maximal sequences do not overlap. This query also detects single row sequences (e.g., 'xxx'). An index on TT.test may be useful to support T1*T3 join. For large tables, an iterative procedure will be faster, though less elegant! Regards Jean-Luc Hainaut _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by Clemens Ladisch
> On Feb 15, 2017, at 11:16 AM, Clemens Ladisch <[hidden email]> wrote: > > SQLite does not have windowing functions. A continuous/continual tragedy indeed :| Still, worthwhile mentioning The Tabibitosan Method, for reference purpose: http://www.orchestrapit.co.uk/?p=53 https://community.oracle.com/message/3991678 Rather nifty in its simplicity and power. Sadly, out of reach to SQLite dwellers. _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
On 15 Feb 2017, at 11:58am, Petite Abeille <[hidden email]> wrote: > On Feb 15, 2017, at 11:16 AM, Clemens Ladisch <[hidden email]> wrote: > >> SQLite does not have windowing functions. > > A continuous/continual tragedy indeed :| Windowing breaks the philosophy behind SQL. Rows are meant to be members of a set, and your operations on them are meant to be set operations. There is no implicit order for set elements. That’s why bare-bones SQL implementations don’t have cursors or windowing. > Still, worthwhile mentioning The Tabibitosan Method, for reference purpose: > > http://www.orchestrapit.co.uk/?p=53 > https://community.oracle.com/message/3991678 > > Rather nifty in its simplicity and power. Sadly, out of reach to SQLite dwellers. Actually SQLite can do it, by iterating using recursive common table expressions: <https://www.sqlite.org/lang_with.html> . It looks like an interesting programming exercise, though there’s a danger it will lead to unreadable code (a perpetual danger with WITH). You would need a spare column in the table to store the 'group' values in. Simon. _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by Rossel, Jonathan
@ Clemens, Petite Abeille,
Thanks, that's what I thought, but it's comforting to know for sure... @ Jean-Luc, Thanks a lot for the detailed answer, that's awesome ! I'll give it a try and see how it compares with an external "manual" grouping Jonathan ------------------------------ Message: 79 Date: Wed, 15 Feb 2017 11:16:24 +0100 From: Clemens Ladisch <[hidden email]> To: [hidden email] Subject: Re: [sqlite] Group contiguous rows (islands) Message-ID: <[hidden email]> Content-Type: text/plain; charset=us-ascii Rossel, Jonathan wrote: > Other database engines have solutions for this task (like windowing in > postgre) but I wonder if there is an efficient recipe in SQLite. SQLite does not have windowing functions. So the most efficient method would be to read the data with a simple ORDER BY, and do the grouping in your code. Regards, Clemens ------------------------------ Message: 83 Date: Wed, 15 Feb 2017 12:02:32 +0100 From: Jean-Luc Hainaut <[hidden email]> To: [hidden email] Subject: Re: [sqlite] Group contiguous rows (islands) Message-ID: <[hidden email]> Content-Type: text/plain; charset=UTF-8; format=flowed You could try this, inspired by classic algorithms of temporal databases: create table T(date integer,test char(12)); insert into T values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); create table TT(seq integer not null primary key autoincrement,date integer,test char(12)); insert into TT(date,test) select * from T order by date; select T1.date, T3.date, T1.test from TT T1, TT T3 -- More efficient than "where T1.date <= T3.date" where T1.seq <= T3.seq and T1.test = T3.test and not exists(select * from TT where seq = T1.seq-1 and test = T1.test) and not exists(select * from TT where seq = T3.seq+1 and test = T3.test) and not exists(select * from TT T2 -- More efficient than "where T2.date between T1.date and T3.date" where T2.seq between T1.seq and T3.seq and T2.test <> T1.test); Result: +------+------+------+ | date | date | test | +------+------+------+ | 1 | 3 | clim | | 7 | 10 | amb | | 12 | 12 | xxx | | 13 | 20 | clim | | 22 | 25 | amb | +------+------+------+ Working table TT is recommended to create an ordered sequence of rows in which "next" and "previous" rows are more easily described than in the source table. Avoid "order by" on views. It works in SQLite but it should not! The idea is to identify maximal sequences of identical "test" values as follow: - T1 denotes the first row of a sequence - T3 the last row - T2 any "disturbing" row lying between T1 and T3 but with a different value of "test" - first "not exists" condition states that T1 must be the very first of the sequence: it must not be immediately preceded by a row with same value of "test" - same for second "not exists" condition: T3 must be the last - the third "not exists" condition states that there is no "disturbing" row between T1 and T3. Valid if maximal sequences do not overlap. This query also detects single row sequences (e.g., 'xxx'). An index on TT.test may be useful to support T1*T3 join. For large tables, an iterative procedure will be faster, though less elegant! Regards Jean-Luc Hainaut ------------------------------ Message: 89 Date: Wed, 15 Feb 2017 12:58:07 +0100 From: Petite Abeille <[hidden email]> To: SQLite mailing list <[hidden email]> Subject: Re: [sqlite] Group contiguous rows (islands) Message-ID: <[hidden email]> Content-Type: text/plain; charset=us-ascii > On Feb 15, 2017, at 11:16 AM, Clemens Ladisch <[hidden email]> wrote: > > SQLite does not have windowing functions. A continuous/continual tragedy indeed :| Still, worthwhile mentioning The Tabibitosan Method, for reference purpose: http://www.orchestrapit.co.uk/?p=53 https://community.oracle.com/message/3991678 Rather nifty in its simplicity and power. Sadly, out of reach to SQLite dwellers. ------------------------------ Subject: Digest Footer _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ------------------------------ End of sqlite-users Digest, Vol 110, Issue 15 ********************************************* ******************************************************************************* This e-mail message is intended only for the addressee(s) and contains information which may be confidential. If you are not the intended recipient please do not read, save, forward, disclose or copy the contents of this e-mail. If this e-mail has been sent to you in error, please delete this e-mail and any copies or links to this e-mail completely and immediately from your system. We also like to inform you that communication via e-mail over the Internet is insecure because third parties may have the possibility to access and manipulate e-mails. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of The Swatch Group Ltd. ******************************************************************************* _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by Rossel, Jonathan
@ Simon,
Thanks for the input ! I was afraid someone was going to mention the dreaded recursive CTEs. Jonathan ******************************************************************************* This e-mail message is intended only for the addressee(s) and contains information which may be confidential. If you are not the intended recipient please do not read, save, forward, disclose or copy the contents of this e-mail. If this e-mail has been sent to you in error, please delete this e-mail and any copies or links to this e-mail completely and immediately from your system. We also like to inform you that communication via e-mail over the Internet is insecure because third parties may have the possibility to access and manipulate e-mails. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of The Swatch Group Ltd. ******************************************************************************* _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by Jean-Luc Hainaut
15 feb 2017, Jean-Luc Hainaut:
> > You could try this, inspired by classic algorithms of temporal > databases: > > create table T(date integer,test char(12)); > insert into T > values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), > (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); > > create table TT(seq integer not null primary key autoincrement,date > integer,test char(12)); > insert into TT(date,test) select * from T order by date; > > select T1.date, T3.date, T1.test > from TT T1, TT T3 > -- More efficient than "where T1.date <= T3.date" > where T1.seq <= T3.seq > and T1.test = T3.test > and not exists(select * from TT where seq = T1.seq-1 and test = > T1.test) > and not exists(select * from TT where seq = T3.seq+1 and test = > T3.test) > and not exists(select * > from TT T2 > -- More efficient than "where T2.date between > T1.date and T3.date" > where T2.seq between T1.seq and T3.seq > and T2.test <> T1.test); > > Result: > > +------+------+------+ > | date | date | test | > +------+------+------+ > | 1 | 3 | clim | > | 7 | 10 | amb | > | 12 | 12 | xxx | > | 13 | 20 | clim | > | 22 | 25 | amb | > +------+------+------+ > relational. Thanks, E Pasma. create table T(date integer,test char(12)); insert into T values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); select min(date) as fromdate, max(date) as enddate, test from (--get closest preceeding different key select t.*, max(t2.date) as key2 from t left join t t2 on t2.date<t.date and t2.test<>t.test group by t.date ) group by key2 ; _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
On 15/02/2017 18:34, E.Pasma wrote:
> > Hello, the query below is simpler. May be slower. But looks pretty > relational. Thanks, E Pasma. > > create table T(date integer,test char(12)); > insert into T > values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), > (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); > > select min(date) as fromdate, max(date) as enddate, test > from (--get closest preceeding different key > select t.*, max(t2.date) as key2 > from t > left join t t2 > on t2.date<t.date and t2.test<>t.test > group by t.date > ) > group by key2 Quite nice solution indeed! For those who may feel uncomfortable with outer joins, the from clause could be written as a subquery: from (select date, test, (select max(date) from t t2 where t2.date < t.date and t2.test <> t.test) as key2) Thanks J-L _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Jean-Luc Hainaut: > On 15/02/2017 18:34, E.Pasma wrote: >> >> Hello, the query below is simpler. May be slower. But looks pretty >> relational. Thanks, E Pasma. >> >> create table T(date integer,test char(12)); >> insert into T >> values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), >> (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); >> >> select min(date) as fromdate, max(date) as enddate, test >> from (--get closest preceeding different key >> select t.*, max(t2.date) as key2 >> from t >> left join t t2 >> on t2.date<t.date and t2.test<>t.test >> group by t.date >> ) >> group by key2 > > Quite nice solution indeed! > For those who may feel uncomfortable with outer joins, the from > clause could be written as a subquery: > > from (select date, test, (select max(date) > from t t2 > where t2.date < t.date > and t2.test <> t.test) > as key2) > > Thanks > > J-L > 1 instead of MAX from (select date, test, (select t2.date from t t2 where t2.date < t.date and t2.test <> t.test order by t2.date desc limit 1) as key2 _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by Rossel, Jonathan
@ Pasma and Hainaut,
Thanks again, that looks promising ! Jonathan Message: 42 Date: Wed, 15 Feb 2017 21:10:10 +0100 From: "E.Pasma" <[hidden email]> To: SQLite mailing list <[hidden email]> Subject: Re: [sqlite] Group contiguous rows (islands) Message-ID: <[hidden email]> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Jean-Luc Hainaut: > On 15/02/2017 18:34, E.Pasma wrote: >> >> Hello, the query below is simpler. May be slower. But looks pretty >> relational. Thanks, E Pasma. >> >> create table T(date integer,test char(12)); >> insert into T >> values (1,'clim'),(3,'clim'),(7,'amb'),(10,'amb'),(12,'xxx'), >> (13,'clim'),(15,'clim'),(20,'clim'),(22,'amb'),(25,'amb'); >> >> select min(date) as fromdate, max(date) as enddate, test >> from (--get closest preceeding different key >> select t.*, max(t2.date) as key2 >> from t >> left join t t2 >> on t2.date<t.date and t2.test<>t.test >> group by t.date >> ) >> group by key2 > > Quite nice solution indeed! > For those who may feel uncomfortable with outer joins, the from > clause could be written as a subquery: > > from (select date, test, (select max(date) > from t t2 > where t2.date < t.date > and t2.test <> t.test) > as key2) > > Thanks > > J-L > 1 instead of MAX from (select date, test, (select t2.date from t t2 where t2.date < t.date and t2.test <> t.test order by t2.date desc limit 1) as key2 ******************************************************************************* This e-mail message is intended only for the addressee(s) and contains information which may be confidential. If you are not the intended recipient please do not read, save, forward, disclose or copy the contents of this e-mail. If this e-mail has been sent to you in error, please delete this e-mail and any copies or links to this e-mail completely and immediately from your system. We also like to inform you that communication via e-mail over the Internet is insecure because third parties may have the possibility to access and manipulate e-mails. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of The Swatch Group Ltd. ******************************************************************************* _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Free forum by Nabble | Edit this page |