SQL help: by-range aggregation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

SQL help: by-range aggregation

Dominique Devienne
Given a table with a (parent, child) PK, which carries a size "data" column,
I'd like to aggregate the sum of the size per-parent, from 1KB, to each
range
from low to 8*low (i.e. [low, 8*low)), but w/o having to write my hand all
possible
combination, with extra points for formatting the range bounds in KB, MB,
GB as appropriate.

This is not too dissimilar to my recent question about the "thousand
separator",
but a bit more complex. Any help of formulating this query generically
(using a CTE?)
would be appreciated.

Thanks, --DD

C:\Users\ddevienne>sqlite3.18.0rc
SQLite version 3.18.0 2017-03-06 20:44:13
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select printf('%,d', 1001);
1,001
sqlite> select 'Thank you Richard';
Thank you Richard
sqlite> create table t (p, c, s integer, primary key (p, c));
sqlite> insert into t values
   ...> (1, 1, 4),
   ...> (1, 2, 900),
   ...> (2, 1, 12000),
   ...> (2, 2, 24000),
   ...> (3, 1, 123456),
   ...> (3, 2, 123456),
   ...> (3, 3, 123456);
sqlite> select p, count(*), sum(s) from t group by p order by 3 desc;
3|3|370368
2|2|36000
1|2|904
sqlite> with u(v) as (select sum(s) from t group by p)
   ...> select '[1B, 1KB)', count(*) from u where v < 1024
   ...> union all
   ...> select '[1KB, 8KB)', count(*) from u where 1024 <= v and v < 8*1024
   ...> union all
   ...> select '[8KB, 64KB)', count(*) from u where 8*1024 <= v and v <
8*8*1024
   ...> union all
   ...> select '[64KB, 512KB)', count(*) from u where 8*8*1024 <= v and v <
8*8*8*1024
   ...> ;
[1B, 1KB)|1
[1KB, 8KB)|0
[8KB, 64KB)|1
[64KB, 512KB)|1
sqlite>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Simon Slavin-3

On 8 Mar 2017, at 1:46pm, Dominique Devienne <[hidden email]> wrote:

> I'd like to aggregate the sum of the size per-parent, from 1KB, to each
> range
> from low to 8*low (i.e. [low, 8*low)), but w/o having to write my hand all
> possible
> combination, with extra points for formatting the range bounds in KB, MB,
> GB as appropriate.

Does your programming language allow you to implememnt a SQLite external function which takes a value and returns a string indicating which range its in ?

This would allow you to implement triggers on INSERT and UPDATE which automatically set a 'category' field in the table.  Then you could filter on value, or use GROUP BY, or other things.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Hick Gunter
Alternatively create a "range" table, insert your defined ranges and join/subquery to the original query.

Create table range (label text, from integer, to integer);

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Simon Slavin
Gesendet: Mittwoch, 08. März 2017 14:53
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] SQL help: by-range aggregation


On 8 Mar 2017, at 1:46pm, Dominique Devienne <[hidden email]> wrote:

> I'd like to aggregate the sum of the size per-parent, from 1KB, to
> each range from low to 8*low (i.e. [low, 8*low)), but w/o having to
> write my hand all possible combination, with extra points for
> formatting the range bounds in KB, MB, GB as appropriate.

Does your programming language allow you to implememnt a SQLite external function which takes a value and returns a string indicating which range its in ?

This would allow you to implement triggers on INSERT and UPDATE which automatically set a 'category' field in the table.  Then you could filter on value, or use GROUP BY, or other things.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Dominique Devienne
In reply to this post by Simon Slavin-3
On Wed, Mar 8, 2017 at 2:53 PM, Simon Slavin <[hidden email]> wrote:

> On 8 Mar 2017, at 1:46pm, Dominique Devienne <[hidden email]> wrote:
> > I'd like to aggregate the sum of the size per-parent, from 1KB, to each
> > range
> > from low to 8*low (i.e. [low, 8*low)), but w/o having to write my hand
> all
> > possible
> > combination, with extra points for formatting the range bounds in KB, MB,
> > GB as appropriate.
>
> Does your programming language allow you to implememnt a SQLite external
> function which takes a value and returns a string indicating which range
> its in ?
>
> This would allow you to implement triggers on INSERT and UPDATE which
> automatically set a 'category' field in the table.  Then you could filter
> on value, or use GROUP BY, or other things.
>

Same constraints of the "thousand separator" one, this will be a view in a
DB to be viewed by any SQLite client, so not allowed here. --DD

PS: will be allowed the day SQLite embeds a lightweight programming
language and allows defining "persistent" User Defined Functions in the DB
itself, in a sqlite_functions table for example :)
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Simon Slavin-3

On 8 Mar 2017, at 2:33pm, Dominique Devienne <[hidden email]> wrote:

> Same constraints of the "thousand separator" one, this will be a view in a
> DB to be viewed by any SQLite client, so not allowed here. --DD

In that case, I think Hick’s solution is good.  Do it in two steps: construct a temporary table which includes the ranges (presumably using CASE), then interrogate that table.  (Then presumably drop that table.)

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Dominique Devienne
In reply to this post by Hick Gunter
On Wed, Mar 8, 2017 at 3:16 PM, Hick Gunter <[hidden email]> wrote:

> Alternatively create a "range" table, insert your defined ranges and
> join/subquery to the original query.
>
> Create table range (label text, from integer, to integer);
>

That's the easy part (and a CTE can probably generate it dynamically, and
even I could come up with that CTE):

create table ranges (name text, low integer, high integer)
insert into ranges values
('[0, 1KB)', 0, 1024),
('[1KB, 8KB)', 1024, 8*1024),
('[8KB, 64KB)', 8*1024, 8*8*1024),
('[64KB, 512KB)', 8*8*1024, 8*8*8*1024),
('[512KB, 4MB)', 8*8*8*1024, 8*8*8*8*1024),
('[4MB, 32MB)', 8*8*8*8*1024, 8*8*8*8*8*1024),
('[32MB, MAX)', 8*8*8*8*8*1024, 8*8*8*8*8*8*1024);

but the join/subquery you mention is still leaving me scratching my head...
--DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Clemens Ladisch
Dominique Devienne wrote:
> On Wed, Mar 8, 2017 at 3:16 PM, Hick Gunter <[hidden email]> wrote:
>> Alternatively create a "range" table, insert your defined ranges and
>> join/subquery to the original query.
>
> That's the easy part [...]
> but the join/subquery you mention is still leaving me scratching my head...

  WITH u(v) AS (
    SELECT sum(s)
    FROM t
    GROUP BY p
  )
  SELECT ranges.name,
         u.v
  FROM u
  JOIN ranges ON u.v >= ranges.low
             AND u.v <  ranges.high;

name           v
-------------  ------
[0, 1KB)       904
[8KB, 64KB)    36000
[64KB, 512KB)  370368


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Dominique Devienne
On Wed, Mar 8, 2017 at 3:47 PM, Clemens Ladisch <[hidden email]> wrote:

> Dominique Devienne wrote:
> > On Wed, Mar 8, 2017 at 3:16 PM, Hick Gunter <[hidden email]> wrote:
> >> Alternatively create a "range" table, insert your defined ranges and
> >> join/subquery to the original query.
> >
> > That's the easy part [...]
> > but the join/subquery you mention is still leaving me scratching my
> head...
>
>   WITH u(v) AS (
>     SELECT sum(s)
>     FROM t
>     GROUP BY p
>   )
>   SELECT ranges.name,
>          u.v
>   FROM u
>   JOIN ranges ON u.v >= ranges.low
>              AND u.v <  ranges.high;
>
> name           v
> -------------  ------
> [0, 1KB)       904
> [8KB, 64KB)    36000
> [64KB, 512KB)  370368
>

Thank you Clemens. I wasn't familiar with joins on range conditions like
this.

I needed count aggregates, so I moved the query into another CTE, and
grouped that.

Then the results where in alphabetical order of bucket names,
so I had to re-join on ranges to order by ranges.low.

Then I changed the join into a left-join to see all ranges, even empty ones.

And finally I trimmed empty ranges at the top and bottom, leaving
only the "middle" holes, as I think it's more natural that way.

Now I only need a CTE to dynamically generate the ranges,
but I'll call it a day and this will be for another time.

Here's my final query (on a slightly modified schema). --DD

with
bytes_per_set(v) as (
  select sum(byte_size)
    from values
   group by parent, set
),
minmax(a, b) as (
  select min(v), max(v) from bytes_per_set
),
buckets(name, v) as (
select ranges.name, bytes_per_set.v
  from bytes_per_set
  join ranges ON bytes_per_set.v >= ranges.low
             AND bytes_per_set.v < ranges.high
),
distrib(b, c) as (
select name, count(*)
  from buckets
 group by 1
)
select ranges.name as "size range",
       distrib.c as '#sets'
  from ranges
  left join distrib on ranges.name = distrib.b
 where ranges.low  < (select b from minmax)
   and ranges.high > (select a from minmax)
 order by ranges.low
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Simon Slavin-3

On 8 Mar 2017, at 3:35pm, Dominique Devienne <[hidden email]> wrote:
>
> Then the results where in alphabetical order of bucket names,
> so I had to re-join on ranges to order by ranges.low.

You should be able to just add the ORDER BY clause to the original JOIN:

SELECT ranges.name, bytes_per_set.v
 FROM bytes_per_set
 JOIN ranges ON bytes_per_set.v >= ranges.low
            AND bytes_per_set.v < ranges.high
 ORDER BY ranges.low

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

R Smith
In reply to this post by Dominique Devienne

On 2017/03/08 5:35 PM, Dominique Devienne wrote:
>
> Now I only need a CTE to dynamically generate the ranges,

Well, that seems like fun!
If I may...

-- CTEs
    -- Units Table (B, KB, MB, etc.)
WITH UT(id, uStr, byteDiv) AS (
     SELECT 0, 'B', 1
     UNION ALL
     SELECT UT.id+1,
            CASE UT.id WHEN 0 THEN 'KB' WHEN 1 THEN 'MB' WHEN 2 THEN
'GB' WHEN 3 THEN 'TB' WHEN 4 THEN 'PB' END,
            UT.byteDiv * 1024
       FROM UT
      WHERE UT.id < 5

-- Range Table (define lows/highs)
), RT(id, low, high) AS (
     SELECT 0, 0, 1024
     UNION ALL
     SELECT RT.id+1, RT.high, RT.high*8
       FROM RT
      WHERE RT.id < 16

    -- Conversion Values (Match low/high Ranges to Units)
), CV(rid, low, high, luStr, luDiv, huStr, huDiv) AS (
     SELECT RT.id, RT.low, RT.high, ULo.uStr, ULo.byteDiv, UHi.uStr,
UHi.byteDiv
       FROM RT
       LEFT JOIN UT AS ULo ON (ULo.byteDiv <= RT.low  AND
(ULo.byteDiv*1024) > RT.low ) OR (ULo.id = 0 AND RT.id = 0)
       LEFT JOIN UT AS UHi ON (UHi.byteDiv <= RT.high AND
(UHi.byteDiv*1024) > RT.high)
      WHERE UHi.id IS NOT NULL

    -- Range Table with Name vs. limits definitions
), range(name, low, high) AS (
     SELECT '['|| (CV.low / CV.luDiv) || CV.luStr ||' - '|| (CV.high /
CV.huDiv) || CV.huStr ||']', CV.low, CV.high
       FROM CV
)

SELECT *
   FROM range
;


   -- name             |                   low |                  high
   -- ---------------- | --------------------- | ---------------------
   -- [0B - 1KB]       |                     0 |                  1024
   -- [1KB - 8KB]      |                  1024 |                  8192
   -- [8KB - 64KB]     |                  8192 |                 65536
   -- [64KB - 512KB]   |                 65536 |                524288
   -- [512KB - 4MB]    |                524288 |               4194304
   -- [4MB - 32MB]     |               4194304 |              33554432
   -- [32MB - 256MB]   |              33554432 | 268435456
   -- [256MB - 2GB]    |             268435456 | 2147483648
   -- [2GB - 16GB]     |            2147483648 | 17179869184
   -- [16GB - 128GB]   |           17179869184 | 137438953472
   -- [128GB - 1TB]    |          137438953472 | 1099511627776
   -- [1TB - 8TB]      |         1099511627776 | 8796093022208
   -- [8TB - 64TB]     |         8796093022208 | 70368744177664
   -- [64TB - 512TB]   |        70368744177664 | 562949953421312
   -- [512TB - 4PB]    |       562949953421312 | 4503599627370496
   -- [4PB - 32PB]     |      4503599627370496 | 36028797018963968
   -- [32PB - 256PB]   |     36028797018963968 | 288230376151711744


(Hope the format survived e-mailing....)
Cheers,
Ryan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Dominique Devienne
In reply to this post by Simon Slavin-3
On Wed, Mar 8, 2017 at 6:08 PM, Simon Slavin <[hidden email]> wrote:

>
> On 8 Mar 2017, at 3:35pm, Dominique Devienne <[hidden email]> wrote:
> >
> > Then the results where in alphabetical order of bucket names,
> > so I had to re-join on ranges to order by ranges.low.
>
> You should be able to just add the ORDER BY clause to the original JOIN:
>
> SELECT ranges.name, bytes_per_set.v
>  FROM bytes_per_set
>  JOIN ranges ON bytes_per_set.v >= ranges.low
>             AND bytes_per_set.v < ranges.high
>  ORDER BY ranges.low


Thanks, but that doesn't work, because the later aggregation loses that
order.
And I also want to preserve "holes in the middle", so I need a left-join
somewhere,
and at the very end is the only I can make this work (not that there's not
a better way). --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

Dominique Devienne
In reply to this post by R Smith
On Wed, Mar 8, 2017 at 6:38 PM, R Smith <[hidden email]> wrote:

>
> On 2017/03/08 5:35 PM, Dominique Devienne wrote:
>
>>
>> Now I only need a CTE to dynamically generate the ranges,
>>
>
> Well, that seems like fun!
> If I may...
>

Nice! Thanks Ryan. --DD

PS: If we ever meet, your reasonably-priced beverage of choice is on me :)
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQL help: by-range aggregation

R Smith

On 2017/03/09 10:11 AM, Dominique Devienne wrote:
> Nice! Thanks Ryan. --DD
>
> PS: If we ever meet, your reasonably-priced beverage of choice is on me :)

You're welcome, and safe, since I'm teetotal - which means I totally
only drink tea. :)

(Been this way since that fatal PUI incident - that's Programming under
the Influence - but we are all Mavericks sometimes!)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users