Running sums and averages

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Running sums and averages

Balaji Ramanathan
Hi,

    I have a table (mytable) with several numerical fields.  Let us call
them F1, F2, etc., etc.  The table also has a row number field (RowNumber)
that I make sure has incremental values from 1 through the number of rows
in the table.

    In addition to printing out the values of F1 through Fn, I also want to
print out, for each row of the table, the running sums and averages of
these fields.  There are two averages:  one that uses only the count of
non-null values of each field as the denominator and another that uses the
count of both null and non-null values of each field.  So, I wrote a query
as below to calculate and print those values:

select RowNumber,
F1,
(select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF1,
(select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF1, --average excluding null values
(select sum(F1) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null values
F2,
(select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF2,
(select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF2,
(select sum(F2) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF2,
--Other fields, their running sums and averages go here
from mytable as T order by RowNumber

    I have about 3000 rows in my table and about 10 fields I am doing this
for.  When I select F1 through F10 alone, the query runs in negligible
time.  But when I run the above query with running sums and averages
included (total of 40 columns instead of 10 columns), the time increases to
well over 5 minutes.

    When I run the query at the SQLite command line, the first 1000 or so
records are produced very fast, but as the row number increases, I can see
SQLite struggling to produce outputs.  By the time I get to the end of the
query, the records are being produced at about a rate of only 1 or 2 per
second.

    Is there anything I can do to reduce the time taken?  Is this the most
efficient way to query for running sums and averages?  Any other ideas to
get what I need any quicker?

    The added wrinkle in all this is that mytable is actually not a table,
but a view.  F1 through F10 are computed from various tables and brought
together in this view.  So, I don't think I can create indexes on any of
these fields.  I have thought about creating a temp table out of this view
so that I can index RowNumber (I don't know whether it will help) but it
goes against my minimalist instincts.

    Thank you.

Balaji Ramanathan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

Simon Slavin-3


On 12 Nov 2017, at 4:05pm, Balaji Ramanathan <[hidden email]> wrote:

> Is there anything I can do to reduce the time taken?

Do it in your favourite programming language rather than SQL.  Congratulations on finding a way of doing it in SQL, but running sums and averages are trivial in a programming language and will be far faster and you can be as fussy as you like about whether nulls are counted and such things.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

Joseph R. Justice
In reply to this post by Balaji Ramanathan
On Nov 12, 2017 11:06 AM, "Balaji Ramanathan" <[hidden email]>
wrote:

Hi,

    I have a table (mytable) with several numerical fields.  Let us call
them F1, F2, etc., etc.  The table also has a row number field (RowNumber)
that I make sure has incremental values from 1 through the number of rows
in the table.

    In addition to printing out the values of F1 through Fn, I also want to
print out, for each row of the table, the running sums and averages of
these fields.  There are two averages:  one that uses only the count of
non-null values of each field as the denominator and another that uses the
count of both null and non-null values of each field.  So, I wrote a query
as below to calculate and print those values:

select RowNumber,
F1,
(select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF1,
(select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF1, --average excluding null values
(select sum(F1) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null values
F2,
(select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF2,
(select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF2,
(select sum(F2) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF2,
--Other fields, their running sums and averages go here
from mytable as T order by RowNumber

    I have about 3000 rows in my table and about 10 fields I am doing this
for.  When I select F1 through F10 alone, the query runs in negligible
time.  But when I run the above query with running sums and averages
included (total of 40 columns instead of 10 columns), the time increases to
well over 5 minutes.

    When I run the query at the SQLite command line, the first 1000 or so
records are produced very fast, but as the row number increases, I can see
SQLite struggling to produce outputs.  By the time I get to the end of the
query, the records are being produced at about a rate of only 1 or 2 per
second.

    Is there anything I can do to reduce the time taken?  Is this the most
efficient way to query for running sums and averages?  Any other ideas to
get what I need any quicker?

    The added wrinkle in all this is that mytable is actually not a table,
but a view.  F1 through F10 are computed from various tables and brought
together in this view.  So, I don't think I can create indexes on any of
these fields.  I have thought about creating a temp table out of this view
so that I can index RowNumber (I don't know whether it will help) but it
goes against my minimalist instincts.


I am *FAR* from a SQLite (or SQL) guru, but it seems obvious to me that
your calculations for sumF1, avgF1, OverallAvgF1, etc as written are
running in quadratic time based on the number of rows in table T.  Worse
yet, you're recalculating all these values entirely from scratch every time
you're recalculating them, rather than use the previously calculated value.

Is there any way you can calculate table T with just RowNumber, F1, F2,
..., F10, and then as you subsequently step through T row by row calculate
all the sums and averages as you step, saving the calculations for each row
as you calculate them in temporary variables for use in calculating the
values for the next row?  This might be doable in pure SQL code, or you
might have to do it in whatever language you're making calls to SQLite
from.  That should be linear time, at worst 2 times the number of rows in T
(once to construct T, once to step through it).

Be well.



Joseph
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

Dennis Clarke
In reply to this post by Simon Slavin-3

>> Is there anything I can do to reduce the time taken?

123456789+123456789+123456789+123456789+123456789+123456789+123456789+12
 > < Simon correctly advised >
 > Do it in your favourite programming language rather than SQL.

Let me be even more clear :

     Memory is cheap and most servers have plenty.

     Processors are fast and most servers have multiple with many cores.

     Select the entire table of columns you need into memory.
     Write a little code.

No it won't scale very well into millions of rows but I could easily run
a test and I will bet many mnay many dollars that processing the sums in
memory is orders of magnitude faster than SQL.


Dennis

ps: if your db is MySQL or Oracle db then the problem is trivial with
      the C  API
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

Keith Medcalf
In reply to this post by Balaji Ramanathan

EXPLAIN QUERY PLAN

is the first step.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Balaji Ramanathan
>Sent: Sunday, 12 November, 2017 09:06
>To: [hidden email]
>Subject: [sqlite] Running sums and averages
>
>Hi,
>
>    I have a table (mytable) with several numerical fields.  Let us
>call
>them F1, F2, etc., etc.  The table also has a row number field
>(RowNumber)
>that I make sure has incremental values from 1 through the number of
>rows
>in the table.
>
>    In addition to printing out the values of F1 through Fn, I also
>want to
>print out, for each row of the table, the running sums and averages
>of
>these fields.  There are two averages:  one that uses only the count
>of
>non-null values of each field as the denominator and another that
>uses the
>count of both null and non-null values of each field.  So, I wrote a
>query
>as below to calculate and print those values:
>
>select RowNumber,
>F1,
>(select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber)
>as
>sumF1,
>(select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber)
>as
>avgF1, --average excluding null values
>(select sum(F1) from mytable where mytable.RowNumber <=
>T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null
>values
>F2,
>(select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber)
>as
>sumF2,
>(select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber)
>as
>avgF2,
>(select sum(F2) from mytable where mytable.RowNumber <=
>T.RowNumber)/T.RowNumber as OverallAvgF2,
>--Other fields, their running sums and averages go here
>from mytable as T order by RowNumber
>
>    I have about 3000 rows in my table and about 10 fields I am doing
>this
>for.  When I select F1 through F10 alone, the query runs in
>negligible
>time.  But when I run the above query with running sums and averages
>included (total of 40 columns instead of 10 columns), the time
>increases to
>well over 5 minutes.
>
>    When I run the query at the SQLite command line, the first 1000
>or so
>records are produced very fast, but as the row number increases, I
>can see
>SQLite struggling to produce outputs.  By the time I get to the end
>of the
>query, the records are being produced at about a rate of only 1 or 2
>per
>second.
>
>    Is there anything I can do to reduce the time taken?  Is this the
>most
>efficient way to query for running sums and averages?  Any other
>ideas to
>get what I need any quicker?
>
>    The added wrinkle in all this is that mytable is actually not a
>table,
>but a view.  F1 through F10 are computed from various tables and
>brought
>together in this view.  So, I don't think I can create indexes on any
>of
>these fields.  I have thought about creating a temp table out of this
>view
>so that I can index RowNumber (I don't know whether it will help) but
>it
>goes against my minimalist instincts.
>
>    Thank you.
>
>Balaji Ramanathan
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

Graham Holden
In reply to this post by Dennis Clarke

>>> Is there anything I can do to reduce the time taken?

>  > < Simon correctly advised >
>  > Do it in your favourite programming language rather than SQL.

> Let me be even more clear :

>      Memory is cheap and most servers have plenty.

>      Processors are fast and most servers have multiple with many cores.

>      Select the entire table of columns you need into memory.
>      Write a little code.

> No it won't scale very well into millions of rows but I could easily run
> a test and I will bet many mnay many dollars that processing the sums in
> memory is orders of magnitude faster than SQL.

You shouldn't even need to read the entire table (or view) into
memory: just read row-by-row, and for each field, keep a running
total and the count of non-NULL values. From these you can calculate
your total and both types of average.

Graham



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

David Raymond
In reply to this post by Balaji Ramanathan
As other folks have mentioned, doing it in an external language is going to be easiest.

That being said, CTE's are almost a full language in themselves.

I'm just gonna go with 1 value here for the example, but how about something like...

with recursive foo (RowNumber, F1, sumF1, OverallAvgF1, avgF1, F1notNull)
as(
  select
  rowNumber,
  F1,
  ifnull(F1, 0.0),
  ifnull(F1, 0.0),
  ifnull(F1, 0.0),
  F1 is not null
  from mytable where RowNumber = 1
 
  union all
 
  select
  mytable.RowNumber,
  mytable.F1,
  foo.sumF1 + ifnull(mytable.F1, 0.0),
  (foo.sumF1 + ifnull(mytable.F1, 0.0)) / mytable.RowNumber,
  (foo.sumF1 + ifnull(mytable.F1, 0.0)) / (foo.F1notNull + (mytable.F1 is not null)),
  foo.F1notNull + (mytable.F1 is not null)
 
  from
 
  foo inner join mytable on mytable.RowNumber = foo.RowNumber + 1)

select RowNumber, F1, sumF1, OverallAvgF1, avgF1 from foo;



-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Balaji Ramanathan
Sent: Sunday, November 12, 2017 11:06 AM
To: [hidden email]
Subject: [sqlite] Running sums and averages

Hi,

    I have a table (mytable) with several numerical fields.  Let us call
them F1, F2, etc., etc.  The table also has a row number field (RowNumber)
that I make sure has incremental values from 1 through the number of rows
in the table.

    In addition to printing out the values of F1 through Fn, I also want to
print out, for each row of the table, the running sums and averages of
these fields.  There are two averages:  one that uses only the count of
non-null values of each field as the denominator and another that uses the
count of both null and non-null values of each field.  So, I wrote a query
as below to calculate and print those values:

select RowNumber,
F1,
(select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF1,
(select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF1, --average excluding null values
(select sum(F1) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null values
F2,
(select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF2,
(select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF2,
(select sum(F2) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF2,
--Other fields, their running sums and averages go here
from mytable as T order by RowNumber

    I have about 3000 rows in my table and about 10 fields I am doing this
for.  When I select F1 through F10 alone, the query runs in negligible
time.  But when I run the above query with running sums and averages
included (total of 40 columns instead of 10 columns), the time increases to
well over 5 minutes.

    When I run the query at the SQLite command line, the first 1000 or so
records are produced very fast, but as the row number increases, I can see
SQLite struggling to produce outputs.  By the time I get to the end of the
query, the records are being produced at about a rate of only 1 or 2 per
second.

    Is there anything I can do to reduce the time taken?  Is this the most
efficient way to query for running sums and averages?  Any other ideas to
get what I need any quicker?

    The added wrinkle in all this is that mytable is actually not a table,
but a view.  F1 through F10 are computed from various tables and brought
together in this view.  So, I don't think I can create indexes on any of
these fields.  I have thought about creating a temp table out of this view
so that I can index RowNumber (I don't know whether it will help) but it
goes against my minimalist instincts.

    Thank you.

Balaji Ramanathan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Running sums and averages

Balaji Ramanathan
In reply to this post by Balaji Ramanathan
OK, thanks for the short and sweet answer!  I just wanted to make sure I
wasn't missing something.

Balaji Ramanathan


> ---------- Forwarded message ----------
> From: Simon Slavin <[hidden email]>
> To: SQLite mailing list <[hidden email]>
> Cc:
> Bcc:
> Date: Sun, 12 Nov 2017 16:37:24 +0000
> Subject: Re: [sqlite] Running sums and averages
>
>
> On 12 Nov 2017, at 4:05pm, Balaji Ramanathan <[hidden email]>
> wrote:
>
> > Is there anything I can do to reduce the time taken?
>
> Do it in your favourite programming language rather than SQL.
> Congratulations on finding a way of doing it in SQL, but running sums and
> averages are trivial in a programming language and will be far faster and
> you can be as fussy as you like about whether nulls are counted and such
> things.
>
> Simon.
>
>
> ---------- Forwarded message ----------
> From: "Joseph R. Justice" <[hidden email]>
>


> I am *FAR* from a SQLite (or SQL) guru, but it seems obvious to me that
> your calculations for sumF1, avgF1, OverallAvgF1, etc as written are
> running in quadratic time based on the number of rows in table T.  Worse
> yet, you're recalculating all these values entirely from scratch every time
> you're recalculating them, rather than use the previously calculated value.
>
> Is there any way you can calculate table T with just RowNumber, F1, F2,
> ..., F10, and then as you subsequently step through T row by row calculate
> all the sums and averages as you step, saving the calculations for each row
> as you calculate them in temporary variables for use in calculating the
> values for the next row?  This might be doable in pure SQL code, or you
> might have to do it in whatever language you're making calls to SQLite
> from.  That should be linear time, at worst 2 times the number of rows in T
> (once to construct T, once to step through it).
>
> Be well.
>
>
>
> Joseph
>
>
>
> ---------- Forwarded message ----------
> From: Dennis Clarke <[hidden email]>
> > < Simon correctly advised >
> > Do it in your favourite programming language rather than SQL.
>
> Let me be even more clear :
>
>     Memory is cheap and most servers have plenty.
>
>     Processors are fast and most servers have multiple with many cores.
>
>     Select the entire table of columns you need into memory.
>     Write a little code.
>
> No it won't scale very well into millions of rows but I could easily run
> a test and I will bet many mnay many dollars that processing the sums in
> memory is orders of magnitude faster than SQL.
>
>
> Dennis
>
> ps: if your db is MySQL or Oracle db then the problem is trivial with
>      the C  API
>
>
>
> ---------- Forwarded message ----------
> From: Keith Medcalf <[hidden email]>
> Subject: Re: [sqlite] Running sums and averages
>
> EXPLAIN QUERY PLAN
>
> is the first step.
>


> ---------- Forwarded message ----------
> From: Graham Holden <[hidden email]>
>


> You shouldn't even need to read the entire table (or view) into
> memory: just read row-by-row, and for each field, keep a running
> total and the count of non-NULL values. From these you can calculate
> your total and both types of average.
>
> Graham
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users