windowing functions != recursive functions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

windowing functions != recursive functions

pilot pirx
I was unaware of the windowing functions discusssion.
Having a look at the first link, it looks like we may be
talking about two subtly different issues.
The windowing functions described in the link
are different from recursive functions.
There is no problem in computing moving average
or cumulative sum etc
with the existing SQL (or dealing with time windows
in general)  - it just the SQL gets nasty - examples
are in 'SQL for smarties' by Joe Celko. Therefore
adding such functions to SQLite is 'nice' but does
not really increase the functionality.

In contrast, computation of the recursion function
adds it.

example with moving averages, since they
are mentioned in the windowing functions

assuming original series is
VALUE = 1 2 3 4 5 6 7

following statement will compute 5-day moving average for the whole column
UPDATE data SET
        MAVE5 = (SELECT AVG(val) FROM data AS h1 WHERE h1.dayno <=  data.dayno AND h1.dayno > data.dayno-05);
But this statement operates always on existing data in the existing column - to compute new value of MAVE5 it only needs to know values of VALUE. (When, for the element one, there is not enough data (because there is no dayno < 1)  SQL simply averages over existing data.)

Now, for the recursive function like exponential moving average the defintion is that
ema(i+1) =  val(i) * coef  + ema(i) * (1-coef). That is I have to know the previous value of
both EMA _and_  VALUE (while for moving avearage I need to know _only_
the previous value(s) of VALUE.


----- Original Message -----
From: "Andrew Piskorski" <[hidden email]>
To: [hidden email]
Subject: [sqlite] SQL Window/OLAP functions
Date: Wed, 12 Oct 2005 08:34:02 -0400

>
> On Wed, Oct 12, 2005 at 05:12:05AM -0500, pilot pirx wrote:
> > Subject: [sqlite] Please, please do _not_ remove this feature from SQLite...
>
> > While using SQLite for some time (with R package, www.r-project.org)
> > I did admire its functionality and speed. Then I did discover a
> > hidden SQLite feature of immense usefulness - not available in other
> > databases. SQLite can compute Fibonacci numbers! (I will explain why
>
> Transaction visibility features do vary, although often it doesn't
> matter anyway.  E.g., here's a dicussion of how (at least as of early
> 2004), PostgreSQL's docs were quite confused about certain subtleties,
> but what I find interesting, is this was still something that in
> practice had never really mattered to the mostly hard-core RDBMS
> programmers talking about it in that thread:
>
>    http://openacs.org/forums/message-view?message_id=176198
>
> > UPDATE fib SET
> >     val =  (SELECT h1.val FROM fib as h1 where pos = fib.pos - 1) +
> >            (SELECT h2.val FROM fib as h2 where pos = fib.pos - 2)
> > WHERE pos > 2;
>
> I don't see why this is such a great feature.  Without it, worst case,
> you could still write a simple little loop which would issue one
> update statement for each row, all within a single transaction.  No?
>
> > This is an _immensely_ useful functionality when one needs to
> > compute various recursive functions. For example exponential moving
> > average, used frequently in financials. Or Kalman filter (and many
>
> Vastly more useful for moving average and the like would be real
> windowing/grouping functions, like Oracle's "analytic" functions.  I'm
> not thrilled by their particular syntax, but the functionality is
> INCREDIBLY useful.  (And on the other hand, I haven't thought of any
> obviously better syntax, either.)
>
> Hm, an amendement to the SQL:1999 spec added windowing support, and
> SQL:2003 includes that, I think as features T611, "Elementrary OLAP
> functions" and T612, "Advanced OLAP functions".  Apparently Fred Zemke
> of Oracle was the author of that SQL spec, and IBM also supported it,
> so the SQL:2003 syntax and behavior is probably very similar (maybe
> identical?) to what Oracle 8i, 9i, and 10g and IBM's DB2 already have.
> PostgreSQL, as of 8.0, doesn't support it yet.
>
>    http://www.wintercorp.com/rwintercolumns/SQL_99snewolapfunctions.html
>    http://www.ncb.ernet.in/education/modules/dbms/SQL99/OLAP-99-154r2.pdf
>    http://www.wiscorp.com/sql/SQL2003Features.pdf
>    http://troels.arvin.dk/db/rdbms/#select-limit-offset
>    http://www.postgresql.org/docs/8.0/interactive/features.html
>    http://en.wikipedia.org/wiki/SQL
>    http://www.sigmod.org/sigmod/record/issues/0403/E.JimAndrew-standard.pdf
>    http://www.oracle.com/oramag/oracle/01-jul/o41industry.html
>
> SQLite basically supports just SQL-92, it doesn't have any of these
> newer SQL:1999 or SQL:2003 features, right?
>
> Using SQLite in conjunction with a powerful statistical data analysis
> programming language like R is an excellent example of a use where
> windowing functions can be hugely helpful.  Unfortunately, I've never
> had a compelling need to use SQLite for that, otherwise I'd probably
> take a shot at adding support for the SQL:2003 Window/OLAP stuff.  :)
>
> --
> Andrew Piskorski <[hidden email]>
> http://www.piskorski.com/


--
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

Reply | Threaded
Open this post in threaded view
|

Re: windowing functions != recursive functions

D. Richard Hipp
"pilot pirx" <[hidden email]> wrote:
 
> Now, for the recursive function like exponential moving average
> the defintion is that
>
> ema(i+1) =  val(i) * coef  + ema(i) * (1-coef).
>
> That is I have to know the previous value of both EMA _and_  VALUE
> (while for moving avearage I need to know _only_ the previous value(s)
> of VALUE.
>

You could write an "ema()" function for SQLite using the
scarcely documented API functions sqlite3_get_auxdata() and
sqlite3_set_auxdata().  (Those routines were intended to allow
functions like "regexp" to compile a constant regular expression
once and then reused the compiled regular expression on
subsequent calls.  But they have never been used for anything,
as far as I am aware.)

The ema() function would work like this:

   SELECT ema(x, 0.45) FROM table1;

Where 0.45 is the "coef".

I was wondering if it would be possible to write a "prev()"
function that returned the value of a column in the result
set from the previous row.  prev() could be used to implement
ema() in pure SQL.  But, alas, I do not see how you could
write a general-purpose prev() function using the current
API.  Some kind of extension would be required, I think.
--
D. Richard Hipp <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: windowing functions != recursive functions

Andrew Piskorski
In reply to this post by pilot pirx
On Wed, Oct 12, 2005 at 09:05:26PM -0500, pilot pirx wrote:

> The windowing functions described in the link
> are different from recursive functions.

Yes, I think you're right.  Your EMA example bugged me, so I fooled
with it, but I couldn't come up with any way to implement EMA using
plain SQL, even with the windowing/OLAP functions.

Some explanations of the EMA algorithm are here:

  http://www.investopedia.com/university/tm/TradingIndicators/TheInsAndOutsOfMovingAverages.asp
  http://www.ivorix.com/en/products/tech/smooth/ema.html

Looks like you don't even need real recursion for EMA (AKA, it is
tail-recursive), plain old iteration will do fine, as you can see from
the simple C++ code in the 2nd link above.  (Yet it can't be done in
SQL.  Well we all knew that SQL isn't Turing complete, here's a
practical consequence of that, I guess.)

> Now, for the recursive function like exponential moving average the
> defintion is that ema(i+1) = val(i) * coef + ema(i) * (1-coef). That
> is I have to know the previous value of both EMA _and_ VALUE (while
> for moving avearage I need to know _only_ the previous value(s) of
> VALUE.

It is quite possible to return previous (lagged) values of multiple
columns, NOT just one.  You do that with dense_rank, which I talked
about a back on May 23 on this email list:

  I always thought of that feature as "look-aside", as in, "Find the max
  of foo, but don't give me that, instead look aside and return me the
  corresponding value of bar next to it instead."

  select
    max(playerid) keep (dense_rank last order by sb) as top_stealer
  from batting

The problem is that EMA needs to use the lag of the actual column it's
calculating, and there's just no damn way to do that in SQL.

Well, I didn't try it, but maybe it would be possible to kludge EMA
somehow using either Oracle's "connect by", or DB2's recursive SQL,
which sounds possibly and less hideous than connect by:

  http://www-128.ibm.com/developerworks/db2/library/techarticle/0307steinbach/0307steinbach.html

Here's the example I tried for EMA (on Oracle 8.1.7.4):

create table atp_ema (symbol varchar2(8)  ,day date  ,val number);
insert into atp_ema values ('a' ,'2005-01-03' ,67);
insert into atp_ema values ('a' ,'2005-01-04' ,77);
insert into atp_ema values ('a' ,'2005-01-05' ,80);
insert into atp_ema values ('a' ,'2005-01-06' ,82);
insert into atp_ema values ('a' ,'2005-01-07' ,94);
insert into atp_ema values ('a' ,'2005-01-10' ,81);
insert into atp_ema values ('a' ,'2005-01-11' ,83);
insert into atp_ema values ('b' ,'2005-01-03' ,27);
insert into atp_ema values ('b' ,'2005-01-04' ,27);
insert into atp_ema values ('b' ,'2005-01-05' ,20);
insert into atp_ema values ('b' ,'2005-01-06' ,22);
insert into atp_ema values ('b' ,'2005-01-07' ,24);
insert into atp_ema values ('b' ,'2005-01-10' ,21);
insert into atp_ema values ('b' ,'2005-01-11' ,23);
 
variable coeff number;
begin  :coeff := 0.66;  end;
/

-- Correct algorithm, but can't run:
select  v.*
  ,(ema_term_1 + ema_term_2)  as ema
  ,(avg(val) over (partition by symbol  order by day
      rows between 3 preceding and current row))  as mean_4_day
from (
  select  symbol ,day ,val
    ,(:coeff * val) as ema_term_1
    ,( (1 - :coeff) *
       -- Of course ema is not defined here, so this fails with
       -- 'ORA-00904: invalid column name':
       nvl((lag(ema ,1) over (partition by symbol  order by day))
       ,val)
     ) as ema_term_2
  from atp_ema
) v
order by symbol ,day ;

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/

Reply | Threaded
Open this post in threaded view
|

Re: windowing functions != recursive functions

Andrew Piskorski
In reply to this post by pilot pirx
On Wed, Oct 12, 2005 at 09:05:26PM -0500, pilot pirx wrote:

> There is no problem in computing moving average
> or cumulative sum etc
> with the existing SQL (or dealing with time windows
> in general)  - it just the SQL gets nasty - examples

Except that the SQL doesn't "just" get nasty, it can quickly get
EXTREMELY nasty, and often has really lousy performance as well.

I regularly write (fairly) readable, fast queries using (Oracle's)
windowing/OLAP functions which AFAICT, would be either extremely
difficult or essentially impossible for me to write in SQL-92, other
than, perhaps, by writing some special purpose SQL compiler/translator
to do the job.

E.g., see this lengthy thread from 2001:

  http://openacs.org/forums/message-view?message_id=25521
  http://openacs.org/forums/message-view?message_id=25559
  http://openacs.org/forums/message-view?message_id=25607

That thread dates from before I'd ever used Oracle's windowing
functions.  Note that first non-OLAP SQL query for doing "time-series"
like stuff.  It's only moderately complicated, but it is also trying
to accomplish something conceptually VERY simple, yet its SQL is NOT
simple.  Writing more complicated queries in that fashion quickly
becomes infeasible.

Furthermore, that initial query also took about 3 seconds to run.  My
second non-OLAP way of doing the same query, arguably somewhat
clearer, took 75 seconds.

At the end of that same thread, after intervening confusion, Daryl
Biberdorf pointed out the OLAP lead() and lag() functions to me.  By
using lead(), the SQL becomes much easier to write and understand, AND
its execution become enormously faster - down to 0.1 seconds or so.

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/
Reply | Threaded
Open this post in threaded view
|

Re: windowing functions != recursive functions

pilot pirx
In reply to this post by pilot pirx
Thank you and AndrewP for the pointers - very interesting read.

Some clarifications. Coming from my perspective - that of R user,
not an API user - it is natural to do more
complicated operations in R than to try to write additional
functions in C/tcl etc. I do not know how common will such
attitude be among SQLite users not using R. (In general R
users tend to be statisticians with little knowledge of SQL.)
This is why my mail was not an attempt to propose some
extensions to core SQLite [it is so good for the intended
purpose that adding more stuff can spoil it :-)]

Were any additions for computing considered for SQLITE
it seems plausible to argue that the should be easy to
implemenent and do not change the spirit of the project.
So more like 'adding a log function' rather than
something (probably) much bigger like Oracle olap.

It could be further argued that, for all such ideas for
SQLite extentions, the strategic approach of having
'the core sqlite' and 'the extended sqlite' is the
best way to proceed. Any non-conventional extensions
(like log or OLAP) could be implemented in the 'extended
sqlite' version, used from there - and eventually migrated to
the core if there is a sufficient interest.










----- Original Message -----
From: [hidden email]
To: [hidden email]
Subject: Re: [sqlite] windowing functions != recursive functions
Date: Wed, 12 Oct 2005 22:48:35 -0400

>
> "pilot pirx" <[hidden email]> wrote:
>
> > Now, for the recursive function like exponential moving average
> > the defintion is that
> >
> > ema(i+1) =  val(i) * coef  + ema(i) * (1-coef).
> >
> > That is I have to know the previous value of both EMA _and_  
> > VALUE (while for moving avearage I need to know _only_ the
> > previous value(s)
> > of VALUE.
>
> You could write an "ema()" function for SQLite using the
> scarcely documented API functions sqlite3_get_auxdata() and
> sqlite3_set_auxdata().  (Those routines were intended to allow
> functions like "regexp" to compile a constant regular expression
> once and then reused the compiled regular expression on
> subsequent calls.  But they have never been used for anything,
> as far as I am aware.)
>
> The ema() function would work like this:
>
>     SELECT ema(x, 0.45) FROM table1;
>
> Where 0.45 is the "coef".
>
> I was wondering if it would be possible to write a "prev()"
> function that returned the value of a column in the result
> set from the previous row.  prev() could be used to implement
> ema() in pure SQL.  But, alas, I do not see how you could
> write a general-purpose prev() function using the current
> API.  Some kind of extension would be required, I think.
> --
> D. Richard Hipp <[hidden email]>


--
___________________________________________________
Play 100s of games for FREE! http://games.mail.com/