I was unaware of the windowing functions discusssion.
Having a look at the first link, it looks like we may be talking about two subtly different issues. The windowing functions described in the link are different from recursive functions. There is no problem in computing moving average or cumulative sum etc with the existing SQL (or dealing with time windows in general) - it just the SQL gets nasty - examples are in 'SQL for smarties' by Joe Celko. Therefore adding such functions to SQLite is 'nice' but does not really increase the functionality. In contrast, computation of the recursion function adds it. example with moving averages, since they are mentioned in the windowing functions assuming original series is VALUE = 1 2 3 4 5 6 7 following statement will compute 5-day moving average for the whole column UPDATE data SET MAVE5 = (SELECT AVG(val) FROM data AS h1 WHERE h1.dayno <= data.dayno AND h1.dayno > data.dayno-05); But this statement operates always on existing data in the existing column - to compute new value of MAVE5 it only needs to know values of VALUE. (When, for the element one, there is not enough data (because there is no dayno < 1) SQL simply averages over existing data.) Now, for the recursive function like exponential moving average the defintion is that ema(i+1) = val(i) * coef + ema(i) * (1-coef). That is I have to know the previous value of both EMA _and_ VALUE (while for moving avearage I need to know _only_ the previous value(s) of VALUE. ----- Original Message ----- From: "Andrew Piskorski" <[hidden email]> To: [hidden email] Subject: [sqlite] SQL Window/OLAP functions Date: Wed, 12 Oct 2005 08:34:02 -0400 > > On Wed, Oct 12, 2005 at 05:12:05AM -0500, pilot pirx wrote: > > Subject: [sqlite] Please, please do _not_ remove this feature from SQLite... > > > While using SQLite for some time (with R package, www.r-project.org) > > I did admire its functionality and speed. Then I did discover a > > hidden SQLite feature of immense usefulness - not available in other > > databases. SQLite can compute Fibonacci numbers! (I will explain why > > Transaction visibility features do vary, although often it doesn't > matter anyway. E.g., here's a dicussion of how (at least as of early > 2004), PostgreSQL's docs were quite confused about certain subtleties, > but what I find interesting, is this was still something that in > practice had never really mattered to the mostly hard-core RDBMS > programmers talking about it in that thread: > > http://openacs.org/forums/message-view?message_id=176198 > > > UPDATE fib SET > > val = (SELECT h1.val FROM fib as h1 where pos = fib.pos - 1) + > > (SELECT h2.val FROM fib as h2 where pos = fib.pos - 2) > > WHERE pos > 2; > > I don't see why this is such a great feature. Without it, worst case, > you could still write a simple little loop which would issue one > update statement for each row, all within a single transaction. No? > > > This is an _immensely_ useful functionality when one needs to > > compute various recursive functions. For example exponential moving > > average, used frequently in financials. Or Kalman filter (and many > > Vastly more useful for moving average and the like would be real > windowing/grouping functions, like Oracle's "analytic" functions. I'm > not thrilled by their particular syntax, but the functionality is > INCREDIBLY useful. (And on the other hand, I haven't thought of any > obviously better syntax, either.) > > Hm, an amendement to the SQL:1999 spec added windowing support, and > SQL:2003 includes that, I think as features T611, "Elementrary OLAP > functions" and T612, "Advanced OLAP functions". Apparently Fred Zemke > of Oracle was the author of that SQL spec, and IBM also supported it, > so the SQL:2003 syntax and behavior is probably very similar (maybe > identical?) to what Oracle 8i, 9i, and 10g and IBM's DB2 already have. > PostgreSQL, as of 8.0, doesn't support it yet. > > http://www.wintercorp.com/rwintercolumns/SQL_99snewolapfunctions.html > http://www.ncb.ernet.in/education/modules/dbms/SQL99/OLAP-99-154r2.pdf > http://www.wiscorp.com/sql/SQL2003Features.pdf > http://troels.arvin.dk/db/rdbms/#select-limit-offset > http://www.postgresql.org/docs/8.0/interactive/features.html > http://en.wikipedia.org/wiki/SQL > http://www.sigmod.org/sigmod/record/issues/0403/E.JimAndrew-standard.pdf > http://www.oracle.com/oramag/oracle/01-jul/o41industry.html > > SQLite basically supports just SQL-92, it doesn't have any of these > newer SQL:1999 or SQL:2003 features, right? > > Using SQLite in conjunction with a powerful statistical data analysis > programming language like R is an excellent example of a use where > windowing functions can be hugely helpful. Unfortunately, I've never > had a compelling need to use SQLite for that, otherwise I'd probably > take a shot at adding support for the SQL:2003 Window/OLAP stuff. :) > > -- > Andrew Piskorski <[hidden email]> > http://www.piskorski.com/ -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm |
"pilot pirx" <[hidden email]> wrote:
> Now, for the recursive function like exponential moving average > the defintion is that > > ema(i+1) = val(i) * coef + ema(i) * (1-coef). > > That is I have to know the previous value of both EMA _and_ VALUE > (while for moving avearage I need to know _only_ the previous value(s) > of VALUE. > You could write an "ema()" function for SQLite using the scarcely documented API functions sqlite3_get_auxdata() and sqlite3_set_auxdata(). (Those routines were intended to allow functions like "regexp" to compile a constant regular expression once and then reused the compiled regular expression on subsequent calls. But they have never been used for anything, as far as I am aware.) The ema() function would work like this: SELECT ema(x, 0.45) FROM table1; Where 0.45 is the "coef". I was wondering if it would be possible to write a "prev()" function that returned the value of a column in the result set from the previous row. prev() could be used to implement ema() in pure SQL. But, alas, I do not see how you could write a general-purpose prev() function using the current API. Some kind of extension would be required, I think. -- D. Richard Hipp <[hidden email]> |
In reply to this post by pilot pirx
On Wed, Oct 12, 2005 at 09:05:26PM -0500, pilot pirx wrote:
> The windowing functions described in the link > are different from recursive functions. Yes, I think you're right. Your EMA example bugged me, so I fooled with it, but I couldn't come up with any way to implement EMA using plain SQL, even with the windowing/OLAP functions. Some explanations of the EMA algorithm are here: http://www.investopedia.com/university/tm/TradingIndicators/TheInsAndOutsOfMovingAverages.asp http://www.ivorix.com/en/products/tech/smooth/ema.html Looks like you don't even need real recursion for EMA (AKA, it is tail-recursive), plain old iteration will do fine, as you can see from the simple C++ code in the 2nd link above. (Yet it can't be done in SQL. Well we all knew that SQL isn't Turing complete, here's a practical consequence of that, I guess.) > Now, for the recursive function like exponential moving average the > defintion is that ema(i+1) = val(i) * coef + ema(i) * (1-coef). That > is I have to know the previous value of both EMA _and_ VALUE (while > for moving avearage I need to know _only_ the previous value(s) of > VALUE. It is quite possible to return previous (lagged) values of multiple columns, NOT just one. You do that with dense_rank, which I talked about a back on May 23 on this email list: I always thought of that feature as "look-aside", as in, "Find the max of foo, but don't give me that, instead look aside and return me the corresponding value of bar next to it instead." select max(playerid) keep (dense_rank last order by sb) as top_stealer from batting The problem is that EMA needs to use the lag of the actual column it's calculating, and there's just no damn way to do that in SQL. Well, I didn't try it, but maybe it would be possible to kludge EMA somehow using either Oracle's "connect by", or DB2's recursive SQL, which sounds possibly and less hideous than connect by: http://www-128.ibm.com/developerworks/db2/library/techarticle/0307steinbach/0307steinbach.html Here's the example I tried for EMA (on Oracle 8.1.7.4): create table atp_ema (symbol varchar2(8) ,day date ,val number); insert into atp_ema values ('a' ,'2005-01-03' ,67); insert into atp_ema values ('a' ,'2005-01-04' ,77); insert into atp_ema values ('a' ,'2005-01-05' ,80); insert into atp_ema values ('a' ,'2005-01-06' ,82); insert into atp_ema values ('a' ,'2005-01-07' ,94); insert into atp_ema values ('a' ,'2005-01-10' ,81); insert into atp_ema values ('a' ,'2005-01-11' ,83); insert into atp_ema values ('b' ,'2005-01-03' ,27); insert into atp_ema values ('b' ,'2005-01-04' ,27); insert into atp_ema values ('b' ,'2005-01-05' ,20); insert into atp_ema values ('b' ,'2005-01-06' ,22); insert into atp_ema values ('b' ,'2005-01-07' ,24); insert into atp_ema values ('b' ,'2005-01-10' ,21); insert into atp_ema values ('b' ,'2005-01-11' ,23); variable coeff number; begin :coeff := 0.66; end; / -- Correct algorithm, but can't run: select v.* ,(ema_term_1 + ema_term_2) as ema ,(avg(val) over (partition by symbol order by day rows between 3 preceding and current row)) as mean_4_day from ( select symbol ,day ,val ,(:coeff * val) as ema_term_1 ,( (1 - :coeff) * -- Of course ema is not defined here, so this fails with -- 'ORA-00904: invalid column name': nvl((lag(ema ,1) over (partition by symbol order by day)) ,val) ) as ema_term_2 from atp_ema ) v order by symbol ,day ; -- Andrew Piskorski <[hidden email]> http://www.piskorski.com/ |
In reply to this post by pilot pirx
On Wed, Oct 12, 2005 at 09:05:26PM -0500, pilot pirx wrote:
> There is no problem in computing moving average > or cumulative sum etc > with the existing SQL (or dealing with time windows > in general) - it just the SQL gets nasty - examples Except that the SQL doesn't "just" get nasty, it can quickly get EXTREMELY nasty, and often has really lousy performance as well. I regularly write (fairly) readable, fast queries using (Oracle's) windowing/OLAP functions which AFAICT, would be either extremely difficult or essentially impossible for me to write in SQL-92, other than, perhaps, by writing some special purpose SQL compiler/translator to do the job. E.g., see this lengthy thread from 2001: http://openacs.org/forums/message-view?message_id=25521 http://openacs.org/forums/message-view?message_id=25559 http://openacs.org/forums/message-view?message_id=25607 That thread dates from before I'd ever used Oracle's windowing functions. Note that first non-OLAP SQL query for doing "time-series" like stuff. It's only moderately complicated, but it is also trying to accomplish something conceptually VERY simple, yet its SQL is NOT simple. Writing more complicated queries in that fashion quickly becomes infeasible. Furthermore, that initial query also took about 3 seconds to run. My second non-OLAP way of doing the same query, arguably somewhat clearer, took 75 seconds. At the end of that same thread, after intervening confusion, Daryl Biberdorf pointed out the OLAP lead() and lag() functions to me. By using lead(), the SQL becomes much easier to write and understand, AND its execution become enormously faster - down to 0.1 seconds or so. -- Andrew Piskorski <[hidden email]> http://www.piskorski.com/ |
In reply to this post by pilot pirx
Thank you and AndrewP for the pointers - very interesting read.
Some clarifications. Coming from my perspective - that of R user, not an API user - it is natural to do more complicated operations in R than to try to write additional functions in C/tcl etc. I do not know how common will such attitude be among SQLite users not using R. (In general R users tend to be statisticians with little knowledge of SQL.) This is why my mail was not an attempt to propose some extensions to core SQLite [it is so good for the intended purpose that adding more stuff can spoil it :-)] Were any additions for computing considered for SQLITE it seems plausible to argue that the should be easy to implemenent and do not change the spirit of the project. So more like 'adding a log function' rather than something (probably) much bigger like Oracle olap. It could be further argued that, for all such ideas for SQLite extentions, the strategic approach of having 'the core sqlite' and 'the extended sqlite' is the best way to proceed. Any non-conventional extensions (like log or OLAP) could be implemented in the 'extended sqlite' version, used from there - and eventually migrated to the core if there is a sufficient interest. ----- Original Message ----- From: [hidden email] To: [hidden email] Subject: Re: [sqlite] windowing functions != recursive functions Date: Wed, 12 Oct 2005 22:48:35 -0400 > > "pilot pirx" <[hidden email]> wrote: > > > Now, for the recursive function like exponential moving average > > the defintion is that > > > > ema(i+1) = val(i) * coef + ema(i) * (1-coef). > > > > That is I have to know the previous value of both EMA _and_ > > VALUE (while for moving avearage I need to know _only_ the > > previous value(s) > > of VALUE. > > You could write an "ema()" function for SQLite using the > scarcely documented API functions sqlite3_get_auxdata() and > sqlite3_set_auxdata(). (Those routines were intended to allow > functions like "regexp" to compile a constant regular expression > once and then reused the compiled regular expression on > subsequent calls. But they have never been used for anything, > as far as I am aware.) > > The ema() function would work like this: > > SELECT ema(x, 0.45) FROM table1; > > Where 0.45 is the "coef". > > I was wondering if it would be possible to write a "prev()" > function that returned the value of a column in the result > set from the previous row. prev() could be used to implement > ema() in pure SQL. But, alas, I do not see how you could > write a general-purpose prev() function using the current > API. Some kind of extension would be required, I think. > -- > D. Richard Hipp <[hidden email]> -- ___________________________________________________ Play 100s of games for FREE! http://games.mail.com/ |
Free forum by Nabble | Edit this page |