Custom aggregate functions in Tcl

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Custom aggregate functions in Tcl

Andy Goth-3
I wish to define custom aggregate functions in Tcl, but this capability is
currently not exposed through the Tcl interface. Thus I am thinking about
how best to add it. Here's a first crack at a design proposal:

Extend the [db function] command to accept an -aggregate switch that makes
the new function be an aggregate function. Otherwise, the function defaults
to being a scalar function.

When an aggregate Tcl function is called from SQL, it is invoked in two
phases: step and final. The step phase receives the input data, and the
final phase produces the result.

During step, the function is invoked once for each row (in the group). For
the first row (in the group), the first argument to the function will be
empty string, and subsequent arguments are the SQL values from the row
being processed. For each additional row (in the group), the first argument
is the Tcl value returned by the prior invocation, and subsequent arguments
are as above.

During final, the function is invoked one last time (at the end of each
group). Its sole argument is the return value of the last step invocation,
or empty string if there were no rows. The return value of the Tcl function
is used as the return value of the SQL function.

If there were no rows and GROUP BY was used, the function is not invoked at
all, because there were no groups.

The Tcl function can tell whether it is in the step or final phase by how
many arguments it receives. If it receives multiple, it is in step. If it
receives only one, it is in final.

Depending on how it is written, the Tcl function may be able to tell if
it's being called for the first time (within the group) by checking if its
first argument is empty string. If non-empty, it definitely has been called
before (within the group). If empty, it is most likely on the first row
and, if in step, may need to initialize.

However, if the aggregate function wishes to disregard some of its input
data, it may choose to return empty string to discard any state data
arising from the current and prior rows (in the group). This will result in
empty string being the first argument to the next invocation. This mode of
operation is ill-advised but not illegal, though maybe some valid use cases
may exist, provided that ORDER BY is being used.

It is an error to pass an aggregate Tcl function zero arguments.

Here's an example that will return the entire Fossil global_config table as
a key/value dict:

db function list -deterministic -aggregate {
    apply {{state args} {
        concat $state $args
    }
}

db onecolumn {
    SELECT list(name, value)
      FROM global_config
     ORDER BY name
}

As for implementation, I think the existing code can be leveraged to a
great extent. Adapt the existing tclSqlFunc() to be the backend to new
tclSqlFuncScalar(), tclSqlFuncStep(), and tclSqlFuncFinal() functions, and
adjust the DB_FUNCTION code to recognize -aggregate and pass different
function pointers accordingly. Use sqlite3_aggregate_context() to hold the
Tcl_Obj * resulting from each invocation, then let it be the first argument
to the next invocation.

I would be happy to implement this myself, since this is functionality I
will be needing soon. Once I'm satisfied with it and have test suite
updates, what's the recommended method for sharing my patch?

Question: does xFinal() get called if an error occurs during (or between)
calling xStep()? Are errors even possible? I'm curious if there's any way
to leak the Tcl_Obj pointed to by the aggregate context.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Richard Hipp-3
On 1/29/19, Andy Goth <[hidden email]> wrote:
>
> Question: does xFinal() get called if an error occurs during (or between)
> calling xStep()? Are errors even possible? I'm curious if there's any way
> to leak the Tcl_Obj pointed to by the aggregate context.

xFinal() gets called by sqlite3_reset() or sqlite3_finalize() if it
hasn't been called already when those routines are first invoked.

You might also what to add the ability to specify xValue() and
xInverse() methods so that you can do window functions in TCL as well.
See https://www.sqlite.org/windowfunctions.html#user_defined_aggregate_window_functions
for additional information.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Andy Goth-3
Oh yeah, I meant to say that I was going to leave window functions for
future expansion. First I need to get more familiar with their use.
Yesterday was my first time implementing an aggregate function, and I need
to work my way up.

On Tue, Jan 29, 2019, 07:46 Richard Hipp <[hidden email] wrote:

> On 1/29/19, Andy Goth <[hidden email]> wrote:
> >
> > Question: does xFinal() get called if an error occurs during (or between)
> > calling xStep()? Are errors even possible? I'm curious if there's any way
> > to leak the Tcl_Obj pointed to by the aggregate context.
>
> xFinal() gets called by sqlite3_reset() or sqlite3_finalize() if it
> hasn't been called already when those routines are first invoked.
>
> You might also what to add the ability to specify xValue() and
> xInverse() methods so that you can do window functions in TCL as well.
> See
> https://www.sqlite.org/windowfunctions.html#user_defined_aggregate_window_functions
> for additional information.
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Gerry Snyder-4
In reply to this post by Andy Goth-3
I hope your work makes into the SQLite source code. It will be useful.

Gerry Snyder

On Tue, Jan 29, 2019 at 12:16 AM Andy Goth <[hidden email]> wrote:

> I wish to define custom aggregate functions in Tcl, but this capability is
> currently not exposed through the Tcl interface. Thus I am thinking about
> how best to add it. Here's a first crack at a design proposal:
>
> Extend the [db function] command to accept an -aggregate switch that makes
> the new function be an aggregate function. Otherwise, the function defaults
> to being a scalar function.
>
> When an aggregate Tcl function is called from SQL, it is invoked in two
> phases: step and final. The step phase receives the input data, and the
> final phase produces the result.
>
> During step, the function is invoked once for each row (in the group). For
> the first row (in the group), the first argument to the function will be
> empty string, and subsequent arguments are the SQL values from the row
> being processed. For each additional row (in the group), the first argument
> is the Tcl value returned by the prior invocation, and subsequent arguments
> are as above.
>
> During final, the function is invoked one last time (at the end of each
> group). Its sole argument is the return value of the last step invocation,
> or empty string if there were no rows. The return value of the Tcl function
> is used as the return value of the SQL function.
>
> If there were no rows and GROUP BY was used, the function is not invoked at
> all, because there were no groups.
>
> The Tcl function can tell whether it is in the step or final phase by how
> many arguments it receives. If it receives multiple, it is in step. If it
> receives only one, it is in final.
>
> Depending on how it is written, the Tcl function may be able to tell if
> it's being called for the first time (within the group) by checking if its
> first argument is empty string. If non-empty, it definitely has been called
> before (within the group). If empty, it is most likely on the first row
> and, if in step, may need to initialize.
>
> However, if the aggregate function wishes to disregard some of its input
> data, it may choose to return empty string to discard any state data
> arising from the current and prior rows (in the group). This will result in
> empty string being the first argument to the next invocation. This mode of
> operation is ill-advised but not illegal, though maybe some valid use cases
> may exist, provided that ORDER BY is being used.
>
> It is an error to pass an aggregate Tcl function zero arguments.
>
> Here's an example that will return the entire Fossil global_config table as
> a key/value dict:
>
> db function list -deterministic -aggregate {
>     apply {{state args} {
>         concat $state $args
>     }
> }
>
> db onecolumn {
>     SELECT list(name, value)
>       FROM global_config
>      ORDER BY name
> }
>
> As for implementation, I think the existing code can be leveraged to a
> great extent. Adapt the existing tclSqlFunc() to be the backend to new
> tclSqlFuncScalar(), tclSqlFuncStep(), and tclSqlFuncFinal() functions, and
> adjust the DB_FUNCTION code to recognize -aggregate and pass different
> function pointers accordingly. Use sqlite3_aggregate_context() to hold the
> Tcl_Obj * resulting from each invocation, then let it be the first argument
> to the next invocation.
>
> I would be happy to implement this myself, since this is functionality I
> will be needing soon. Once I'm satisfied with it and have test suite
> updates, what's the recommended method for sharing my patch?
>
> Question: does xFinal() get called if an error occurs during (or between)
> calling xStep()? Are errors even possible? I'm curious if there's any way
> to leak the Tcl_Obj pointed to by the aggregate context.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Andy Goth-3
In reply to this post by Andy Goth-3
On 1/29/19 1:15 AM, Andy Goth wrote:
> I wish to define custom aggregate functions in Tcl

Initial implementation:

https://chiselapp.com/user/andy/repository/sqlite-andy/info/e0689f05d1f8792d

Sample program, intended to be run from the root of a built SQLite tree:

#!/usr/bin/env tclsh

load .libs/libtclsqlite3.so

sqlite3 db

db function incr -deterministic {

     apply {{arg} {

         incr arg

     }}

}

db function list -deterministic -aggregate {

     apply {{state args} {

         concat $state $args

     }}

}

puts [db eval {

     SELECT list(column1, ''), incr(42)

       FROM (VALUES ('#'), ('#'), (' c '), ('\'), ('\\'), ('{ xx '),
('}'))
}]

db close

# vim: set sts=4 sw=4 tw=80 et ft=tcl:


Here's the output:

{{#} {} # {} { c } {} \\ {} {\\} {} \{\ xx\  {} \} {}} 43


Notice that the first # element is brace-quoted and the second one is
not.  This demonstrates TIP 401 compliance (http://tip.tcl.tk/401) in
recent Tcl.

Before I can call this project finished, I need to update the test suite
and documentation.  But I didn't want to wait for that to release the
code I've already written, so here you go.

As a separate project, I would like to improve the Tcl script dispatch.
For several reasons, safeToUseEvalObjv() doesn't really meet its stated
goal.  I have a lot to say about this but should wait until another time
to go into detail.  Let's finish the aggregate function project first.

Another good follow-on project would be adding support for window
functions.  Before I can take that on, I need to gain some experience
using, let alone writing, window functions.

--
Andy Goth | <andrew.m.goth/at/gmail/dot/com>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Richard Hipp-3
It seems that you distinguish between the xStep and xFinal methods by
the number of argments.  xStep as 1+N argument (where N is the number
of function parameters) and xFinal has just 1.

Dan suggests (and I agree) that this will not extend well to window
functions.  It might be better to have an initial argument that is the
"method" name.  xStep would 2+N arguments where the first argument is
the string "step" and xFinal has 2 arguments where the first argument
is "final".  Then when you go to add the xValue and xInverse routines
for window functions, you will have a convenient way to distinguish
those calls from xStep and xFinal.

On 1/30/19, Andy Goth <[hidden email]> wrote:

> On 1/29/19 1:15 AM, Andy Goth wrote:
>> I wish to define custom aggregate functions in Tcl
>
> Initial implementation:
>
> https://chiselapp.com/user/andy/repository/sqlite-andy/info/e0689f05d1f8792d
>
> Sample program, intended to be run from the root of a built SQLite tree:
>
> #!/usr/bin/env tclsh
>
> load .libs/libtclsqlite3.so
>
> sqlite3 db
>
> db function incr -deterministic {
>
>      apply {{arg} {
>
>          incr arg
>
>      }}
>
> }
>
> db function list -deterministic -aggregate {
>
>      apply {{state args} {
>
>          concat $state $args
>
>      }}
>
> }
>
> puts [db eval {
>
>      SELECT list(column1, ''), incr(42)
>
>        FROM (VALUES ('#'), ('#'), (' c '), ('\'), ('\\'), ('{ xx '),
> ('}'))
> }]
>
> db close
>
> # vim: set sts=4 sw=4 tw=80 et ft=tcl:
>
>
> Here's the output:
>
> {{#} {} # {} { c } {} \\ {} {\\} {} \{\ xx\  {} \} {}} 43
>
>
> Notice that the first # element is brace-quoted and the second one is
> not.  This demonstrates TIP 401 compliance (http://tip.tcl.tk/401) in
> recent Tcl.
>
> Before I can call this project finished, I need to update the test suite
> and documentation.  But I didn't want to wait for that to release the
> code I've already written, so here you go.
>
> As a separate project, I would like to improve the Tcl script dispatch.
> For several reasons, safeToUseEvalObjv() doesn't really meet its stated
> goal.  I have a lot to say about this but should wait until another time
> to go into detail.  Let's finish the aggregate function project first.
>
> Another good follow-on project would be adding support for window
> functions.  Before I can take that on, I need to gain some experience
> using, let alone writing, window functions.
>
> --
> Andy Goth | <andrew.m.goth/at/gmail/dot/com>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Andy Goth-3
On 1/30/19 1:59 PM, Richard Hipp wrote:
> It seems that you distinguish between the xStep and xFinal methods by
> the number of argments.  xStep [has] 1+N argument (where N is the
> number of function parameters) and xFinal has just 1.

Yes.  I was explicit about that in my first email.

> Dan suggests (and I agree) that this will not extend well to window
> functions.

I agree too.  Trouble is, I designed around aggregate functions while
disregarding window functions because of my total lack of experience
with window functions.

> It might be better to have an initial argument that is the
> "method" name.

I did initially consider this but decided it was not necessary for
aggregate functions because final will always have zero arguments
whereas step will never usefully have zero arguments.

> xStep would 2+N arguments where the first argument is the string
> "step" and xFinal has 2 arguments where the first argument is "final".
> Then when you go to add the xValue and xInverse routines for window
> functions, you will have a convenient way to distinguish those calls
> from xStep and xFinal.

Adding an extra initial argument to the aggregate functions is not a
difficult thing to do.  I can certainly add that.  I just didn't see it
as useful because it conveys information redundantly provided by another
path guaranteed to be there.

Again, I agree it leaves room for future expansion, but (1) it's highly
unlikely that the fundamental definition of an aggregate function will
change, and (2) I don't think it's strictly necessary that the same
calling convention be used for aggregate functions and window functions.

My preference would be that all functions, regardless of kind, have the
same convention, but this is not possible because the interface to
scalar functions is set in stone, and it would be useless to insert a
first argument that's always "function".  Therefore, since aggregate
functions must be defined differently than scalar functions, I thought
it would be fine for window functions to be defined differently than
aggregate functions.

The next chance I get (probably tomorrow morning), I'll go ahead and add
"step" or "final" as the initial argument to aggregate functions.  I'll
also lift the prohibition on aggregate functions with no arguments.

All my above reasoning notwithstanding (I just wanted to document why I
took the approach I did), this change does benefit aggregate functions
by making it easier to tie into TclOO and similar Tcl object systems
which use initial arguments as method names.  As for procedures that
don't need this, they are free to ignore the argument.

Old example, works with current code, here for baseline comparison:

db function list -deterministic -aggregate {
     apply {{state args} {
         concat $state $args
     }}
}

New example:

db function list -deterministic -aggregate {
     apply {{method state args} {
         concat $state $args
     }}
}

Alternately:

db function list -deterministic -aggregate {
     apply {{method state args} {
         switch $method {
             step {concat $state $args}
             final {set state}
         }
     }}
}

Or:

namespace eval ListAggregate {
     namespace export step final
     namespace ensemble create
     proc step {state args} {
         concat $state $args
     }
     proc final {state} {
         return $state
     }
}
db function list -deterministic -aggregate ListAggregate

Or:

oo::class create ListAggregate
oo::objdefine ListAggregate {
     method step {state args} {
         concat $state $args
     }
     method final {state} {
         return $state
     }
}
db function list -deterministic -aggregate ListAggregate

Or:

oo::class create ListAggregate {
     method step {state args} {
         concat $state $args
     }
     method final {state} {
         my destroy
         return $state
     }
}
db function list -deterministic -aggregate [ListAggregate new]

Though I do not prefer that final example since it creates heavyweight
infrastructure to hold per-instance internal state, yet doesn't actually
make use of it.  However, this could be useful as an alternative to
storing state data in the return value, making it possible to modify it
in-place without incurring copy-on-write, as documented in the
tclSqlFuncStep() comments.

Overall, my preference is to avoid creating global named objects when
anonymous values will do the job, hence my use of [apply].

--
Andy Goth | <andrew.m.goth/at/gmail/dot/com>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Andy Goth-3
On 1/30/19 3:27 PM, Andy Goth wrote:
> The next chance I get (probably tomorrow morning), I'll go ahead and add
> "step" or "final" as the initial argument to aggregate functions.  I'll
> also lift the prohibition on aggregate functions with no arguments.

This change is now committed.

https://chiselapp.com/user/andy/repository/sqlite-andy/info/4fc35d5e09e2a486

I went with a string table approach that can be later expanded to handle
method names for window functions.

> oo::class create ListAggregate {
>      method step {state args} {
>          concat $state $args
>      }
>      method final {state} {
>          my destroy
>          return $state
>      }
> }
> db function list -deterministic -aggregate [ListAggregate new]

This approach won't work.  ListAggregate needs to be separately
instantiated for every single group, not just once per database
connection.  Instead, something horrible like this is needed:

oo::class create ListAggregate {
     variable state
     method step {args} {
         lappend state {*}$args
     }
     method final {} {
         return $state
     }
}
db function list -deterministic -aggregate {
     apply {{method obj args} {
         if {$obj eq {}} {
             set obj [ListAggregate new]
         }
         set result [$obj $method {*}$args]
         if {$method eq "final"} {
             $obj destroy
             return $result
         } else {
             return $obj
         }
     }}
}

I don't like this one bit.  There's more logic adapting between the
TclOO implementation and the tclsqlite calling convention than there is
actual work being done.  It's much simpler to just store the state data
in the return value than in an object instantiation.  Nevertheless, this
approach may be more appropriate for complex window functions, so I'm
exploring it rather than dismissing it out of hand.

Let's have a choice of calling conventions, so different-sized tasks can
have more infrastructure if they need it and less if they don't.  By
default, use the simple convention I started with, though augmented with
method name, because why not.  But as an alternative, well... start by
looking at this example:

oo::class create ListAggregate {
     variable state
     method step {args} {
         lappend state {*}$args
     }
     method final {} {
         return $state
     }
}
db function list -deterministic -aggregate -class ListAggregate

The addition of the -class switch changes the script to instead be the
name of a TclOO class, to which I will refer as $class:

1. Before the first invocation of the step function (or final function,
if there are no rows), [$class new] is be called.  Its return value is
saved as the object instance name, referred to below as $obj.

2. For each row, [$obj step ?arg ...?] is called.

3. After the last row, [$obj final] is called, and its return value is
used as the return value of the SQL function.

4. To clean up, [$obj destroy] is called.

If an error occurs in step 1, terminate immediately.  If an error occurs
in steps 2 or 3, go straight to step 4.

You may notice TclOO is not actually required.  Other object systems
such as [incr tcl], XOTcl, Snit, and stooop may be used, though possibly
with wrapper shims, but honestly anybody can write commands that behave
like $class and $obj are expected to behave.  Thus, there is no real
dependency on Tcl 8.6.

[$class new] and [$obj destroy] may call user-defined constructors and
destructors, but while this may be useful to the programmer, it is
internal to TclOO and is not SQLite's concern.

To recap, when -class is not used, the procedure is as follows, where
$script is the script argument to [db function]:

1. For each row, [$script step $state ?arg ...?] is called, where $state
is (first row) empty string or (subsequent rows) the return value of the
previous call to [$script step].

2. After the last row, [$script final $state] is called, and its return
value is used as the return value of the SQL function.  $state is the
return value of the last call to [$script step] or empty string if there
were no rows.

Unless there are comments or objections, I'll try adding -class the next
time I get a chance to hack on this.  I think it may be overkill for
aggregate functions but will probably be useful for window functions.

--
Andy Goth | <andrew.m.goth/at/gmail/dot/com>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Andy Goth-3
In reply to this post by Richard Hipp-3
I read some more about window functions and now see more clearly that they
are an extension to aggregate functions. Now I understand why it makes
sense to have a method name for both aggregate and window functions. I'll
also go ahead and put window function support in my code next chance I get,
rather than wait until I'm totally solid on aggregate functions.

On Wed, Jan 30, 2019, 13:59 Richard Hipp <[hidden email] wrote:

> It seems that you distinguish between the xStep and xFinal methods by
> the number of argments.  xStep as 1+N argument (where N is the number
> of function parameters) and xFinal has just 1.
>
> Dan suggests (and I agree) that this will not extend well to window
> functions.  It might be better to have an initial argument that is the
> "method" name.  xStep would 2+N arguments where the first argument is
> the string "step" and xFinal has 2 arguments where the first argument
> is "final".  Then when you go to add the xValue and xInverse routines
> for window functions, you will have a convenient way to distinguish
> those calls from xStep and xFinal.
>
> On 1/30/19, Andy Goth <[hidden email]> wrote:
> > On 1/29/19 1:15 AM, Andy Goth wrote:
> >> I wish to define custom aggregate functions in Tcl
> >
> > Initial implementation:
> >
> >
> https://chiselapp.com/user/andy/repository/sqlite-andy/info/e0689f05d1f8792d
> >
> > Sample program, intended to be run from the root of a built SQLite tree:
> >
> > #!/usr/bin/env tclsh
> >
> > load .libs/libtclsqlite3.so
> >
> > sqlite3 db
> >
> > db function incr -deterministic {
> >
> >      apply {{arg} {
> >
> >          incr arg
> >
> >      }}
> >
> > }
> >
> > db function list -deterministic -aggregate {
> >
> >      apply {{state args} {
> >
> >          concat $state $args
> >
> >      }}
> >
> > }
> >
> > puts [db eval {
> >
> >      SELECT list(column1, ''), incr(42)
> >
> >        FROM (VALUES ('#'), ('#'), (' c '), ('\'), ('\\'), ('{ xx '),
> > ('}'))
> > }]
> >
> > db close
> >
> > # vim: set sts=4 sw=4 tw=80 et ft=tcl:
> >
> >
> > Here's the output:
> >
> > {{#} {} # {} { c } {} \\ {} {\\} {} \{\ xx\  {} \} {}} 43
> >
> >
> > Notice that the first # element is brace-quoted and the second one is
> > not.  This demonstrates TIP 401 compliance (http://tip.tcl.tk/401) in
> > recent Tcl.
> >
> > Before I can call this project finished, I need to update the test suite
> > and documentation.  But I didn't want to wait for that to release the
> > code I've already written, so here you go.
> >
> > As a separate project, I would like to improve the Tcl script dispatch.
> > For several reasons, safeToUseEvalObjv() doesn't really meet its stated
> > goal.  I have a lot to say about this but should wait until another time
> > to go into detail.  Let's finish the aggregate function project first.
> >
> > Another good follow-on project would be adding support for window
> > functions.  Before I can take that on, I need to gain some experience
> > using, let alone writing, window functions.
> >
> > --
> > Andy Goth | <andrew.m.goth/at/gmail/dot/com>
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Custom aggregate functions in Tcl

Andy Goth-3
In reply to this post by Andy Goth-3
I made many updates to my implementation, which can be found here:

https://chiselapp.com/user/andy/repository/sqlite-andy/timeline?r=andygoth-tcl-function
https://chiselapp.com/user/andy/repository/sqlite-andy/artifact?fn=src/tclsqlite.c&ci=andygoth-tcl-function
https://chiselapp.com/user/andy/repository/sqlite-andy/vdiff?from=root:andygoth-tcl-function&to=andygoth-tcl-function

Documentation and test suite still to come.

[db function] now accepts the following switches:

-argcount N: Required argument count
-deterministic: Promise to be consistent
-state: Pass state value as argument after method name (default)
-class: Use an object-oriented approach instead of -state
-initial VAL: Constructor argument(s), defaults to empty
-scalar: Create a scalar function (default)
-aggregate: Create a non-window aggregate function
-window: Create a window-enabled aggregate function

-class and -initial require -window or -aggregate.

First, let me demonstrate the -state (default) calling convention:

db function square -deterministic {
     apply {{arg} {
         expr {$arg * $arg}
     }}
}
db function list -deterministic -window {
     apply {{method state args} {
         switch $method {
             step    {concat $state $args}
             inverse {lrange $state [llength $args] end}
             value   {set state}
         }
     }}
}
db function sum -deterministic -initial 0 -window {
     apply {{method state args} {
         switch $method {
             step        {expr {$state + [lindex $args 0]}}
             inverse     {expr {$state - [lindex $args 0]}}
             new - value {set state}
         }
     }}
}
db eval {
       WITH data (label, number) AS (
     VALUES ('f', -6), ('b', -2), ('a', 1), ('e', 5), ('d', -4), ('c', 3))
     SELECT sum(square(number)) OVER win AS sumsqu
          , list(label, number) OVER win AS list
       FROM data
     WINDOW win AS (ORDER BY label ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
} {
     puts [list $sumsqu $list]
}

5 {a 1 b -2}
14 {a 1 b -2 c 3}
29 {b -2 c 3 d -4}
50 {c 3 d -4 e 5}
77 {d -4 e 5 f -6}
61 {e 5 f -6}

Now, here's the -class calling convention:

oo::class create ListAggregate {
     variable state
     method step    {args} {lappend state {*}$args}
     method inverse {args} {set state [lrange $state [llength $args] end]}
     method value   {}     {return $state}
}
db function list -deterministic -window -class ListAggregate
oo::class create SumAggregate {
     variable state
     constructor    {{i 0}} {set state $i}
     method step    {i}     {set state [expr {$state + $i}]}
     method inverse {i}     {set state [expr {$state - $i}]}
     method value   {}      {return $state}
}
db function sum -deterministic -window -class SumAggregate

The output is the same as with the first syntax example.  Here, I don't
use the -initial switch, instead relying on its empty default.
Furthermore, I rely on the default constructor argument of 0.  With
-initial, I could explicitly set an initial value for the sum.

-initial behaves a little differently between -state and -class modes.
For -state, it gives the argument to the "new" method.  For -class, it
gives the argument *list* to the "new" method.  Hence, when empty or
omitted, for -state mode the "new" method gets empty string as its
argument, whereas for -class mode it gets no arguments.

-initial can be used to implement a function once in Tcl, then
instantiate it in SQLite several times with some varying parameters to
create a family of functions.

You may have noticed that I don't have "final" anymore.  Instead, I
rolled it into "value" and "destroy", the latter of which I haven't
shown in my examples because the default is good.

To see what's going on under the hood, try this:

db function list -deterministic -window -initial {foo bar} -class {
     apply {{new args} {
         puts \[[lrange [info level 0] 2 end]\]
         list apply {{method args} {
             global state
             puts [lrange [info level 0] 2 end]
             switch $method {
                 new     {set state {}}
                 step    {lappend state {*}$args}
                 inverse {set state [lrange $state [llength $args] end]}
                 value   {set state}
                 destroy {unset state}
             }
         }}
     }}
}
db eval {
       WITH data (label, number) AS (
     VALUES ('f', -6), ('b', -2), ('a', 1), ('e', 5), ('d', -4), ('c', 3))
     SELECT label
          , list(label, number) OVER win AS list
       FROM data
     WINDOW win AS (ORDER BY label ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
} {
     puts [list $label $list]
}

The above prints:

[new foo bar]
[step a 1]
[step b -2]
[value]
a {a 1 b -2}
[step c 3]
[value]
b {a 1 b -2 c 3}
[inverse a 1]
[step d -4]
[value]
c {b -2 c 3 d -4}
[inverse b -2]
[step e 5]
[value]
d {c 3 d -4 e 5}
[inverse c 3]
[step f -6]
[value]
e {d -4 e 5 f -6}
[inverse d -4]
[value]
f {e 5 f -6}
[destroy]

Here's the above example, reworked to not use -class:

db function list -deterministic -window -initial {foo bar} {
     apply {{method state args} {
         puts \[[lrange [info level 0] 2 end]\]
         switch $method {
             step    {concat $state $args}
             inverse {lrange $state [llength $args] end}
             value   {set state}
             destroy {unset state}
         }
     }}
}

Used with the prior [db eval] query, the above prints:

[new {foo bar}]
[step {} a 1]
[step {a 1} b -2]
[value {a 1 b -2}]
a {a 1 b -2}
[step {a 1 b -2} c 3]
[value {a 1 b -2 c 3}]
b {a 1 b -2 c 3}
[inverse {a 1 b -2 c 3} a 1]
[step {b -2 c 3} d -4]
[value {b -2 c 3 d -4}]
c {b -2 c 3 d -4}
[inverse {b -2 c 3 d -4} b -2]
[step {c 3 d -4} e 5]
[value {c 3 d -4 e 5}]
d {c 3 d -4 e 5}
[inverse {c 3 d -4 e 5} c 3]
[step {d -4 e 5} f -6]
[value {d -4 e 5 f -6}]
e {d -4 e 5 f -6}
[inverse {d -4 e 5 f -6} d -4]
[value {e 5 f -6}]
f {e 5 f -6}
[destroy {e 5 f -6}]

--
Andy Goth | <andrew.m.goth/at/gmail/dot/com>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users