Feature Request: Binding Arrays


Feature Request: Binding Arrays

Dominique Devienne
We have an SQLite virtual-table-heavy application with much of the GUI
driven by SQL queries, and we often have queries of the form

select * from some_table where some_column in (...)

where ... is coming from prior selections in the GUI, or filtering, etc...

1) In some places, we create temporary tables and join with those, instead
of using the in (list) where clause.
2) In other places we synthesize the query text by splicing list.join(", ")
into the in (list) where clause. (Whether you splice in the text of the
values or a series of %i placeholders and do proper binding makes little
difference IMHO; in both cases you need to reparse.)

Both solutions are unsatisfactory: with 1) you have to create dummy
transient tables, for which you need to invent table names, then insert,
join, and delete/clean up; with 2) you constantly reparse and prepare
queries, which can get super long if the array to "bind" is big.
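For concreteness, workaround 2) can be sketched with Python's stdlib sqlite3 module (the same mechanics apply through the C API with sqlite3_prepare_v2/sqlite3_bind_text): the SQL text, and therefore the prepared statement, changes with every distinct list length.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (name text, type text, primary key (name, type))")
con.executemany("insert into t values (?, ?)",
                [("foo", "en"), ("bar", "en"), ("toto", "fr")])

def select_in(con, types):
    # One "?" placeholder per element: the SQL text (and hence the
    # prepared statement) depends on len(types), forcing a reparse
    # whenever the list length changes.
    placeholders = ", ".join("?" for _ in types)
    sql = f"select name from t where type in ({placeholders})"
    return [row[0] for row in con.execute(sql, types)]

print(select_in(con, ["en", "fr"]))
```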

Any chance SQLite would add true array binding?

For example, given

create table t (name text, type text, primary key (name, type));
select * from t where type in (%1);

and binding would look something like this:

sqlite3_bind_array_begin(stmt, 1 [, types.size()]); // size param?
for (const auto& type: types) {
  sqlite3_bind_text(stmt, 1, type.c_str(), type.size(), SQLITE_TRANSIENT);
}
sqlite3_bind_array_end(stmt, 1);

Whether the API allows only homogeneous elements in the array (the element
type specified in sqlite3_bind_array_begin) or it's the usual SQLite duck
typing matters little to me.

Obviously I would welcome such a change. I have no clue how difficult that
is to implement, of course, but if it could somehow be added without making
SQLite that much bigger, such an addition would be very welcome.

If I somehow missed a better work-around to this lack of array-binding, I'm
also interested of course, but obviously I'd prefer real array binding.

Thanks for any insight on this, --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Feature Request: Binding Arrays

Clemens Ladisch
Dominique Devienne wrote:
> select * from some_table where some_column in (...)
>
> 2) In other places we synthesize the query text by splicing list.join(", ")
> in the in (list) where clause.
>
> Both solutions are unsatisfactory, because ... 2) constantly reparse and
> prepare queries, which can get super long if the array to "bind" is big.
>
> Any chance SQLite would add true array binding?

The compiled statement depends on the number of elements, so SQLite
would have to reprepare anyway:

> .explain on
> explain select 1 in (111,222,333);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
...
8     Integer        111   3     0                    00
9     MakeRecord     3     1     4     b              00
10    IdxInsert      1     4     0                    00
11    Integer        222   3     0                    00
12    MakeRecord     3     1     4     b              00
13    IdxInsert      1     4     0                    00
14    Integer        333   3     0                    00
15    MakeRecord     3     1     4     b              00
16    IdxInsert      1     4     0                    00
...
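Clemens' observation is easy to confirm from any binding; a small sketch with Python's stdlib sqlite3 module: EXPLAIN emits one row per VDBE opcode, and the count grows with the number of IN elements, so each list length really is a different compiled program.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def opcode_count(sql):
    # EXPLAIN returns one row per VDBE instruction in the compiled statement.
    return len(con.execute("explain " + sql).fetchall())

one = opcode_count("select 1 in (111)")
three = opcode_count("select 1 in (111, 222, 333)")
# The three-element list compiles extra Integer/MakeRecord/IdxInsert opcodes.
print(one, three)
```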


Regards,
Clemens

Re: Feature Request: Binding Arrays

Dominique Devienne
On Wed, Oct 16, 2013 at 12:03 PM, Clemens Ladisch <[hidden email]> wrote:

> Dominique Devienne wrote:
> > select * from some_table where some_column in (...)
> >
> > 2) In other places we synthesize the query text by splicing list.join(",
> ")
> > in the in (list) where clause.
> >
> > Both solutions are unsatisfactory, because ... 2) constantly reparse and
> > prepare queries, which can get super long if the array to "bind" is big.
> >
> > Any chance SQLite would add true array binding?
>
> The compiled statement depends on the number of elements, so SQLite
> would have to reprepare anyway:
>

But isn't that a consequence of the fact that an in (list) where clause is
necessarily bounded and known at parse time?

The same way I can manually transform the in (list) into a join to a temp
table, so can the query optimizer.

It already uses hidden intermediary result-sets for query processing, and
the array would basically be one such internal (anonymous) hidden
"result-set".

The difference with the manual transform-into-join code I'm forced to write
now is that SQLite wouldn't have to name the table and column to create the
temp table, fill it, use it, drop it, etc. These activities trigger
authorizer hooks, trace hooks, change the (temp) schema, etc. (i.e. a
bunch of DDL and DML statements), whereas SQLite itself, were it to support
array binding, would generate none of that monitored statement activity.

Notice that I'm inquiring about array binding for in (list) only, not about
storing arrays in table cells, selecting them, joining on them, etc.
I'd love to be able to do that too, but that's a different can of worms
entirely.

Thanks, --DD

Re: Feature Request: Binding Arrays

Richard Hipp-3
Please see http://www.sqlite.org/src/artifact/2ece66438?ln=13-75


--
D. Richard Hipp
[hidden email]

Re: Feature Request: Binding Arrays

Paul van Helden
Fantastic! I've been wanting this for a long time.

Since which version do we have sqlite3_intarray_xxxxx?


On Wed, Oct 16, 2013 at 1:28 PM, Richard Hipp <[hidden email]> wrote:

> Please see http://www.sqlite.org/src/artifact/2ece66438?ln=13-75
>
>
> --
> D. Richard Hipp
> [hidden email]

Re: Feature Request: Binding Arrays

Richard Hipp-3
On Wed, Oct 16, 2013 at 7:40 AM, Paul van Helden <[hidden email]> wrote:

> Fantastic! I've been wanting this for a long time.
>
> Since which version do we have sqlite3_intarray_xxxxx?
>
>
Since version 3.6.21, circa 2009-12-07.  Note however that this capability
is not built in.  It is an extension that you need to compile and link
separately.
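The referenced extension (test_intarray.c/.h in the SQLite source tree) is C code that must be compiled and linked in; it exposes a named virtual table whose rows are served straight from an application-owned integer array. Where it isn't available, a rough functional equivalent (minus the zero-copy aspect) is one reusable TEMP table, refilled before each query. A sketch with Python's stdlib sqlite3 module; the table name sel is my invention:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (col integer)")
con.executemany("insert into t values (?)", [(i,) for i in range(1, 11)])

# One reusable TEMP table plays the role of the intarray's named table.
con.execute("create temp table sel (v integer primary key)")

def count_in(values):
    # Refill the "array" table, then run the (unchanging) query text,
    # using the "col IN table-name" form of the IN operator.
    con.execute("delete from sel")
    con.executemany("insert into sel values (?)", [(v,) for v in set(values)])
    return con.execute("select count(*) from t where col in sel").fetchone()[0]

print(count_in([1, 2, 3]))
```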

--
D. Richard Hipp
[hidden email]

Re: Feature Request: Binding Arrays

Paul van Helden
> Since version 3.6.21, circa 2009-12-07.  Note however that this capability
> is not built in.  It is an extension that you need to compile and link
> separately.
>
OK... Herewith my vote to make it standard then, like
SQLITE_ENABLE_COLUMN_METADATA was enabled for the precompiled binary at
some point.

I prefer to stick to the precompiled binaries. Besides, it would make the
sqlite3_intarray functions more visible in the documentation, etc. I'm sure
I'm not the only one that didn't know about this very useful functionality.

It's about time the binary got slightly bigger ;-)

Re: Feature Request: Binding Arrays

techi eth
It is a really useful feature.

I have a use case where I need to log data at a continuous interval and
store it in a database. If an array type were supported by SQLite, then in
a single row I could store an array of timestamps and an array of values.
Is this specific to the int type, or can other data types be supported?


On Wed, Oct 16, 2013 at 6:15 PM, Paul van Helden <[hidden email]> wrote:

> > Since version 3.6.21, circa 2009-12-07.  Note however that this
> capability
> > is not built in.  It is an extension that you need to compile and link
> > separately.
> >
> OK... Herewith my vote to make it standard then, like
> SQLITE_ENABLE_COLUMN_METADATA was enabled for the precompiled binary at
> some point.
>
> I prefer to stick to the precompiled binaries. Besides, it would make the
> sqlite3_intarray functions more visible in the documentation, etc. I'm sure
> I'm not the only one that didn't know about this very useful functionality.
>
> It's about time the binary got slightly bigger ;-)

Re: Feature Request: Binding Arrays

Dominique Devienne
In reply to this post by Richard Hipp-3
On Wed, Oct 16, 2013 at 1:28 PM, Richard Hipp <[hidden email]> wrote:

> Please see http://www.sqlite.org/src/artifact/2ece66438?ln=13-75


Thanks. Being familiar with vtables, I had imagined as much, though I had
stopped short of doing it in practice.

This takes care of hiding the DML statements from auth/trace hooks, by
inserting/deleting behind the scenes in native code.
But OTOH it does not hide from SQL the DDL for creating or dropping the
vtable. Those will still be seen by hooks.

I realize I'm pushing my luck here Dr Hipp, but thinking about this more,
what I think SQLite is missing is some kind of unprotected
sqlite3_table_value, a subtype of sqlite3_value, with APIs to define the
columns, and fill in the values of a "table value".

Once you have such a beast, you can bind such "table value" for the << in
%1 >> case I was describing above, since there's already
sqlite3_bind_value().

But you can also now create "table functions", i.e. custom SQLite functions
that do not return scalars but anonymous temporary "tables", returning
these sqlite3_table_values via sqlite3_result_value().

vtables can already do pretty much the same thing, except that:
* vtables cannot be used "inline" in a given statement (i.e. created on the
fly), and
* vtables cannot be used in an anonymous manner (the vtable must have a
name), and
* vtables cannot dynamically process "document cells" that belong to other
tables (real or virtual) in a statement. (You can explicitly insert stuff
into them like FTS does, but that's more a custom index than a custom
table.)

You can easily create a virtual table that parses a comma-separated list
and returns one row per string between commas, but you must name the
vtable, and pass it the comma-separated list explicitly, "hardcoding" its
rows. But if you make it a table function, you can select from that
function, passing arbitrary strings to parse, each time returning a new
unnamed result-table (i.e. a table). And you're not limited to literals:
you can also "join" to another table to process specific strings (in a
given column) of that other table, and get an implicit union-all of those
anonymous per-string-value result-sets.

In pseudo-code, this would look something like this:

sqlite> select * from parse_csv('a, b, a');
a
b
a
sqlite> create table s (name text, csv text);
sqlite> insert into s values ('dec', '1, 2, 3'), ('hex', '1, A'), ('alpha',
'a, B, TT');
sqlite> select s.name, p.* from s, parse_csv(s.csv) p;
dec|1
dec|2
dec|3
hex|1
hex|A
alpha|a
alpha|B
alpha|TT
sqlite> select s.name, count(parse_csv(s.csv)) from s;
dec|3
hex|2
alpha|3

With such table functions, you can imagine all sorts of interesting
scenarios, like getting info out of XML or JSON documents stored in table
cells (UnQL anyone?), or getting a list of doubles as rows from a blob cell
value (since SQLite lacks array support, any user-defined-type is basically
a blob or a string that aggregate denormalized info).

OK, it's probably fever-induced wandering. I'll stop there ;) --DD

Oracle XMLTable http://stackoverflow.com/questions/12690868

http://www.oracle-base.com/articles/misc/pipelined-table-functions.php

PS: BTW, this taught me the << in table >> alternative to << in (list) >> I
was not aware of. Thanks for that.
sqlite> create table t (name text, type text);
sqlite> insert into t values ('foo', 'en'), ('bar', 'en');
sqlite> insert into t values ('toto', 'fr'), ('titi', 'fr');
sqlite> insert into t values ('furtch', 'gr');
sqlite> create table type_sel (type text);
sqlite> insert into type_sel values ('gr'), ('fr');
sqlite> select * from t where type in type_sel;
toto|fr
titi|fr
furtch|gr
sqlite> delete from type_sel;
sqlite> insert into type_sel values ('en');
sqlite> select * from t where type in type_sel;
foo|en
bar|en

Re: Feature Request: Binding Arrays

Olaf Schmidt-2
In reply to this post by Dominique Devienne
On 16.10.2013 10:40, Dominique Devienne wrote:

> If I somehow missed a better work-around to this lack of array-binding, I'm
> also interested of course, but obviously I'd prefer real array binding.

Maybe I'm missing something - but if I had a lot of "InApp-
MemAllocations" in the form of different kinds (or "types")
of lists, constantly changing their content as well as their
list-count - but not their "type" - then I'd use a simple set
of user-defined functions (not sure though how "trigger-happy"
those are with regard to your hooking worries when you add
them to the engine - but this needs to happen only once,
at startup).

So, after those functions are in place - where's the problem with:

select * from table where InMySmallUnsortedArrayExists(some_column)
select * from table where InMyLargerSortedArrayExists(some_column)
select * from table where InMyHashListExists(some_column)
...
etc. for trees or whatever you want to use to speed up the exists-check.

Or more generically with an additional Param:
select * from table where InList(@ListTypeEnmValue, some_column)

Olaf
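Olaf's suggestion maps directly onto any binding that supports user-defined functions; here is a minimal sketch with Python's stdlib sqlite3 module (the InList name and the lists registry are illustrative, not taken from his code):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (col integer)")
con.executemany("insert into t values (?)", [(i,) for i in range(1, 101)])

# The lists live in application memory; the UDF is just a set lookup.
lists = {1: {2, 3, 5, 7}, 2: {10, 20}}

def in_list(list_id, value):
    return value in lists[list_id]

# Registered once, at startup - reused by every query thereafter.
con.create_function("InList", 2, in_list)

n = con.execute("select count(*) from t where InList(?, col)", (1,)).fetchone()[0]
print(n)
```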


Re: Feature Request: Binding Arrays

Dominique Devienne
On Tue, Oct 29, 2013 at 2:06 AM, Olaf Schmidt <[hidden email]> wrote:

> On 16.10.2013 10:40, Dominique Devienne wrote:
>
>> If I somehow missed a better work-around to this lack of array-binding, I'm
>> also interested of course, but obviously I'd prefer real array binding.
>
> Maybe I'm missing something - but if I'd have a lot of "InApp-
> MemAllocations" in the form of different kinds (or "types")
> of lists, constantly changing their content as well as their
> listcount - but not their "type" - then I'd use a simple set
> of userdefined functions (not sure though, how "trigger-happy"
> those are with regards to your hooking-worries, when you add
> them into the engine - but this needs to happen only once,
> at startup).
>
> So, after those functions are in place - where's the problem with:
>
> select * from table where InMySmallUnsortedArrayExists(some_column)
> select * from table where InMyLargerSortedArrayExists(some_column)
> select * from table where InMyHashListExists(some_column)
> ...
> etc. for trees or whatever you want to use to speed-up the exists-check.
>
> Or more generically with an additional Param:
> select * from table where InList(@ListTypeEnmValue, some_column)


First off, when you use functions like this, you basically rule out index
use, even if some_column is indexed. That's not good. << WHERE col IN
list >> OTOH might use an index.
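The index-defeating effect is observable with EXPLAIN QUERY PLAN; a sketch with Python's stdlib sqlite3 module (exact plan wording varies between SQLite versions, but the SCAN-versus-index distinction holds):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (col integer)")
con.execute("create index t_col on t (col)")
con.create_function("InList", 1, lambda v: v in (1, 2, 3))

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return " ".join(row[-1] for row in con.execute("explain query plan " + sql))

print(plan("select * from t where col in (1, 2, 3)"))  # uses the index
print(plan("select * from t where InList(col)"))       # full table scan
```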

Second, as I mentioned, it's the UI that's SQL-driven. You can have N lists
or tables or combos in various dialogs, all looking at the same underlying
(virtual) table, but each will have its own selection, so there are N
different lists that need to be used, where N is not fixed (some dialogs
can be popped up several times too, different instances of the *same*
dialog). So creating a function or a temp table for each just isn't great
either, and it forces naming what is inherently "anonymous" IMHO. --DD

Re: Feature Request: Binding Arrays

Olaf Schmidt-2
On 29.10.2013 13:19, Dominique Devienne wrote:

>> So, after those functions are in place - where's the problem with:
>>
>> select * from table where InMySmallUnsortedArrayExists(some_column)
>> select * from table where InMyLargerSortedArrayExists(some_column)
>> select * from table where InMyHashListExists(some_column)
>> ...
>> etc. for trees or whatever you want to use to speed-up the exists-check.
>>
>> Or more generically with an additional Param:
>> select * from table where InList(@ListTypeEnmValue, some_column)
>
>
> First off, when you use functions like this, you basically rule
> out index use, even if some_column is indexed. That's not good.
> << WHERE col IN list>> OTOH, might use an index.

Might - yes, and *if* an index is used for the In-Checks, then
you're perhaps "wasting it" - or it could be the wrong index
which is chosen by the query-optimizer.

The better index (in case you use Tmp-Tables) is not the index
on col of the "real table", but the index on the Tmp-Table-Col.

However, in any case we have an exists-check to perform here
(for every sqlite3_step), and the only question is: can your own
function perform this check faster than the built-in mechanism
of SQLite (when SQLite performs at its best, using an index which
was defined on the compare-list)?

In my tests SQLite cannot outperform a well-implemented
"sorting Dictionary-Class", no matter if an index is in use
or not.

So, I would not dismiss the usage of UDFs in your special case
so fast - the UDF implementation is dead easy, more flexible,
and with the Dictionary I was using, about 50% faster than
SQLite with pre-indexed Tmp-Tables (about a factor 3.5 faster
than what you're currently using with the dynamic List-Joins).


> Second, as I mentioned, it's the UI that's SQL-driven. You can have N lists
> or tables or combos in various dialogs, all looking at the same underlying
> (virtual) table but each will have its own selection, so it's N different
> lists that need to be used, where N is not fixed (some dialogs can be
> popped up several times too, different instances of the *same* dialog). So
> creating a function or a temp table for each just isn't great either, and
> forces to name what is inherently "anonymous" IMHO.
>

For exactly this "anonymous case" I've already proposed:
select * from table where InList(@ListTypeEnmValue, some_column)

Meaning, that you only need this single function instead of the
"more specialized ones" - at the "cost" of setting one additional
parameter - and reacting to that param within your UDF-callback.

You can precompile such kind of statement and use binding-calls,
to replace the @ListTypeEnmValue Param-Slot with an Integer-
value (or even a Pointer-Value) of your choice.

Below is my complete Testcode (sorry, no C-code - but I think you
will get the idea - and will deduce that the CommandObjects are
simply encapsulating the SQLite-Binding-Calls).

Here are the timing values this demo puts out (100000 values, split
into two lists: [1 to 40000] and [40001 to 100000]):

UDF and Dictionary           103msec     40000     60000
Tmp-Tbls without Index       301msec     40000     60000
Tmp-Tbls with Index          151msec     40000     60000
Joined comma-sep-Lists       358msec     40000     60000

VB6-code:

Option Explicit

Implements IFunction

Enum enmDictType
   DictTypeNone
   Dict40K
   Dict60K
   '...
   DictTypeMax
End Enum

Private Cnn As cConnection, DictArr(DictTypeMax) As cSortedDictionary

Private Sub Form_Click()
Dim i As Long, Res1&, Res2&, Arr1$(1 To 40000), Arr2$(40001 To 100000)
Cls

   Set Cnn = New_c.Connection(, DBCreateInMemory) 'create a new InMem-DB
       Cnn.AddUserDefinedFunction Me 'add the Implementer of the new func
       Cnn.Execute "Create Table T(Col Integer)"

   With Cnn.CreateCommand("Insert Into T Values(?)")
        Cnn.BeginTrans
           For i = 1 To 100000 'add values into T (fast, per Binding-API)
             .SetInt32 1, i: .Execute
           Next
        Cnn.CommitTrans
   End With

   'two Temp-Tables (one for 40K, the other for 60K records)
   Cnn.Execute "Create Temp Table Tmp1(Col Integer)"
   Cnn.Execute "Insert Into Tmp1 Select * From T Where Col<=40000"
   Cnn.Execute "Create Temp Table Tmp2(Col Integer)"
   Cnn.Execute "Insert Into Tmp2 Select * From T Where Col >40000"

   'same preparations for our alternatives to the Tmp-Tables
   Set DictArr(Dict40K) = New cSortedDictionary
   Set DictArr(Dict60K) = New cSortedDictionary
   For i = 1 To 40000
     DictArr(Dict40K).Add i
     Arr1(i) = i
   Next
   For i = 40001 To 100000
     DictArr(Dict60K).Add i
     Arr2(i) = i
   Next

   'use a static precompiled query with a UDF (just different Params)
   New_c.Timing True
     Res1 = DoQuery(Dict40K)
     Res2 = DoQuery(Dict60K)
   Print "UDF and Dictionary", New_c.Timing, Res1, Res2

   New_c.Timing True
     Res1 = GetCount("Select Count(*) from T Where Col in Tmp1")
     Res2 = GetCount("Select Count(*) from T Where Col in Tmp2")
   Print "Tmp-Tbls without Index", New_c.Timing, Res1, Res2

   Cnn.Execute "Create Index idx_Tmp1_Col On Tmp1(Col)"
   Cnn.Execute "Create Index idx_Tmp2_Col On Tmp2(Col)"
   New_c.Timing True
     Res1 = GetCount("Select Count(*) from T Where Col in Tmp1")
     Res2 = GetCount("Select Count(*) from T Where Col in Tmp2")
   Print "Tmp-Tbls with Index", New_c.Timing, Res1, Res2

   Dim CSL1 As String: CSL1 = "(" & Join$(Arr1, ",") & ")"
   Dim CSL2 As String: CSL2 = "(" & Join$(Arr2, ",") & ")"
   New_c.Timing True
     Res1 = GetCount("Select Count(*) from T Where Col in " & CSL1)
     Res2 = GetCount("Select Count(*) from T Where Col in " & CSL2)
   Print "Joined comma-sep-Lists", New_c.Timing, Res1, Res2
End Sub

Private Function GetCount(SQL As String) As Long
   GetCount = Cnn.OpenRecordset(SQL)(0)
End Function

Private Function DoQuery(DictType As enmDictType) As Long
Static Query As cSelectCommand
   If Query Is Nothing Then Set Query = Cnn.CreateSelectCommand( _
              "Select Count(*) from T Where InDict(@DictType, Col)")
   With Query
     .SetInt32 !DictType, DictType
     DoQuery = .Execute()(0).Value
   End With
End Function

'Implementation of the IFunction-Interface, to support InDict(P1, P2)
Property Get IFunction_DefinedNames() As String
   IFunction_DefinedNames = "InDict"
End Property

Sub IFunction_Callback(ByVal ZeroBasedNameIndex As Long, _
                        ByVal ParamCount As Long, UDF As cUDFMethods)

Dim DictType As enmDictType: DictType = UDF.GetInt32(1)
Dim TheValue As Long:        TheValue = UDF.GetInt32(2)

   UDF.SetResultInt32 DictArr(DictType).Exists(TheValue)
End Sub

Olaf


Re: Feature Request: Binding Arrays

Dominique Devienne
On Thu, Oct 31, 2013 at 2:45 AM, Olaf Schmidt <[hidden email]> wrote:

> On 29.10.2013 13:19, Dominique Devienne wrote:
>
>> [...]
>> First off, when you use functions like this, you basically rule
>> out index use, even if some_column is indexed. That's not good.
>> << WHERE col IN list>> OTOH, might use an index.
>>
>
> Might - yes, and *if* an index is used for the In-Checks, then
> you're perhaps "wasting it" - or it could be the wrong index
> which is chosen by the query-optimizer.
>
> The better index (in case you use Tmp-Tables) is not the index
> on col of the "real table", but the index on the Tmp-Table-Col.
>

I'm not convinced by this. The "real table" can be quite large, several
hundred to 100,000s of rows (up to 1+ million rows), and col can be the
primary key, or a non-unique "parent" key where many parent keys have about
10 rows each and a few have in the 1000's, while the in-list can be very
small (down to just 1 element) or quite large (several thousands).

With a function based approach, you are *always* full-scanning the whole
"real" table, no matter the cardinality of the InList operand, and even
with a very fast InList function, this is not going to beat getting 10 PK
rows, or 10 "parent" key rows (e.g. 100 rows to 10,000 rows) via indexes,
especially since these are virtual tables with Boost.MultiIndex unique or
non-unique indexes (i.e. 5x to 10x faster than SQLite's paged B-tree
indexes). It might well beat it if the InList operand cardinality is high,
as in your 40K and 60K testing in a 100K rows table, because an InList
that's 40% or 60% of the whole table is close enough to a full scan that
using a native code set or map test similarly outperforms SQLite's generic
paged B-tree indexes like our Boost.MultiIndex-based indexes.

Of course that's speculation on my part, versus your timed experimentation,
so could well be that I'm wrong. And I'll need to look into this eventually.

Plus I haven't looked at the stat tables the new query optimizer is
increasingly using to find the best plan, to put information in there about
the cardinality of our vtables and our "selection" tmp-tables, so that
SQLite has enough info to do its planning. Heck, when I'm mixing vtable
index costs and real (tmp) table index costs, I have no clue whether the
costs I'm returning are compatible. That's an area that's not well covered
by the doc IMHO, and which I haven't explored enough. So as of now it's
possible SQLite would never select a plan that privileges a PK or
non-unique index access on the "real" table.

In any case, thank you for your persistence and challenging my assumptions.
Your experiments are very interesting, and I'll try to report back in this
thread any of my own findings in light of the information we've provided.
Thanks a bunch Olaf.

Cheers, --DD

Re: Feature Request: Binding Arrays

Nico Williams
In reply to this post by Richard Hipp-3
On Wed, Oct 16, 2013 at 07:28:04AM -0400, Richard Hipp wrote:
> Please see http://www.sqlite.org/src/artifact/2ece66438?ln=13-75

I've been thinking for a while - ever since I happened upon jq(1) - that a
marriage of jq and SQLite3 would be wonderful.

jq is a JSON query language.  It's a functional language.

In my mind this would consist of:

 - a jq function for sqlite3
 - a jq array grouping aggregate function for sqlite3
 - a jq virtual table for sqlite3 (for, e.g., disaggregating values)
 - a jq binding for sqlite3 (so SQLite3 can be invoked from jq)

The IN array binding could then be handled like this:

sqlite3> SELECT * FROM foo WHERE jq('contains($arg1)', :in_list, column1);

The value bound to :in_list would be a JSON array or object (faster for
larger sets) of values.

Nico

Re: Feature Request: Binding Arrays

Nico Williams
Oh, and jq is at: https://stedolan.github.io/jq

Re: Feature Request: Binding Arrays

decalek
In reply to this post by Nico Williams
On 31.10.2013 18:37, Nico Williams wrote:

> On Wed, Oct 16, 2013 at 07:28:04AM -0400, Richard Hipp wrote:
>> Please see http://www.sqlite.org/src/artifact/2ece66438?ln=13-75
>
> I've been thinking for a while -ever since I happened upon jq(1)- that a
> marriage of jq and SQLite3 would be wonderful.
>
> jq is a JSON query language.  It's a functional language.
>
> In my mind this would consist of:
>
>   - a jq function for sqlite3
>   - a jq array grouping aggregate function for sqlite3
>   - a jq virtual table for sqlite3 (for, e.g., disaggregating values)
>   - a jq binding for sqlite3 (so SQLite3 can be invoked from jq)
>
> The IN array binding could then be handled like this:
>
> sqlite3> SELECT * FROM foo WHERE jq('contains($arg1)', :in_list, column1);
>
> The value bound to :in_list would be a JSON array or object (faster for
> larger sets) of values.

I am sure there are many SQLite users waiting with hope :-) for an
extension handling semi-structured data.

BTW, I think some of this functionality is already available through
libspatialite's VirtualXPath virtual table [1].

Maybe some code reuse would be possible for the JSON case.

Is there enough interest for something like an informal SIG about
tree/graph data processing in SQLite?

Kind Regards,
Alek

[1]
https://www.gaia-gis.it/fossil/libspatialite/wiki?name=VirtualXPath-intro


Re: Feature Request: Binding Arrays

decalek
In reply to this post by Dominique Devienne
Hi Dominique,

On 16.10.2013 11:40, Dominique Devienne wrote:
> We have an SQLite virtual-table heavy application with a lot of the GUI
> driven by SQL queries, and often times we have queries of the form
>

...

>
> create table t (name text, type text, primary key (name, type));
> select * from t where type in (%1);
>

...

>
> If I somehow missed a better work-around to this lack of array-binding, I'm
> also interested of course, but obviously I'd prefer real array binding.
>

I am thinking about a sort of workaround:

attach ':memory:' as gui

create table gui.node(node int primary key, parent int, value);
/* Model for data binding elements with single value - one tree per
widget */

create table gui.node_value(node int, typecode int, value);
/* Model for widget elements with multiple values */

create table gui.widget_binding(widget primary key, node int);
/* Current widget binding */

Or a more direct alternative:

create table gui.t_based_combo(widget int primary key, label, type text);

Let's look at the latter (for the sake of simplicity).

Variant 1: Ideally you are able to rebind your widget kit to the in-memory
gui DB (replacing your current memory containers). Then we have:

- Populating the widget data: insert into gui.t_based_combo select
$widget, ....
- Destroying the widget: delete from gui.t_based_combo where widget =
$widget
- Using widget: select * from t where type in (select type from
gui.t_based_combo where widget = $widget)
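Variant 1 can be sketched end to end with Python's stdlib sqlite3 module (widget ids and data are made up, and the PRIMARY KEY on widget is dropped here so one widget can hold several selected types):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # the "real" DB
con.execute("attach ':memory:' as gui")
con.execute("create table t (name text, type text)")
con.executemany("insert into t values (?, ?)",
                [("foo", "en"), ("bar", "en"), ("toto", "fr")])

# No PK on widget, so one row per (widget, selected type).
con.execute("create table gui.t_based_combo (widget int, label, type text)")

# Populating the widget data (widget 7 currently has 'fr' selected):
con.execute("insert into gui.t_based_combo values (7, 'French', 'fr')")

# Using the widget: filter the real table by the widget's selection.
rows = con.execute("select name from t where type in"
                   " (select type from gui.t_based_combo where widget = ?)",
                   (7,)).fetchall()
print(rows)

# Destroying the widget:
con.execute("delete from gui.t_based_combo where widget = ?", (7,))
```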

Variant 2: You are not able to rebind: basically the same as Variant 1,
but you have to inject triggers into your memory containers to keep
them in sync with the gui DB. In this case probably the more general
model scheme (the first one - the "node" tree) will be appropriate, because
you will likely implement the triggers in some base widget class.

After reading the whole thread I suspect that you have already considered
the whole thing about the :memory: DB bridging the GUI with the real DB
but I am curious why?

Kind Regards,
Alek



Re: Feature Request: Binding Arrays

decalek
On 01.11.2013 22:04, Alek Paunov wrote:
> After reading the whole tread I suspect that you have already considered
> the whole thing about the :memory: DB bridging the GUI with the real DB
> but I am curious why?

Sorry - unfinished sentence: ... why you have rejected this approach?


Re: Feature Request: Binding Arrays

Olaf Schmidt-2
In reply to this post by Dominique Devienne
On 31.10.2013 14:09, Dominique Devienne wrote:

[Userdefined functions in conjunction with fast Exists-checks
in "Userland" - vs. SQLites built-in indexing in case of In (List)]

> I'm not convinced by this. The "real table" can be quite large, several
> 100's to 100,000's rows (up to 1+ million rows) and col can be the primary
> key, or a non-unique "parent" key where many parent keys have about 10 rows
> each, and a few have in the 1000's, while the in-list could very small
> (down to just 1 element) or quite large (several thousands).
>
> With a function based approach, you are *always* full-scanning the whole
> "real" table, no matter the cardinality of the InList operand, and even
> with a very fast InList function, this is not going to beat getting 10 PK
> rows, or 10 "parent" key rows (e.g. 100 rows to 10,000 rows) via indexes,
> especially since these are virtual tables with Boost.MultiIndex unique or
> non-unique indexes (i.e. 5x to 10x faster than SQLite's paged B-tree
> indexes). It might well beat it if the InList operand cardinality is high,
> as in your 40K and 60K testing in a 100K rows table, because an InList
> that's 40% or 60% of the whole table is close enough to a full scan that
> using a native code set or map test similarly outperforms SQLite's generic
> paged B-tree indexes like our Boost.MultiIndex-based indexes.
>
> Of course that's speculation on my part, versus your timed experimentation,
> so could well be that I'm wrong. And I'll need to look into this eventually.
>

You're not wrong - although the UDF-timings in my previous post are
correct, it is true that they will remain (relatively) constant
even if we reduce the count in the compare-list from 40000 to
1000 or 100.

All timings with 100000 records in the "real" table - FullTable-scan
due to using an UDF with a sorting-Dictionary-instance:
36msec (100 items in the compare-list)
43msec (1000 items in the compare-list)
48msec (10000 items in the compare-list)
52msec (40000 items in the compare-list)

The above was no surprise to me, because I'd expected that due to the
FullTable-scans in case of the UDF-approach... what came as a surprise
was the kind of "inverse-lookup" the SQLite-optimizer apparently
performs, when an index exists on the "real" table which provides
the Column-value to compare against the "In"-list.

With my large compare-lists (40000 and 60000) this behaviour didn't
show up in the timings, whilst with 100 and 1000 items in the
compare-lists there was clearly a difference.

Again, all timings with 100000 records in the "real" table -
the compare-list created beforehand in a tmp-table -
the table- and index-creation not included in the timings
SQL: Select Count(*) from T Where Col in Tmp

No indexes in the whole setup (not on the "real" table T and also
not on the Tmp-Table):
37msec (100 items in the compare-list)
47msec (1000 items in the compare-list)
84msec (10000 items in the compare-list)
136msec (40000 items in the compare-list)

With only an index on the Tmp-Table-Column:
37msec (100 items in the compare-list)
56msec (1000 items in the compare-list)..triple-checked, not an outlier
65msec (10000 items in the compare-list)
77msec (40000 items in the compare-list)

With only an index on the real table (on the compare-value-column):
0.4msec (100 items in the compare-list)
1.9msec (1000 items in the compare-list)
26msec (10000 items in the compare-list)
116msec (40000 items in the compare-list)

With both indexes (on the real table and the tmp-table):
Identical timings to the case above - apparently the index on
the real table was chosen in favour of the tmp-table-index -
which is the correct choice by the optimizer for all compare-list-counts
below 30000 or so (since with 40000 the index
on the tmp-table already performs clearly faster).

So, my mistake was to choose too large compare-list-counts in
my first test-setup - otherwise it would have become obvious
that indexes on the original "real" table are indeed worthwhile.

This holds true for compare-list-counts smaller than about
a third of the total records in the original table.
An index on the tmp-table which holds the compare-list is
apparently only worthwhile above this compare-count.
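Olaf's "IN tmp-table" setup can be reproduced at small scale with Python's sqlite3 module (table names and row counts here are illustrative, not his benchmark harness):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col INT)")         # the "real" table
con.execute("CREATE TEMP TABLE tmp (col INT)")  # holds the compare-list
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
con.executemany("INSERT INTO tmp VALUES (?)", [(i,) for i in range(0, 1000, 10)])

# Index on the real table's compare column - the winning case for small lists
con.execute("CREATE INDEX idx_t_col ON t(col)")

# The "expr IN table-name" form, as in Olaf's test query
n = con.execute("SELECT count(*) FROM t WHERE col IN tmp").fetchone()[0]
```

Running `EXPLAIN QUERY PLAN` on the query is a quick way to see which of the two indexes the optimizer actually picks for a given list size.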

The timings against a 100000-records-table in a full-table
scan with the UDF (here again - this was on an Intel i5 2.8GHz):
36msec (100 items in the compare-list)
43msec (1000 items in the compare-list)
48msec (10000 items in the compare-list)
52msec (40000 items in the compare-list)

are not that bad - and I'd guess (since the COM-SQLite-wrapper
I've used has some more overhead due to the interface-delegation)
that there's perhaps 5msec to subtract compared with C/C++ UDFs -
and I can also imagine that a nice Boost-based container could
outperform the SortingDictionary I've used (perhaps by 20-40% or so).

So, in a C/C++ setup I'd expect these values for a UDF
with a Boost-object for the exists-checks (rough estimate):
21msec (100 items in the compare-list)
26msec (1000 items in the compare-list)
30msec (10000 items in the compare-list)
34msec (40000 items in the compare-list)

Maybe all these values give others an idea of the timing
"regions" in which SQLite's IN-list functionality (roughly)
operates (under somewhat idealized conditions).
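For reference, the UDF-with-exists-check approach discussed above can be sketched with Python's sqlite3 module, where create_function stands in for the C API's sqlite3_create_function (the function name `in_list` and the data are my own, illustrative choices):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col INT)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

compare_list = set(range(0, 1000, 10))  # native hash-set membership test

# Register a scalar UDF; a WHERE clause built on it cannot use any index,
# which is why the UDF approach always full-scans and its cost stays
# roughly flat as the compare-list grows
con.create_function("in_list", 1, lambda v: int(v in compare_list))

n = con.execute("SELECT count(*) FROM t WHERE in_list(col)").fetchone()[0]
```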


Olaf


Bug Report - Analyzer app

R Smith
Hi there,

I use the sqlite3_analyzer.exe app to get some data about tables (it's very useful by the way - thanks.)

The newest version, downloaded some days ago, gave me import errors for the produced file, so I tried loading it manually and got
the SQL which follows for one of my small DB files. The object names in the VALUES seem to be unquoted, which is causing the
failure for me:

/*
----  Removed lots of non-importing stats above this line  ----
*******************************************************************************
The entire text of this report can be sourced into any SQL database
engine for further analysis.  All of the text above is an SQL comment.
The data used to generate this report follows:
*/
BEGIN;
CREATE TABLE space_used(
    name clob,        -- Name of a table or index in the database file
    tblname clob,     -- Name of associated table
    is_index boolean, -- TRUE if it is an index, false for a table
    nentry int,       -- Number of entries in the BTree
    leaf_entries int, -- Number of leaf entries
    payload int,      -- Total amount of data stored in this table or index
    ovfl_payload int, -- Total amount of data stored on overflow pages
    ovfl_cnt int,     -- Number of entries that use overflow
    mx_payload int,   -- Maximum payload size
    int_pages int,    -- Number of interior pages used
    leaf_pages int,   -- Number of leaf pages used
    ovfl_pages int,   -- Number of overflow pages used
    int_unused int,   -- Number of unused bytes on interior pages
    leaf_unused int,  -- Number of unused bytes on primary pages
    ovfl_unused int,  -- Number of unused bytes on overflow pages
    gap_cnt int,      -- Number of gaps in the page layout
    compressed_size int  -- Total bytes stored on disk
);
INSERT INTO space_used VALUES(sqlite_master,sqlite_master,0,14,12,2473,0,0,534,1,3,0,898,520,0,3,4096);
INSERT INTO space_used VALUES(DBases,DBases,0,5,5,644,0,0,181,0,1,0,0,349,0,0,1024);
INSERT INTO space_used VALUES(sqlite_autoindex_DBases_1,DBases,1,5,5,64,0,0,14,0,1,0,0,937,0,0,1024);
INSERT INTO space_used VALUES(DBTables,DBTables,0,15,10,4458,0,0,923,1,6,0,977,1589,0,1,7168);
INSERT INTO space_used VALUES(DBHistory,DBHistory,0,0,0,0,0,0,0,0,1,0,0,1016,0,0,1024);
INSERT INTO space_used VALUES(DBSettings,DBSettings,0,7,7,266,0,0,98,0,1,0,0,715,0,0,1024);
INSERT INTO space_used VALUES(sqlite_autoindex_DBSettings_1,DBSettings,1,7,7,139,0,0,24,0,1,0,0,856,0,0,1024);
INSERT INTO space_used VALUES(Idx_DBases_DBName,DBases,1,5,5,64,0,0,14,0,1,0,0,937,0,0,1024);
INSERT INTO space_used VALUES(Idx_DBTables_DBID,DBTables,1,10,10,145,0,0,22,0,1,0,0,841,0,0,1024);
INSERT INTO space_used VALUES(Idx_DBHistory,DBHistory,1,0,0,0,0,0,0,0,1,0,0,1016,0,0,1024);
COMMIT;

---------------------------------------------------------------------------

Importing this gives the usual "no such column: sqlite_master" error, but it all works well if I add quotes everywhere.
If it is already fixed, or if there is something I'm doing wrong, kindly point me in the right direction.
Thank you
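For anyone wanting to reproduce the symptom, a minimal sketch (a cut-down space_used table, not the full analyzer schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE space_used (name clob, tblname clob, is_index boolean)")

# Unquoted object names are parsed as column references and fail:
try:
    con.execute("INSERT INTO space_used VALUES(sqlite_master,sqlite_master,0)")
    failed = False
except sqlite3.OperationalError:
    failed = True  # "no such column: sqlite_master"

# Quoting the names makes the same row import cleanly:
con.execute("INSERT INTO space_used VALUES('sqlite_master','sqlite_master',0)")
imported = con.execute("SELECT count(*) FROM space_used").fetchone()[0]
```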

