Feature request: extend the IN operator

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Feature request: extend the IN operator

Alexey Pechnikov-2
Is it possible to support construction like to

where x IN "1 2 3"

There are a lot of situations when is very useful to store list of
identifiers in table field.
I know about the intarray virtual table but it's need
constructor/destructor calls (and
produce segfaults with the autorizer function).

Does somebody interesting in this functionality too?

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Max Vlasov
On Sun, Aug 7, 2011 at 3:17 PM, Alexey Pechnikov <[hidden email]>wrote:

> Is it possible to support construction like to
>
> where x IN "1 2 3"
> ...
>
> Does somebody interesting in this functionality too?
>
>

I needed one, I just implemented mysql find_in_set as a user function. there
it was comma-delimited, but it's not that important, I'm sure you can change
it to any other symbol in your data

Max
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Stephan Beal-3
In reply to this post by Alexey Pechnikov-2
On Sun, Aug 7, 2011 at 1:17 PM, Alexey Pechnikov <[hidden email]>wrote:

> There are a lot of situations when is very useful to store list of
> identifiers in table field.
>

The problem with the concept of "list as a string" is that it implies a
specific token separator, and there is no single universal solution to that
problem. As soon as someone implements the example you've shown, someone
else will say, "but i have a semicolon-separated list..." Ad nauseum.

Also keep in mind that sqlite3 tries, to a large degree, to be compatible
with ANSI SQL, and (IN "A B C"), in the form you describe, is not
ANSI-specified.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Simon Slavin-3
In reply to this post by Alexey Pechnikov-2

On 7 Aug 2011, at 12:17pm, Alexey Pechnikov wrote:

> Is it possible to support construction like to
>
> where x IN "1 2 3"

How does this differ on the IN operator ?

<http://www.sqlite.org/lang_expr.html#in_op>

For example,

SELECT * FROM cars WHERE cars.name IN ('Corolla', 'Fiesta')

> There are a lot of situations when is very useful to store list of
> identifiers in table field.

SQLite supports this:

SELECT * FROM cars WHERE cars.name IN (SELECT names FROM models WHERE type='obsolete')

or even

SELECT * FROM cars WHERE cars.name IN models.names

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Alexey Pechnikov-2
2011/8/7 Simon Slavin <[hidden email]>:
> For example,
>
> SELECT * FROM cars WHERE cars.name IN ('Corolla', 'Fiesta')

create table t (names TEXT);
insert into t(name) values ('Corolla Fiesta');
SELECT * FROM cars WHERE cars.name IN (select names from t where rowid=1);


--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Alexey Pechnikov-2
In reply to this post by Stephan Beal-3
2011/8/7 Stephan Beal <[hidden email]>:

> Also keep in mind that sqlite3 tries, to a large degree, to be compatible
> with ANSI SQL, and (IN "A B C"), in the form you describe, is not
> ANSI-specified.

Yes, but in PostgreSQL (as example) we can create user-defined
function returns table
from the list. SQLite can't do it and is impossible to use user extension too.

P.S. Example for PostgreSQL:

select list2items('a b c');

CREATE OR REPLACE FUNCTION public.list2items(in_list text)
  RETURNS SETOF text AS
$BODY$

    SELECT lindex($1, s) FROM generate_series(0,llength($1)-1) AS s;

$BODY$
  LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION public.lindex(list text, index int4)
  RETURNS text AS
$BODY$

  return [lindex $1 $2]

$BODY$
  LANGUAGE 'pltcl' IMMUTABLE;

CREATE OR REPLACE FUNCTION public.llength(list text)
  RETURNS int4 AS
$BODY$

  return [llength $1]

$BODY$
  LANGUAGE 'pltcl' IMMUTABLE;

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Simon Slavin-3

On 7 Aug 2011, at 3:32pm, Alexey Pechnikov wrote:

> 2011/8/7 Stephan Beal <[hidden email]>:
>
>> Also keep in mind that sqlite3 tries, to a large degree, to be compatible
>> with ANSI SQL, and (IN "A B C"), in the form you describe, is not
>> ANSI-specified.
>
> Yes, but in PostgreSQL (as example) we can create user-defined
> function returns table
> from the list. SQLite can't do it and is impossible to use user extension too.

You don't need to.  The SQLite expressions I listed tell you how to achieve the result without doing that.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Alexey Pechnikov-2
2011/8/7 Simon Slavin <[hidden email]>:
> You don't need to.  The SQLite expressions I listed tell you how to achieve the result without doing that.

Really? And how can you perform the query like to:

sqlite> create table t1(ids text);
sqlite> insert into t1 (ids) values ('1 2 3');
sqlite> insert into t1 (ids) values ('2 3 4');
sqlite> insert into t1 (ids) values ('3 4 5');
sqlite> create table t2(name text);
sqlite> insert into t2 (name) values ('name1');
sqlite> insert into t2 (name) values ('name2');
sqlite> insert into t2 (name) values ('name3');
sqlite> insert into t2 (name) values ('name4');
sqlite> insert into t2 (name) values ('name5');
sqlite> select * from t2 where rowid in (select ids from t1 where rowid=2);

A simple calculation: if each list of identifiers have about 1000
items and there are
1 000 000 lists than the table of relations (t1.rowid, t2.rowid) will
have 1 000 000 000
rows! It's too slow and is not useful in real world. Of cource all
systems store lists of
identifiers in similar situations.

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Igor Tandetnik
Alexey Pechnikov <[hidden email]> wrote:
> 2011/8/7 Simon Slavin <[hidden email]>:
>> You don't need to. The SQLite expressions I listed tell you how to achieve the result without doing that.
>
> Really? And how can you perform the query like to:
>
> sqlite> create table t1(ids text);
> sqlite> insert into t1 (ids) values ('1 2 3');
> sqlite> insert into t1 (ids) values ('2 3 4');
> sqlite> insert into t1 (ids) values ('3 4 5');

I can normalize this table, then use joins.

> A simple calculation: if each list of identifiers have about 1000
> items and there are
> 1 000 000 lists than the table of relations (t1.rowid, t2.rowid) will
> have 1 000 000 000
> rows!

One way or the other, you need to store 1,000,000,000 pieces of information. Why is it that storing them in 1,000,000 rows holding 1000 items each is unremarkable, but storing them in 1,000,000,000 rows holding one item each is exclamation point-worthy?

If reducing the number of rows is your ultimate goal, why don't you create a table with one row, holding the whole data structure encoded into one huge string or blob? That'll best optimize the one metric you seem to believe matters the most.

> It's too slow

... when compared to what alternative? Linearly scanning all those lists?

> Of cource all
> systems store lists of
> identifiers in similar situations.

I find it hard to believe that every single system does - surely systems exist that do not denormalize their data this way. In fact, I doubt the design you describe is common, let alone universally accepted.
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Alexey Pechnikov-2
Igor, how about simple test? Normalized database is big and slow.

$ time ./test_norm_idx.tcl
real 32m54.978s
user 32m14.885s
sys 0m39.842s

$ time ./test_idx.tcl
real 7m19.005s
user 6m55.226s
sys 0m11.717s

$ ls -lh *db
2,1G test_idx.db
7,1G test_norm_idx.db


test_idx.tcl
=============================
#!/usr/bin/tclsh8.5
package require sqlite3

sqlite3 db [string map {.tcl .db} $argv0]
db eval {PRAGMA page_size=8192}

db eval {
create table parent(dt DATETIME, user_id INTEGER);
create index parent_dt_idx on parent(dt);
create index parent_user_id_idx on parent(user_id);
create virtual table parent_fts using fts4(childs TEXT);}
db transaction {
    for {set i 1} {$i<=100000} {incr i} {
        set time [clock microseconds]
        set childs ""
        for {set j [expr {$i*1000}]} {$j<=[expr {$i*1000+1000}]} {incr j} {
            lappend childs $j
        }
        db eval {insert into parent(dt, user_id) values ($time, 1)}
        db eval {insert into parent_fts(childs) values ($childs)}
    }
}


test_norm_idx.tcl
=============================
#!/usr/bin/tclsh8.5
package require sqlite3

# test normalized
sqlite3 db [string map {.tcl .db} $argv0]
db eval {PRAGMA page_size=8192}
db eval {create table link(dt DATETIME, user_id INTEGER, parent_id
INTEGER, child_id INTEGER);
create index link_child_id_idx on link(child_id);
create index link_dt_idx on link(dt);
create index link_user_id_idx on link(user_id);}
db transaction {
    for {set i 1} {$i<=100000000} {incr i} {
        set time [clock microseconds]
        db eval {insert into link(dt, user_id, parent_id, child_id)
values ($time, 1, $i%1000, $i)}
    }
}


P.S. With versioning of all records we need some additional fields and
normalized database
is very big and very slow.

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Puneet Kishor-2
In reply to this post by Igor Tandetnik
On Mon, Aug 8, 2011 at 2:19 AM, Igor Tandetnik <[hidden email]> wrote:

> Alexey Pechnikov <[hidden email]> wrote:
>> 2011/8/7 Simon Slavin <[hidden email]>:
>>> You don't need to. The SQLite expressions I listed tell you how to achieve the result without doing that.
>>
>> Really? And how can you perform the query like to:
>>
>> sqlite> create table t1(ids text);
>> sqlite> insert into t1 (ids) values ('1 2 3');
>> sqlite> insert into t1 (ids) values ('2 3 4');
>> sqlite> insert into t1 (ids) values ('3 4 5');
>
> I can normalize this table, then use joins.
>
>> A simple calculation: if each list of identifiers have about 1000
>> items and there are
>> 1 000 000 lists than the table of relations (t1.rowid, t2.rowid) will
>> have 1 000 000 000
>> rows!
>
> One way or the other, you need to store 1,000,000,000 pieces of information. Why is it that storing them in 1,000,000 rows holding 1000 items each is unremarkable, but storing them in 1,000,000,000 rows holding one item each is exclamation point-worthy?
>


I will let you heavyweights duke it out, but re. the above point,
SQLite (and more databases) have a per row system overhead that can
very quickly overweight the actual data if the data are too granular
with each row storing just a tiny amount. For example, the overhead
for the cells in a raster dataset, if stored one per row, will quickly
surpass the size of the actual data.


> If reducing the number of rows is your ultimate goal, why don't you create a table with one row, holding the whole data structure encoded into one huge string or blob? That'll best optimize the one metric you seem to believe matters the most.
>
>> It's too slow
>
> ... when compared to what alternative? Linearly scanning all those lists?
>
>> Of cource all
>> systems store lists of
>> identifiers in similar situations.
>
> I find it hard to believe that every single system does - surely systems exist that do not denormalize their data this way. In fact, I doubt the design you describe is common, let alone universally accepted.
> --
> Igor Tandetnik
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Simon Slavin-3

On 7 Aug 2011, at 10:07pm, P Kishor wrote:

> I will let you heavyweights duke it out, but re. the above point,
> SQLite (and more databases) have a per row system overhead that can
> very quickly overweight the actual data if the data are too granular
> with each row storing just a tiny amount. For example, the overhead
> for the cells in a raster dataset, if stored one per row, will quickly
> surpass the size of the actual data.

You're quite right.  If someone was going to write that functionality into a SQLite app in real life they'd probably use 'LIKE' or 'GLOB' and store all the possibilities in one row.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

E.Pasma
In reply to this post by Alexey Pechnikov-2
On 7-aug-2011, om 13:17, Alexey Pechnikov wrote:

> Is it possible to support construction like to
>
> where x IN "1 2 3"
>
> There are a lot of situations when is very useful to store list of
> identifiers in table field.
> I know about the intarray virtual table but it's need
> constructor/destructor calls (and
> produce segfaults with the autorizer function).
>
> Does somebody interesting in this functionality too?
>
> --  
> Best regards, Alexey Pechnikov.
> http://pechnikov.tel/

Hello,

I would be interested in this functionality too, or more generalized  
in an inverse function for group_concat. Earlier, group_split was  
proposed. For SQLite I imagine this to become a virtual table. I  
created one, through APSW, with the following schema:

     group_split (line, words)

(when I use .schema group_split in the APSW shell, it  only display  
the module name)

The dialog below shows its usage. I'm not completely happy about it  
yet. The predefined column names (line and words) seem artificial. It  
lacks the option to specify a seperator character. I imagine a third  
column but that will meke the definition of the filter much more  
difficult. A built-in soultion would be welcome.

Best regards,  Edzard Pasma.

     SQLite version 3.7.7.1 (APSW 3.7.7.1-r1)
     Enter ".help" for instructions
     Enter SQL statements terminated with a ";"
     sqlite> .schema group_split
     CREATE VIRTUAL TABLE group_split USING mymod();
     sqlite> create table t1 (grp, elem);
     sqlite> insert into t1 values ('g1', 'aap');
     sqlite> insert into t1 values ('g1', 'noot');
     sqlite> insert into t1 values ('g1', 'mies');
     sqlite> insert into t1 values ('g2', 'wim');
     sqlite> insert into t1 values ('g2', 'zus');
     sqlite> insert into t1 values ('g2', 'jet');
     sqlite> select grp, group_concat (elem) from t1 group by grp;
     g1|aap,noot,mies
     g2|wim,zus,jet
     sqlite> select grp, word
     from (select grp, group_concat (elem) elems from t1 group by grp)
     join group_split on line = elems;
     g1|aap
     g1|noot
     g1|mies
     g2|wim
     g2|zus
     g2|jet

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: extend the IN operator

Alexey Pechnikov-2
In reply to this post by Simon Slavin-3
2011/8/8 Simon Slavin <[hidden email]>:
> You're quite right.  If someone was going to write that functionality into a SQLite app in real life they'd probably use 'LIKE' or 'GLOB' and store all the possibilities in one row.

FTS3/FTS4 is better as index for lists. See my test script for Igor.
We can fast search any id by using "match" operator on FTS table.

P.S. FTS table has nice scalability. Check insertion a lot of records
and insertion speed is constant. I did  try 400 millions of records
(and did get database size > 100 Gb).

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users