SQLite as R data store

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

SQLite as R data store

Andrew Piskorski
As an R user, I'm happy to see the project below, it could become a
very handy use of SQLite.  

I suspect it's well out of scope for that project, but it would be
particularly cool to eventually see some of the integration go the
other way, and allow use of SQLite's SQL engine to manipulate R data
frames (which are tables, basically) from within R, as that would be
much more powerful than R's rather limited set of data frame
manipulation functions.

----- Forwarded message from Douglas Bates <[hidden email]> -----

From: "Douglas Bates" <[hidden email]>
Date: Thu, 25 May 2006 11:16:12 -0500
To: [hidden email]
Subject: [Rd] R Project accepted for the Google Summer of Code

We are very pleased that an R project has been selected as one of the GNU
projects for the Google Summer of Code 2006 (http://code.google.com/soc/).

Miguel Angel R. Manese, an M.S. Statistics student at the University of
the Philippines, will be working with Douglas Bates and Brian Ripley on a
project entitled `SQLite Data Frames for R' to let R store very large
datasets in an SQLite database transparently, implementing primitive
operations for data stored in SQLite so that they behave exactly like
ordinary data frames to the users.  It is likely that the project will
result in the first instance in an R package (which may in due course
become part of the tarball).

Congratulations, Miguel!

----- End forwarded message -----

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/
Reply | Threaded
Open this post in threaded view
|

Re: SQLite as R data store

Eric Bohlman
Andrew Piskorski wrote:
> As an R user, I'm happy to see the project below, it could become a
> very handy use of SQLite.  
>
> I suspect it's well out of scope for that project, but it would be
> particularly cool to eventually see some of the integration go the
> other way, and allow use of SQLite's SQL engine to manipulate R data
> frames (which are tables, basically) from within R, as that would be
> much more powerful than R's rather limited set of data frame
> manipulation functions.

You might want to take a look at DRH's proposal,
http://www.sqlite.org/cvstrac/wiki?p=VirtualTables
which looks like it would make it easier for the R community to
implement the sort of interface you're talking about.

Reply | Threaded
Open this post in threaded view
|

Re: SQLite as R data store and spatial data maybe ?

Noel Frankinet
Eric Bohlman wrote:

> Andrew Piskorski wrote:
>
>> As an R user, I'm happy to see the project below, it could become a
>> very handy use of SQLite.
>> I suspect it's well out of scope for that project, but it would be
>> particularly cool to eventually see some of the integration go the
>> other way, and allow use of SQLite's SQL engine to manipulate R data
>> frames (which are tables, basically) from within R, as that would be
>> much more powerful than R's rather limited set of data frame
>> manipulation functions.
>
>
> You might want to take a look at DRH's proposal,
> http://www.sqlite.org/cvstrac/wiki?p=VirtualTables
> which looks like it would make it easier for the R community to
> implement the sort of interface you're talking about.
>
>
>
Hello all,

I've read the proposed virtualTable schema.
Mr Hipp could you elaborate on how you envision spatial indexing ?
I now construct a in memory quadtree of my spatial structure, so that I
get a set of key when I do a "window query".
I suppose now that I need to create a temporary table of thoses key and
then do a kind of union between this table and my target
geometry+attributes table. How would  a virtual table help that process ?

Thank you

--
Noël Frankinet
Gistek Software SA
http://www.gistek.net

Reply | Threaded
Open this post in threaded view
|

Re: SQLite as R data store and spatial data maybe ?

D. Richard Hipp
Noel Frankinet <[hidden email]> wrote:
> >
> > You might want to take a look at DRH's proposal,
> > http://www.sqlite.org/cvstrac/wiki?p=VirtualTables
> > which looks like it would make it easier for the R community to
> > implement the sort of interface you're talking about.
>
> I've read the proposed virtualTable schema.
> Mr Hipp could you elaborate on how you envision spatial indexing ?

I've not thought about it.  The VirtualTable proposal was
developped to support full-text search and as something to
replace the (goofy) pragmas currently used to find schema
information.  I would imagine that virtualtables would also
be useful for spatial indexing, but I have not looked closely
into the matter.
--
D. Richard Hipp   <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: SQLite as R data store and spatial data maybe ?

Noel Frankinet
[hidden email] wrote:

>Noel Frankinet <[hidden email]> wrote:
>  
>
>>>You might want to take a look at DRH's proposal,
>>>http://www.sqlite.org/cvstrac/wiki?p=VirtualTables
>>>which looks like it would make it easier for the R community to
>>>implement the sort of interface you're talking about.
>>>      
>>>
>>I've read the proposed virtualTable schema.
>>Mr Hipp could you elaborate on how you envision spatial indexing ?
>>    
>>
>
>I've not thought about it.  The VirtualTable proposal was
>developped to support full-text search and as something to
>replace the (goofy) pragmas currently used to find schema
>information.  I would imagine that virtualtables would also
>be useful for spatial indexing, but I have not looked closely
>into the matter.
>--
>D. Richard Hipp   <[hidden email]>
>
>
>
>  
>
Thank you for your reply,
I suppose that whatever indexing technique, it boils down to " I have
that bunch of keys please return the corresponding records".
I can then use IN or BETWEEN statement, but I suspect that a UNION
between a TABLE(query_quadtree(interesting_window)) (is this a virtual
table?) and my target table would do better.
Am I right ? maybe a little bit confused ?

Best regards




--
Noël Frankinet
Gistek Software SA
http://www.gistek.net

Reply | Threaded
Open this post in threaded view
|

RE: SQLite as R data store and spatial data maybe ?

Eric Franks
How do I unsubscribe?  I don't want to keep getting these emails.



-----Original Message-----
From: Noel Frankinet [mailto:[hidden email]]
Sent: Monday, May 29, 2006 9:23 AM
To: [hidden email]
Subject: Re: [sqlite] SQLite as R data store and spatial data maybe ?

[hidden email] wrote:

>Noel Frankinet <[hidden email]> wrote:
>  
>
>>>You might want to take a look at DRH's proposal,
>>>http://www.sqlite.org/cvstrac/wiki?p=VirtualTables
>>>which looks like it would make it easier for the R community to
>>>implement the sort of interface you're talking about.
>>>      
>>>
>>I've read the proposed virtualTable schema.
>>Mr Hipp could you elaborate on how you envision spatial indexing ?
>>    
>>
>
>I've not thought about it.  The VirtualTable proposal was
>developped to support full-text search and as something to
>replace the (goofy) pragmas currently used to find schema
>information.  I would imagine that virtualtables would also
>be useful for spatial indexing, but I have not looked closely
>into the matter.
>--
>D. Richard Hipp   <[hidden email]>
>
>
>
>  
>
Thank you for your reply,
I suppose that whatever indexing technique, it boils down to " I have
that bunch of keys please return the corresponding records".
I can then use IN or BETWEEN statement, but I suspect that a UNION
between a TABLE(query_quadtree(interesting_window)) (is this a virtual
table?) and my target table would do better.
Am I right ? maybe a little bit confused ?

Best regards




--
Noël Frankinet
Gistek Software SA
http://www.gistek.net

Reply | Threaded
Open this post in threaded view
|

external indexing

Noel Frankinet
In reply to this post by Noel Frankinet
Hello All,

since nobody has answered my last question, I'll try to rephrase it :
How do you select record you have the keys (and I mean a lot of keys)
I know
SELECT .. WHERE rowid=
SELECT .. WHERE rowid IN (something like that)
SELECT ... WHARE rowid BETWEEN ...

Is there another way, does a temporary table helps ?

Thank you

--
Noël Frankinet
Gistek Software SA
http://www.gistek.net

Reply | Threaded
Open this post in threaded view
|

Re: external indexing

Rusty Conover

On May 30, 2006, at 1:36 AM, Noel Frankinet wrote:

> Hello All,
>
> since nobody has answered my last question, I'll try to rephrase it :
> How do you select record you have the keys (and I mean a lot of keys)
> I know
> SELECT .. WHERE rowid=
> SELECT .. WHERE rowid IN (something like that)
> SELECT ... WHARE rowid BETWEEN ...
>
> Is there another way, does a temporary table helps ?

Using a temporary table seems to be the best way when you have lots  
of keys.  Of course there is a tradeoff point, such as for 1 or 2  
rows a temporary table is overkill.  Also it depends on if you're  
temporary table is being stored in memory or on the disk, check the  
relevant PRAGMA documentation to specify how you want them stored.

CREATE TEMP TABLE search_keys(oid INTEGER NOT NULL PRIMARY KEY);

Next, insert all of the records you're searching for into search_keys  
table.

You might want to do an "ANALYZE" here on the temporary table, so the  
planner can pick the most efficient way to do the upcoming join  
depending on the number of rows you're searching for, and the number  
of rows you're searching in.

Do the actual select:

SELECT ... from search_keys, target where search_keys.oid = bar.oid

I believe the ordering of the tables in the from clause may be  
important as the planner may use it in determining which table to  
process first, but if there are statistics about all of the tables it  
may reorder them.  drh, would know actual behavior for sure.

Finally don't forget to:

DROP TABLE search_keys

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com



Reply | Threaded
Open this post in threaded view
|

Re: external indexing

Noel Frankinet
Rusty Conover wrote:

>
> Using a temporary table seems to be the best way when you have lots  
> of keys.  Of course there is a tradeoff point, such as for 1 or 2  
> rows a temporary table is overkill.  Also it depends on if you're  
> temporary table is being stored in memory or on the disk, check the  
> relevant PRAGMA documentation to specify how you want them stored.
>
> CREATE TEMP TABLE search_keys(oid INTEGER NOT NULL PRIMARY KEY);
>
> Next, insert all of the records you're searching for into search_keys  
> table.
>
> You might want to do an "ANALYZE" here on the temporary table, so the  
> planner can pick the most efficient way to do the upcoming join  
> depending on the number of rows you're searching for, and the number  
> of rows you're searching in.
>
> Do the actual select:
>
> SELECT ... from search_keys, target where search_keys.oid = bar.oid
>
> I believe the ordering of the tables in the from clause may be  
> important as the planner may use it in determining which table to  
> process first, but if there are statistics about all of the tables it  
> may reorder them.  drh, would know actual behavior for sure.
>
> Finally don't forget to:
>
> DROP TABLE search_keys
>
> Cheers,
>
> Rusty
> --
> Rusty Conover
> InfoGears Inc.
> Web: http://www.infogears.com
>
>
Hello Rusty,
Exactly what I was looking for
I've seen the PRAGMA TEMP_STORE to create a memory table.
Do you think I can join a memory table and a disk based one ?

Anyway, thank a lot
Best wishes


--
Noël Frankinet
Gistek Software SA
http://www.gistek.net

Reply | Threaded
Open this post in threaded view
|

Re: external indexing

Rusty Conover

On May 30, 2006, at 3:05 AM, Noel Frankinet wrote:

> Hello Rusty,
> Exactly what I was looking for
> I've seen the PRAGMA TEMP_STORE to create a memory table.
> Do you think I can join a memory table and a disk based one ?
>
> Anyway, thank a lot
> Best wishes


Of course you can do that.

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com