Snapshot database creation performance


Clinco, Michele
Hello.

I have an application that uses a central database (Oracle or SQL
Server) and creates a copy of the data in an SQLite database file.

This file contains a read-only copy of more or less all the central
database tables: about 50 tables, which together may hold up to 100k
records spread across them.

When the application starts, it sends a request to a service running on
the central database host. The service creates the database file,
creates the tables, fills them with data, and builds the necessary
indexes. The file is then compressed, sent to the client, and deleted,
because it is not used any more.
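
For what it's worth, the build step looks roughly like the sketch below
(written here against SQLite's standard C API purely for illustration;
my real code is .NET, and the table and data are invented):

  /* Sketch of the snapshot-building step using SQLite's standard C API.
     The table name, columns, and data are invented for illustration;
     real code would loop over rows fetched from Oracle/SQL Server. */
  #include <stdio.h>
  #include <sqlite3.h>

  int build_snapshot(const char *path)
  {
      sqlite3 *db;
      if (sqlite3_open(path, &db) != SQLITE_OK) {
          fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
          return 1;
      }
      /* One big transaction: committing each INSERT separately forces
         a sync per row, which dominates the on-disk build time. */
      sqlite3_exec(db, "BEGIN", 0, 0, 0);
      sqlite3_exec(db, "CREATE TABLE items(id INTEGER, name TEXT)", 0, 0, 0);
      sqlite3_exec(db, "INSERT INTO items VALUES(1, 'example')", 0, 0, 0);
      /* ... ~50 tables, up to 100k rows in total ... */
      sqlite3_exec(db, "CREATE INDEX items_id ON items(id)", 0, 0, 0);
      sqlite3_exec(db, "COMMIT", 0, 0, 0);
      return sqlite3_close(db) == SQLITE_OK ? 0 : 1;
  }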

This operation is quite heavy and takes several seconds (20 seconds on
my laptop, generating a 1700 KB data file).

Since the file is absolutely temporary, I installed a ramdisk driver
and tried to create the file on the ramdisk instead of on the hard disk.

The difference is really astonishing: 0.9 seconds instead of more than
20. This means that I don't have to worry about performance any more.

There is only one problem: sizing the ramdisk. The server may receive
multiple concurrent requests, so the ramdisk must be dimensioned
accordingly, wasting memory that is normally not used...
This is simple, but it causes administration problems and requires
error-checking routines that I would like to avoid.

The question is:
Is there a way to create the database in memory (this already exists:
use :memory: as the file name) and to access the allocated memory
before it is deallocated when the database is closed?

I had a look at the sources, but I did not understand how memory
allocation takes place. I imagine that the library is not using a
contiguous block of memory but rather a sort of list...

My program is written in .NET and the compression routines I'm using
are stream-based, so I need to create a memory stream from the internal
buffers that can be used by the compression routine...

Bye, Michele


Re: Snapshot database creation performance

Andrew Piskorski
On Tue, Feb 07, 2006 at 12:52:08PM +0100, Clinco, Michele wrote:

> The question is:
> Is there a way to create the database in memory (this already exists:
> use :memory: as the file name) and to access the allocated memory
> before it is deallocated when the database is closed?

I'm confused; why do you want to do that?  Use the memory how?

If you need a lot of memory for other purposes after you're done with
the SQLite database, what's wrong with just letting SQLite free the
memory, and then malloc'ing as much memory as you want?

> My program is written in .NET and the compression routines I'm using
> are stream-based, so I need to create a memory stream from the internal
> buffers that can be used by the compression routine...

Do you mean you wish to directly access SQLite's in-memory data
structures, rather than using a SQL query to get the data?  Why?

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/

Re: Snapshot database creation performance

Andrew Piskorski
On Tue, Feb 07, 2006 at 07:43:44AM -0500, Andrew Piskorski wrote:
> On Tue, Feb 07, 2006 at 12:52:08PM +0100, Clinco, Michele wrote:

> > My program is written in .NET and the compression routines I'm using
> > are stream-based, so I need to create a memory stream from the internal
> > buffers that can be used by the compression routine...
>
> Do you mean you wish to directly access SQLite's in-memory data
> structures, rather than using a SQL query to get the data?  Why?

Oh, never mind, I wrote too soon.  I realize now that you want an
in-memory representation of a SQLite database file, which you then
gzip or whatever, all still in memory, and then transfer over the
network to whoever wants that file.  And you want to know if there's
some way to do that which offers more programmatic control than the
ramdisk filesystem you're using now.

Essentially, you want a way to dynamically control the size of the RAM
disk.  But it sounds like you don't necessarily need ALL the
facilities of a normal file-system, so you're wondering if perhaps you
could implement something more purpose-specific yourself rather than
using that ramdisk driver.

Interesting, but I don't know the answer.

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/

R: [sqlite] Snapshot database creation performance

Clinco, Michele
In reply to this post by Clinco, Michele
You understood perfectly.
The ramdisk is not the goal, just a means to the goal of the best
possible performance.

I don't need the ramdisk at all, just all-in-memory processing.
A solution to this problem should help many users, because I think many
of us use an SQLite database to store data in offline mode.

Bye, Michele

Re: Snapshot database creation performance

D. Richard Hipp
In reply to this post by Clinco, Michele
"Clinco, Michele" <[hidden email]> wrote:

>
> This operation is quite heavy and takes several seconds (20 seconds
> on my laptop, generating a 1700 KB data file).
>
> Since the file is absolutely temporary, I installed a ramdisk driver
> and tried to create the file on the ramdisk instead of on the hard disk.
>
> The difference is really astonishing: 0.9 seconds instead of more than
> 20. This means that I don't have to worry about performance any more.
>

Maybe if you set "PRAGMA synchronous=OFF" the performance would
be nearly as good when going to disk as it is when using a RAM disk.

With "PRAGMA synchronous=OFF", your database can become corrupt
if you take an OS crash or power failure in the middle of an update,
but I gather you really don't care about that.
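
In C this is a single sqlite3_exec() call made right after opening the
file, before anything is written (a minimal sketch):

  /* Minimal sketch: open the snapshot file and disable syncing before
     creating tables or inserting rows. */
  #include <sqlite3.h>

  int open_snapshot(const char *path, sqlite3 **db)
  {
      if (sqlite3_open(path, db) != SQLITE_OK)
          return 1;
      return sqlite3_exec(*db, "PRAGMA synchronous=OFF", 0, 0, 0);
  }
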
--
D. Richard Hipp   <[hidden email]>


Re: R: [sqlite] Snapshot database creation performance

Teg-3
In reply to this post by Clinco, Michele
Hello Michele,

Perhaps you could replace "os_win.c" (or the equivalent for your OS)
with functions that simulate disk I/O through your compressed stream
functions. The biggest problem I see is random access. Typically there
is no quick seeking within compressed data; you just have to "Read" up
to the point you want, and do that again every time you seek. Why can't
you use a plain "memory" database?
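
The shape of such a replacement is sketched below: a growable in-memory
buffer with write/read/seek. (This is NOT SQLite's actual os-layer
interface, just an illustration of the idea.)

  /* A growable in-memory "file" that os-layer replacement functions
     could target instead of the disk. */
  #include <stdlib.h>
  #include <string.h>

  typedef struct {
      unsigned char *buf;   /* backing store             */
      size_t size;          /* bytes written so far      */
      size_t cap;           /* allocated capacity        */
      size_t pos;           /* current read/write offset */
  } MemFile;

  static int mem_write(MemFile *f, const void *p, size_t n)
  {
      if (f->pos + n > f->cap) {            /* grow geometrically */
          size_t cap = f->cap ? f->cap * 2 : 4096;
          while (cap < f->pos + n) cap *= 2;
          unsigned char *nb = realloc(f->buf, cap);
          if (!nb) return -1;
          f->buf = nb;
          f->cap = cap;
      }
      memcpy(f->buf + f->pos, p, n);
      f->pos += n;
      if (f->pos > f->size) f->size = f->pos;
      return 0;
  }

  static size_t mem_read(MemFile *f, void *p, size_t n)
  {
      size_t avail = f->size > f->pos ? f->size - f->pos : 0;
      if (n > avail) n = avail;             /* clamp at end of data */
      memcpy(p, f->buf + f->pos, n);
      f->pos += n;
      return n;
  }

  static void mem_seek(MemFile *f, size_t off) { f->pos = off; }

The upside over a ramdisk is that each request's buffer grows on demand
and is freed when the transfer is done, so there is no fixed memory pool
to administer.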

My application uses compressed data (gzip), but the tradeoff for small
data files is exceptionally heavy CPU usage when the data is
decompressed/compressed. The data files are sized so that seeking isn't
too painful.

C

--
Best regards,
 Teg                            mailto:[hidden email]


Snapshot database creation performance

Clinco, Michele
In reply to this post by Clinco, Michele

I already tried that. I saw no particular effect or, at least, nothing
so dramatic.



Snapshot database creation performance

Clinco, Michele
In reply to this post by Clinco, Michele
Hello!

I'm not using compressed data while SQLite is creating the database.
I create it in the standard way, and AFTER it is created and closed I
compress the freshly generated file and transfer it.
The compression is used only for performance reasons during the file
transfer.
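
In C the compress-after-close step would look roughly like the sketch
below, with zlib's gz* stream API standing in for my actual .NET
stream-based compressor:

  /* Sketch: compress the finished snapshot file before transfer.
     zlib here is illustrative only. */
  #include <stdio.h>
  #include <zlib.h>

  int compress_file(const char *src, const char *dst)
  {
      FILE  *in  = fopen(src, "rb");
      gzFile out = gzopen(dst, "wb");
      char   buf[8192];
      size_t n;

      if (!in || !out) {
          if (in)  fclose(in);
          if (out) gzclose(out);
          return -1;
      }
      while ((n = fread(buf, 1, sizeof buf, in)) > 0)
          gzwrite(out, buf, (unsigned)n);
      fclose(in);
      gzclose(out);
      return 0;
  }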

I cannot use a memory database because it is lost when it is closed.

Your suggestion to rewrite the file access functions is surely nice, but a bit too complex.

bye


Re: R: [sqlite] Snapshot database creation performance

Andrew Piskorski
In reply to this post by Teg-3
On Tue, Feb 07, 2006 at 08:51:43AM -0500, Teg wrote:

> My application uses compressed data (gzip) but, the tradeoff to small
> data files is exceptionally heavy CPU usage when the data is
> decompressed/compressed.

Incidentally, the MonetDB folks have done research on that sort of
thing.  In their most recent project, "X100", they keep the data
compressed both on disk AND in main memory, and decompress it only in
the CPU cache when actually manipulating values.

They do that not primarily to save disk space, but to reduce the
amount of memory bandwidth needed.  Apparently in some cases it's a big
speed-up, and it shifts the query from being memory-I/O bound to CPU
bound.  Of course, in order for that to work they have to use very
lightweight compression/decompression algorithms.  Gzip gives much
better compression, but in comparison it's extremely slow.
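
I don't know exactly which schemes X100 uses, but run-length encoding
of a column is a fair example of the "very lightweight" end of the
spectrum (illustrative sketch only):

  /* Run-length encode src[0..n) as (value, count) pairs.
     vals and counts must each have room for n entries in the
     worst case; returns the number of pairs written. */
  #include <stddef.h>

  static size_t rle_encode(const int *src, size_t n,
                           int *vals, size_t *counts)
  {
      size_t out = 0, i = 0;
      while (i < n) {
          size_t run = 1;
          while (i + run < n && src[i + run] == src[i])
              run++;                 /* extend the current run */
          vals[out]   = src[i];
          counts[out] = run;
          out++;
          i += run;
      }
      return out;
  }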

Probably not immediately useful, but it seems like interesting stuff:

  http://monetdb.cwi.nl/
  http://homepages.cwi.nl/~mk/MonetDB/
  http://sourceforge.net/projects/monetdb/
  http://homepages.cwi.nl/~boncz/

  "MonetDB/X100 - A DBMS In The CPU Cache"
  by Marcin Zukowski, Peter Boncz, Niels Nes, Sándor Héman
  ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

Btw, apparently the current stable version of MonetDB is open source
but they haven't decided whether the X100 work will be or not.

Googling just now, there seems to have been a fair amount of research
and commercialization of this sort of stuff lately, e.g.:

  http://db.csail.mit.edu/projects/cstore/

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/

Re[2]: R: [sqlite] Snapshot database creation performance

Teg-3
Hello Andrew,

My purpose is primarily disk storage savings; the data is mainly text,
so it's highly compressible. 500 KB on-disk chunks of data decompress
to about 8 megabytes of text. What compression scheme do they use?
I might consider trading some disk space for faster
compression/decompression.

C

--
Best regards,
 Teg                            mailto:[hidden email]