Effect of blobs on performance

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Effect of blobs on performance

Brett Keating-2
Hi,
 
I'm curious about what the effect of having a blob in the database may
be on performance. I have two design options: 1) put a small image file
(15-30kbyte) into the database as a blob, and 2) store the image in a
separate file on disk and hold the filename in the database. My table
has around 20 rows in it, about half are strings/smaller blobs and half
are integers.
 
Option number one, for various reasons, is far more elegant and simple
in terms of its impact on the rest of the code. However, I am concerned
that holding such large amounts of data per record might impact
performance. I could be worried about nothing though, which is why I'm
writing to this list :).
 
Any advice or input?
 
Thanks,
Brett
Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Mr. Puneet Kishor
On 2/21/07, Brett Keating <[hidden email]> wrote:

> Hi,
>
> I'm curious about what the effect of having a blob in the database may
> be on performance. I have two design options: 1) put a small image file
> (15-30kbyte) into the database as a blob, and 2) store the image in a
> separate file on disk and hold the filename in the database. My table
> has around 20 rows in it, about half are strings/smaller blobs and half
> are integers.
>
> Option number one, for various reasons, is far more elegant and simple
> in terms of its impact on the rest of the code. However, I am concerned
> that holding such large amounts of data per record might impact
> performance. I could be worried about nothing though, which is why I'm
> writing to this list :).
>


unless you haven't told us the complete story, you are worried about
nothing probably. You have only 20 rows, fer crying out loud, you
could probably build an application with chalk and slate and it would
be fast enough ;-)

that said, there is much recurring discussion on this on another list
that I am on (folks wanting to store remote sensing imagery into
PostGres). Most concur that it is a ridiculous idea... just store the
metadata including the image path in the db, and then let the
filesystem do the heavy lifting. Usually one can't do much db level
analysis on blobs anyway, so storing them in the db becomes pointless
other than the fact that it is a nice neat single bundle. Well, wonder
if you could utilize the concept of packages the way they are on OS X.
From the outside, a package looks like a file... you can grab it, copy
it, move it. Actually it is a folder in which you can have your core
db as well as all the images.

--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
---------------------------------------------------------------------
collaborate, communicate, compete
=====================================================================

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

D. Richard Hipp
In reply to this post by Brett Keating-2
"Brett Keating" <[hidden email]> wrote:

> Hi,
>  
> I'm curious about what the effect of having a blob in the database may
> be on performance. I have two design options: 1) put a small image file
> (15-30kbyte) into the database as a blob, and 2) store the image in a
> separate file on disk and hold the filename in the database. My table
> has around 20 rows in it, about half are strings/smaller blobs and half
> are integers.
>  
> Option number one, for various reasons, is far more elegant and simple
> in terms of its impact on the rest of the code. However, I am concerned
> that holding such large amounts of data per record might impact
> performance. I could be worried about nothing though, which is why I'm
> writing to this list :).
>  

When I was designing the SQLite file format, I made the assumption
that BLOBs would be used infrequently and would not be very big.
The file format design is not optimized for storing BLOBs.  Indeed,
BLOBs are stored as a singly-linked list of database pages.  It
is hard to imagine a more inefficient design.

Much to my surprise, people begin putting multi-megabyte BLOBs
in SQLite databases and reporting that performance really was not
an issue.  I have lately taken up this practice myself and routinely
uses SQLite database with BLOBs that are over 10MiB is size.  And
it all seems to work pretty well here on my Linux workstation.  I
have no explanation for why it works so well, but it does so I'm not
going to complain.

If your images are only 30KiB, you should have no problems.

Here's a hint though - make the BLOB columns the last column in
your tables.  Or even store the BLOBs in a separate table which
only has two columns: an integer primary key and the blob itself,
and then access the BLOB content using a join if you need to.
If you put various small integer fields after the BLOB, then
SQLite has to scan through the entire BLOB content (following
the linked list of disk pages) to get to the integer fields at
the end, and that definitely can slow you down.

--
D. Richard Hipp  <[hidden email]>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Eric Scouten
In reply to this post by Brett Keating-2
My suggestion: Do a quick experiment. I had a similar question a year  
or so ago. I wrote some code a year or so that generated random blobs  
of varying sizes and tossed them into a SQLite DB and onto files on  
the file system (Mac OS).

There are some complicating variables, such as our application  
framework and some extra resource marshaling that might affect  
performance one way or another, but our results were that for blobs  
averaging <20-30K, SQLite was faster; for larger data sets, the file  
system was faster.

One thing to consider is how the presence of said blobs might impact  
performance of the database in general.

-Eric



On 21 Feb 2007, at 13:26, Brett Keating wrote:

> Hi,
>
> I'm curious about what the effect of having a blob in the database may
> be on performance. I have two design options: 1) put a small image  
> file
> (15-30kbyte) into the database as a blob, and 2) store the image in a
> separate file on disk and hold the filename in the database. My table
> has around 20 rows in it, about half are strings/smaller blobs and  
> half
> are integers.
>
> Option number one, for various reasons, is far more elegant and simple
> in terms of its impact on the rest of the code. However, I am  
> concerned
> that holding such large amounts of data per record might impact
> performance. I could be worried about nothing though, which is why I'm
> writing to this list :).
>
> Any advice or input?
>
> Thanks,
> Brett


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: Effect of blobs on performance

Brett Keating-2
In reply to this post by Mr. Puneet Kishor
Thanks for the quick reply.

I was lysdexic, I meant to say 20 columns. Probably would never exceed
20,000 rows, most likely would hover around 2-4K rows in a typical
situation.

If it has no effect on performance, I'd rather hold it in the database
because I do like the idea of having a "neat package" so to speak. For
instance when a record is deleted, I'd rather not worry about having to
check to see if it holds an image file and then go delete that file...
And if I delete many items at once with one statement, I'd have to break
it up into individual deletes and check each one in order to delete my
corresponding image file on disk... Otherwise I'll get these orphaned
image files all over the place on my disk that correspond to deleted
records.

I'm fairly sure disk space requirements will be nearly identical in each
case... Just worried about query/update performance.

Thanks,
Brett

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of P
Kishor
Sent: Wednesday, February 21, 2007 1:38 PM
To: [hidden email]
Subject: Re: [sqlite] Effect of blobs on performance

On 2/21/07, Brett Keating <[hidden email]> wrote:
> Hi,
>
> I'm curious about what the effect of having a blob in the database may

> be on performance. I have two design options: 1) put a small image
> file
> (15-30kbyte) into the database as a blob, and 2) store the image in a
> separate file on disk and hold the filename in the database. My table
> has around 20 rows in it, about half are strings/smaller blobs and
> half are integers.
>
> Option number one, for various reasons, is far more elegant and simple

> in terms of its impact on the rest of the code. However, I am
> concerned that holding such large amounts of data per record might
> impact performance. I could be worried about nothing though, which is
> why I'm writing to this list :).
>


unless you haven't told us the complete story, you are worried about
nothing probably. You have only 20 rows, fer crying out loud, you could
probably build an application with chalk and slate and it would be fast
enough ;-)

that said, there is much recurring discussion on this on another list
that I am on (folks wanting to store remote sensing imagery into
PostGres). Most concur that it is a ridiculous idea... just store the
metadata including the image path in the db, and then let the filesystem
do the heavy lifting. Usually one can't do much db level analysis on
blobs anyway, so storing them in the db becomes pointless other than the
fact that it is a nice neat single bundle. Well, wonder if you could
utilize the concept of packages the way they are on OS X.
From the outside, a package looks like a file... you can grab it, copy
it, move it. Actually it is a folder in which you can have your core db
as well as all the images.

--
Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies,
UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation
http://www.osgeo.org/education/
---------------------------------------------------------------------
collaborate, communicate, compete
=====================================================================

------------------------------------------------------------------------
-----
To unsubscribe, send email to [hidden email]
------------------------------------------------------------------------
-----




-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: Effect of blobs on performance

Brett Keating-2
In reply to this post by D. Richard Hipp
Thanks I think this answers my question well!

Brett

-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
Sent: Wednesday, February 21, 2007 1:41 PM
To: [hidden email]
Subject: Re: [sqlite] Effect of blobs on performance

"Brett Keating" <[hidden email]> wrote:
> Hi,
>  
> I'm curious about what the effect of having a blob in the database may

> be on performance. I have two design options: 1) put a small image
> file
> (15-30kbyte) into the database as a blob, and 2) store the image in a
> separate file on disk and hold the filename in the database. My table
> has around 20 rows in it, about half are strings/smaller blobs and
> half are integers.
>  
> Option number one, for various reasons, is far more elegant and simple

> in terms of its impact on the rest of the code. However, I am
> concerned that holding such large amounts of data per record might
> impact performance. I could be worried about nothing though, which is
> why I'm writing to this list :).
>  

When I was designing the SQLite file format, I made the assumption that
BLOBs would be used infrequently and would not be very big.
The file format design is not optimized for storing BLOBs.  Indeed,
BLOBs are stored as a singly-linked list of database pages.  It is hard
to imagine a more inefficient design.

Much to my surprise, people begin putting multi-megabyte BLOBs in SQLite
databases and reporting that performance really was not an issue.  I
have lately taken up this practice myself and routinely uses SQLite
database with BLOBs that are over 10MiB is size.  And it all seems to
work pretty well here on my Linux workstation.  I have no explanation
for why it works so well, but it does so I'm not going to complain.

If your images are only 30KiB, you should have no problems.

Here's a hint though - make the BLOB columns the last column in your
tables.  Or even store the BLOBs in a separate table which only has two
columns: an integer primary key and the blob itself, and then access the
BLOB content using a join if you need to.
If you put various small integer fields after the BLOB, then SQLite has
to scan through the entire BLOB content (following the linked list of
disk pages) to get to the integer fields at the end, and that definitely
can slow you down.

--
D. Richard Hipp  <[hidden email]>


------------------------------------------------------------------------
-----
To unsubscribe, send email to [hidden email]
------------------------------------------------------------------------
-----




-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Thomas Fjellstrom
In reply to this post by D. Richard Hipp
On February 21, 2007, [hidden email] wrote:

> "Brett Keating" <[hidden email]> wrote:
> > Hi,
> >
> > I'm curious about what the effect of having a blob in the database may
> > be on performance. I have two design options: 1) put a small image file
> > (15-30kbyte) into the database as a blob, and 2) store the image in a
> > separate file on disk and hold the filename in the database. My table
> > has around 20 rows in it, about half are strings/smaller blobs and half
> > are integers.
> >
> > Option number one, for various reasons, is far more elegant and simple
> > in terms of its impact on the rest of the code. However, I am concerned
> > that holding such large amounts of data per record might impact
> > performance. I could be worried about nothing though, which is why I'm
> > writing to this list :).
>
> When I was designing the SQLite file format, I made the assumption
> that BLOBs would be used infrequently and would not be very big.
> The file format design is not optimized for storing BLOBs.  Indeed,
> BLOBs are stored as a singly-linked list of database pages.  It
> is hard to imagine a more inefficient design.
>
> Much to my surprise, people begin putting multi-megabyte BLOBs
> in SQLite databases and reporting that performance really was not
> an issue.  I have lately taken up this practice myself and routinely
> uses SQLite database with BLOBs that are over 10MiB is size.  And
> it all seems to work pretty well here on my Linux workstation.  I
> have no explanation for why it works so well, but it does so I'm not
> going to complain.
>
> If your images are only 30KiB, you should have no problems.
>
> Here's a hint though - make the BLOB columns the last column in
> your tables.  Or even store the BLOBs in a separate table which
> only has two columns: an integer primary key and the blob itself,
> and then access the BLOB content using a join if you need to.
> If you put various small integer fields after the BLOB, then
> SQLite has to scan through the entire BLOB content (following
> the linked list of disk pages) to get to the integer fields at
> the end, and that definitely can slow you down.

I still wonder about the utility of storing binary data in the db itself.
Maybe it makes it more easy to distribute that way, but how often does one
distribute an entire database in a "vendor specific" format?

I'm quite interested in hearing people's reasoning for going the blob route,
when you have a perfectly good "database" format for "blobs" already (various
filesystems).

> --
> D. Richard Hipp  <[hidden email]>
>
>
> ---------------------------------------------------------------------------
>-- To unsubscribe, send email to [hidden email]
> ---------------------------------------------------------------------------
>--



--
Thomas Fjellstrom
[hidden email]

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Dan Kennedy-4

> I'm quite interested in hearing people's reasoning for going the blob route,
> when you have a perfectly good "database" format for "blobs" already (various
> filesystems).



Three technical reasons for me personally:

* Can include blob operations as part of atomic transactions.

* In SQLite, my blobs won't be corrupted if the machine loses power
  the way they (probably) will be if I write my own code to access
  the file-system.

* Don't have to invent my own scheme for mapping from database entry
  to blob file, nor my own cross platform

And the claim that it's less work to program that way.

Better question IMO: What are peoples reasons for calling open()
when you could just as easily call sqlite3_open()? :)



> > --
> > D. Richard Hipp  <[hidden email]>
> >
> >
> > ---------------------------------------------------------------------------
> >-- To unsubscribe, send email to [hidden email]
> > ---------------------------------------------------------------------------
> >--
>
>
>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re[2]: Effect of blobs on performance

Ion Silvestru
In reply to this post by Brett Keating-2

>>I'm fairly sure disk space requirements will be nearly identical in
>>each case...

In case of blobs in SQLite there will be less disk space used than in
case of file system (cluster size etc.)


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re[2]: Effect of blobs on performance

Ion Silvestru
In reply to this post by Dan Kennedy-4

>* In SQLite, my blobs won't be corrupted if the machine loses power
>  the way they (probably) will be if I write my own code to access
>  the file-system.

But, in case of a corruption, you will have entire blob DB corrupted versus
at least one file (aka one row in DB) corrupted.


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

D. Richard Hipp
Ion Silvestru <[hidden email]> wrote:
> >* In SQLite, my blobs won't be corrupted if the machine loses power
> >  the way they (probably) will be if I write my own code to access
> >  the file-system.
>
> But, in case of a corruption, you will have entire blob DB corrupted versus
> at least one file (aka one row in DB) corrupted.
>

Not.  SQLite databases do not corrupt when you turn the power off.
When power is restored and you attempt to access the database again,
the transaction that you were in the middle of at the point of the
power failure automatically rolls back, restoring the database to
a sane state.  This is one of the key reasons why you would want
to use SQLite instead of fopen() for storing miscellaneous data.

--
D. Richard Hipp  <[hidden email]>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

D. Richard Hipp
In reply to this post by Thomas Fjellstrom
Thomas Fjellstrom <[hidden email]> wrote:
>
> I still wonder about the utility of storing binary data in the db itself.
> Maybe it makes it more easy to distribute that way, but how often does one
> distribute an entire database in a "vendor specific" format?

A common use for SQLite is as an application file format.  When
you do File->Open, instead of reading and parsing a bunch of
information in a proprietary format, just call sqlite3_open()
on the file instead.  File->Save becomes a no-op.  There is no
parser to write.  Performance problems can often be fixed simply
by adding another index.  All of your data is written to disk
atomically and is relatively safe from corruption even if you
turn off the power during a write.  There are a lot of advantages
to this approach.

Lots of people are starting to use SQLite as an application
file format.  Remember the SQLite slogan:

    SQLite is not a replacement for Oracle, it is a
    replacement for fopen().

>
> I'm quite interested in hearing people's reasoning for going the blob route,
> when you have a perfectly good "database" format for "blobs" already (various
> filesystems).
>

Just yesterday, Eric Scouten posted on this list that he had done
a study and found that for BLOB smaller than 20-30K it was faster
to store them in an SQLite database than on disk.
--
D. Richard Hipp  <[hidden email]>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re[2]: Effect of blobs on performance

Teg-3
Hello drh,

Thursday, February 22, 2007, 7:54:58 AM, you wrote:


dhc> A common use for SQLite is as an application file format.  When
you do File->>Open, instead of reading and parsing a bunch of
dhc> information in a proprietary format, just call sqlite3_open()
dhc> on the file instead.  File->Save becomes a no-op.  There is no
dhc> parser to write.  Performance problems can often be fixed simply
dhc> by adding another index.  All of your data is written to disk
dhc> atomically and is relatively safe from corruption even if you
dhc> turn off the power during a write.  There are a lot of advantages
dhc> to this approach.

dhc> Lots of people are starting to use SQLite as an application
dhc> file format.  Remember the SQLite slogan:

dhc>     SQLite is not a replacement for Oracle, it is a
dhc>     replacement for fopen().

>>
>> I'm quite interested in hearing people's reasoning for going the blob route,
>> when you have a perfectly good "database" format for "blobs" already (various
>> filesystems).
>>

dhc> Just yesterday, Eric Scouten posted on this list that he had done
dhc> a study and found that for BLOB smaller than 20-30K it was faster
dhc> to store them in an SQLite database than on disk.
dhc> --
dhc> D. Richard Hipp  <[hidden email]>

I was reading a similar study that stated that 150K was the knee point
in general for DB blob performance. I store 1000's of JPG images in
SQLite as blobs, ranging in size up to 4 megs or so each (though the
typical size is in the 200-300K range). My only issue with this method
is that enumeration seems slower than enumerating the same files in a
folder. I store them in DB's because they're nice units of data I can
backup and move around anywhere without having worry about losing the
connection between the DB and the files themselves. Because they're
images, I only ever want to read them all at once so, not having
random access within the blobs is unimportant to me.



-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Dennis Jenkins
In reply to this post by Thomas Fjellstrom
Thomas Fjellstrom wrote:
> I still wonder about the utility of storing binary data in the db itself.
> Maybe it makes it more easy to distribute that way, but how often does one
> distribute an entire database in a "vendor specific" format?
>
> I'm quite interested in hearing people's reasoning for going the blob route,
> when you have a perfectly good "database" format for "blobs" already (various
> filesystems).
>  

1)
We use the sqlite encryption extension.  We want our blobs encrypted as
well as our database, so putting the blobs into the database makes
sense.  We have a special table for the blobs that has a primary key (3
columns) and the blob.  The rest of the data is contained in other tables.

2)
We don't need to worry about atomically deleting disk blobs and database
rows.  We take advantage of the ACID nature of sqlite.  This way we
don't have to code for contingencies where the user has managed to
delete or corrupt a blob, or a blob that our app can't delete even when
it deletes the database row.

3)
Having everything in one package.  Makes tech support much easier if the
user only has to transmit a single file instead of an entire directory.

4)
We modify the blobs at runtime.  ACIDness of sqlite is very nice here.  
I don't want to try to re-implement this directly on the filesystem
(even if it becomes a simple rename operation).


Our blobs vary in size from 12K to 3M.  Sqlite is not a performance
bottleneck for us... the client's internet connection is.

I have not done extensive performance tests on these settings, but these
are the settings that our app uses when it creates/opens the sqlite
database:

    db.ExecuteImmediate("PRAGMA page_size=4096");
    db.ExecuteImmediate("PRAGMA legacy_file_format=ON");
    db.ExecuteImmediate("PRAGMA cache_size=8000");
    db.ExecuteImmediate("PRAGMA synchronous=OFF");
    db.ExecuteImmediate("PRAGMA temp_store=2");


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

D. Richard Hipp
Dennis Jenkins <[hidden email]> wrote:
>
> these are the settings that our app uses when it creates/opens the sqlite
> database:
>
>     db.ExecuteImmediate("PRAGMA synchronous=OFF");

With synchronous=OFF, a power failure might result in database
corruption.  Is this an issue for you?
--
D. Richard Hipp  <[hidden email]>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Denis Sbragion
In reply to this post by Dan Kennedy-4
Hello Dan,

On Thu, February 22, 2007 06:08, Dan Kennedy wrote:
> * Can include blob operations as part of atomic transactions.

me too. Transactions are a major advantage of database blobs.

Bye,

--
        Denis Sbragion
        InfoTecna
        Tel: +39 0362 805396, Fax: +39 0362 805404
        URL: http://www.infotecna.it






-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re[2]: Effect of blobs on performance

Teg-3
In reply to this post by D. Richard Hipp
Hello drh,

Thursday, February 22, 2007, 9:08:08 AM, you wrote:

dhc> Dennis Jenkins <[hidden email]> wrote:
>>
>> these are the settings that our app uses when it creates/opens the sqlite
>> database:
>>
>>     db.ExecuteImmediate("PRAGMA synchronous=OFF");

dhc> With synchronous=OFF, a power failure might result in database
dhc> corruption.  Is this an issue for you?
dhc> --
dhc> D. Richard Hipp  <[hidden email]>


dhc> -----------------------------------------------------------------------------
dhc> To unsubscribe, send email to [hidden email]
dhc> -----------------------------------------------------------------------------

I'm pretty sure an application crash even without power failure can
corrupt to. At least in my experience. I keep synchronous on and
simply use "insert or ignore" syntax within a transaction to get
performance.



--
Best regards,
 Teg                            mailto:[hidden email]


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

D. Richard Hipp
Teg <[hidden email]> wrote:
>
> I'm pretty sure an application crash even without power failure can
> corrupt to. At least in my experience. I keep synchronous on and
> simply use "insert or ignore" syntax within a transaction to get
> performance.
>

It is not suppose to.  There are extensive tests in the test
suite where we simulate application crashes and verify that the
database is not corrupted.  If you encounter a situation where
the database is corrupted, that is a bug and you should report
it.

--
D. Richard Hipp  <[hidden email]>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

Dennis Jenkins
In reply to this post by D. Richard Hipp
[hidden email] wrote:

> Dennis Jenkins <[hidden email]> wrote:
>  
>> these are the settings that our app uses when it creates/opens the sqlite
>> database:
>>
>>     db.ExecuteImmediate("PRAGMA synchronous=OFF");
>>    
>
> With synchronous=OFF, a power failure might result in database
> corruption.  Is this an issue for you?
> --
> D. Richard Hipp  <[hidden email]>
>  

Not really.  The data can be regenerated by the user without too much
difficulty.  I don't have the numbers handy, but I seem to remember that
I did some performance experiments and determined that the performance
gains significantly outweighed the potential problems.  To my knowledge,
for the past two years, only one user (out of many hundreds) has ever
gotten a corrupt database.

I suppose I'll put in a ticket into our issue tracking system to review
this decision.  According to a comment in our source code, I based this
action on
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html#pragma-synchronous



-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Effect of blobs on performance

John Stanton-3
In reply to this post by Thomas Fjellstrom
Thomas Fjellstrom wrote:

> On February 21, 2007, [hidden email] wrote:
>
>>"Brett Keating" <[hidden email]> wrote:
>>
>>>Hi,
>>>
>>>I'm curious about what the effect of having a blob in the database may
>>>be on performance. I have two design options: 1) put a small image file
>>>(15-30kbyte) into the database as a blob, and 2) store the image in a
>>>separate file on disk and hold the filename in the database. My table
>>>has around 20 rows in it, about half are strings/smaller blobs and half
>>>are integers.
>>>
>>>Option number one, for various reasons, is far more elegant and simple
>>>in terms of its impact on the rest of the code. However, I am concerned
>>>that holding such large amounts of data per record might impact
>>>performance. I could be worried about nothing though, which is why I'm
>>>writing to this list :).
>>
>>When I was designing the SQLite file format, I made the assumption
>>that BLOBs would be used infrequently and would not be very big.
>>The file format design is not optimized for storing BLOBs.  Indeed,
>>BLOBs are stored as a singly-linked list of database pages.  It
>>is hard to imagine a more inefficient design.
>>
>>Much to my surprise, people begin putting multi-megabyte BLOBs
>>in SQLite databases and reporting that performance really was not
>>an issue.  I have lately taken up this practice myself and routinely
>>uses SQLite database with BLOBs that are over 10MiB is size.  And
>>it all seems to work pretty well here on my Linux workstation.  I
>>have no explanation for why it works so well, but it does so I'm not
>>going to complain.
>>
>>If your images are only 30KiB, you should have no problems.
>>
>>Here's a hint though - make the BLOB columns the last column in
>>your tables.  Or even store the BLOBs in a separate table which
>>only has two columns: an integer primary key and the blob itself,
>>and then access the BLOB content using a join if you need to.
>>If you put various small integer fields after the BLOB, then
>>SQLite has to scan through the entire BLOB content (following
>>the linked list of disk pages) to get to the integer fields at
>>the end, and that definitely can slow you down.
>
>
> I still wonder about the utility of storing binary data in the db itself.
> Maybe it makes it more easy to distribute that way, but how often does one
> distribute an entire database in a "vendor specific" format?
>
> I'm quite interested in hearing people's reasoning for going the blob route,
> when you have a perfectly good "database" format for "blobs" already (various
> filesystems).

The BLOB method has two major advantages.  The first is that the data
all reside in one file.  The second is that it does not have a directory
size limit, a curse when storing files and one which requires extra
logic to build a tree structure of directories or some other way of
limiting directory size.

The downside is the relative slowness of retrieving large BLOBs,
although the anecdotal evidence is that this is not as much of a problem
as one would expect.

On balance I would use BLOBs for binary storage except in the case where
there is a predominance of very large files.

>
>
>>--
>>D. Richard Hipp  <[hidden email]>
>>
>>
>>---------------------------------------------------------------------------
>>-- To unsubscribe, send email to [hidden email]
>>---------------------------------------------------------------------------
>>--
>
>
>
>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

12