lol: sqlite3 db smaller than the sum of its contents...

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

lol: sqlite3 db smaller than the sum of its contents...

Stephan Beal-3
Hi, all!

Today i saw a curious thing: i store 440kb of wiki files in an sqlite3 db
and the db file is only 400kb.

HTF can that possibly be?

After poking around i found that the wiki files actually total 360kb (when i
added up their sizes manually, as opposed to using 'df' to get it), and the
extra 80kb were from the hard drive's large block size (slack space reported
by 'df').

Kinda funny, though, that sqlite3 actually decreases the amount of storage
required in this case.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: lol: sqlite3 db smaller than the sum of its contents...

Puneet Kishor-2



On Jun 23, 2011, at 10:18 AM, Stephan Beal <[hidden email]> wrote:

> Hi, all!
>
> Today i saw a curious thing: i store 440kb of wiki files in an sqlite3 db
> and the db file is only 400kb.
>
> HTF can that possibly be?
>
> After poking around i found that the wiki files actually total 360kb (when i
> added up their sizes manually, as opposed to using 'df' to get it), and the
> extra 80kb were from the hard drive's large block size (slack space reported
> by 'df').
>
> Kinda funny, though, that sqlite3 actually decreases the amount of storage
> required in this case.


Lots of small files will take up more space because of the fixed minimum block size. For large corpuses this won't matter. Putting them all in one db makes logistical management easier, but you will lose the ability to update just a single file individually. I used to store all my wiki files (punkish.org) in one SQLite db, but now I have them as separate files which allows me to just ssh and edit a single file easily. Six of one, and all that.


>
> --
> ----- stephan beal
> http://wanderinghorse.net/home/stephan/
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: lol: sqlite3 db smaller than the sum of its contents...

Jean-Denis Muys-2

On 23 juin 2011, at 16:22, Mr. Puneet Kishor wrote:

>
>
>
> On Jun 23, 2011, at 10:18 AM, Stephan Beal <[hidden email]> wrote:
>
>> Hi, all!
>>
>> Today i saw a curious thing: i store 440kb of wiki files in an sqlite3 db
>> and the db file is only 400kb.
>>
>> HTF can that possibly be?
>>
>> After poking around i found that the wiki files actually total 360kb (when i
>> added up their sizes manually, as opposed to using 'df' to get it), and the
>> extra 80kb were from the hard drive's large block size (slack space reported
>> by 'df').
>>
>> Kinda funny, though, that sqlite3 actually decreases the amount of storage
>> required in this case.
>
>
> Lots of small files will take up more space because of the fixed minimum block size. For large corpuses this won't matter. Putting them all in one db makes logistical management easier, but you will lose the ability to update just a single file individually. I used to store all my wiki files (punkish.org) in one SQLite db, but now I have them as separate files which allows me to just ssh and edit a single file easily. Six of one, and all that.
>

Let me add two other drawbacks as well:

- incremental backups: now everytime you change one small file, the whole database needs to be backed up, increasing needlessly storage size, and backup time. This applies to system that do versioning as well as backups (such as Time Machine).

- system level indexing: it now becomes much more difficult, if not impossible, to do system level indexing and searching (as eg in Spotlight). This is the reason why Apple stopped using a monolithic database for its email application, now storing each mail individually: so that system-wide user search can hit emails too.

These two drawbacks may or may not apply to your situation.

Jean-Denis

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: lol: sqlite3 db smaller than the sum of its contents...

Puneet Kishor-2


On Jun 23, 2011, at 10:49 AM, Jean-Denis Muys <[hidden email]> wrote:

>
> On 23 juin 2011, at 16:22, Mr. Puneet Kishor wrote:
>
>>
>>
>>
>> On Jun 23, 2011, at 10:18 AM, Stephan Beal <[hidden email]> wrote:
>>
>>> Hi, all!
>>>
>>> Today i saw a curious thing: i store 440kb of wiki files in an sqlite3 db
>>> and the db file is only 400kb.
>>>
>>> HTF can that possibly be?
>>>
>>> After poking around i found that the wiki files actually total 360kb (when i
>>> added up their sizes manually, as opposed to using 'df' to get it), and the
>>> extra 80kb were from the hard drive's large block size (slack space reported
>>> by 'df').
>>>
>>> Kinda funny, though, that sqlite3 actually decreases the amount of storage
>>> required in this case.
>>
>>
>> Lots of small files will take up more space because of the fixed minimum block size. For large corpuses this won't matter. Putting them all in one db makes logistical management easier, but you will lose the ability to update just a single file individually. I used to store all my wiki files (punkish.org) in one SQLite db, but now I have them as separate files which allows me to just ssh and edit a single file easily. Six of one, and all that.
>>
>
> Let me add two other drawbacks as well:
>
> - incremental backups: now everytime you change one small file, the whole database needs to be backed up, increasing needlessly storage size, and backup time. This applies to system that do versioning as well as backups (such as Time Machine).
>
> - system level indexing: it now becomes much more difficult, if not impossible, to do system level indexing and searching (as eg in Spotlight). This is the reason why Apple stopped using a monolithic database for its email application, now storing each mail individually: so that system-wide user search can hit emails too.


Yup. Very good points, both of them.

I still use the db for metadata, but my files are stored in a tree directory structure much like CPAN's directories -- /path/<1>/<12>/<123>/filename.txt where 1, 2, and 3 are the first, second and third letters of the filename. I could store the metadata per file within each file, however, I haven't yet found a way to "find the ten most recently edited files" or "find all files edited by <person name>".


>
> These two drawbacks may or may not apply to your situation.
>
> Jean-Denis
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: lol: sqlite3 db smaller than the sum of its contents...

Stephan Beal-3
In reply to this post by Jean-Denis Muys-2
On Thu, Jun 23, 2011 at 4:49 PM, Jean-Denis Muys <[hidden email]>wrote:

> Let me add two other drawbacks as well:
>
> - incremental backups: now everytime you change one small file, the whole
> database needs to be backed up, increasing needlessly storage size, and
> backup time. This applies to system that do versioning as well as backups
> (such as Time Machine).
>

In this case that's not the issue - the application using the db _is_ the
wiki back-end/manager, so the pages have to be in the db. i only have them
as files because i imported them from a GoogleCode project.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users