compressed sqlite3 database file?

compressed sqlite3 database file?

Peng Yu
I have some TSV table in .gz format of only 278MB. But the
corresponding sqlite3 database exceeds 1.58GB (without any index). Is
there a way to make the database file of a size comparable (at least
not over 5 times) to the original TSV table in the .gz file? Thanks.

--
Regards,
Peng

Re: compressed sqlite3 database file?

wmertens
I know of two options: the proprietary
https://sqlite.org/zipvfs/doc/trunk/www/readme.wiki and this extension that
you have to call on strings yourself:
https://github.com/siara-cc/Shox96_Sqlite_UDF

Furthermore, some filesystems allow transparent compression, like NTFS,
bcachefs, ZFS and Btrfs. I have a 13GB DB that takes up 850MB on Btrfs.
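
For illustration, a minimal sketch of doing this on Btrfs (the paths are
placeholders; compsize comes from the btrfs-compsize package):

# mount (or remount) the filesystem with zstd compression enabled
sudo mount -o remount,compress=zstd /data
# recompress an existing database file in place
sudo btrfs filesystem defragment -czstd /data/big.db
# compare on-disk vs. apparent size
sudo compsize /data/big.db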

Wout.


On Wed, Apr 10, 2019 at 7:39 AM Peng Yu <[hidden email]> wrote:

> I have some TSV table in .gz format of only 278MB. But the
> corresponding sqlite3 database exceeds 1.58GB (without any index). Is
> there a way to make the database file of a size comparable (at least
> not over 5 times) to the original TSV table in the .gz file? Thanks.
>
> --
> Regards,
> Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Hick Gunter
In reply to this post by Peng Yu
Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in any way human readable without uncompressing it first. How big is the result if you compress the sqlite file through gzip?
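
For instance, roughly (gzip 1.6 or later for the -k flag; the file name is
just an example):

# make a compressed copy of the database and compare sizes
gzip -9 -k mydb.sqlite3
ls -lh mydb.sqlite3 mydb.sqlite3.gz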

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Peng Yu
Sent: Wednesday, 10 April 2019 07:39
To: SQLite mailing list <[hidden email]>
Subject: [EXTERNAL] [sqlite] compressed sqlite3 database file?

I have some TSV table in .gz format of only 278MB. But the corresponding sqlite3 database exceeds 1.58GB (without any index). Is there a way to make the database file of a size comparable (at least not over 5 times) to the original TSV table in the .gz file? Thanks.

--
Regards,
Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Simon Slavin-3
On 10 Apr 2019, at 7:33am, Hick Gunter <[hidden email]> wrote:

> Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in any way human readable without uncompressing it first. How big is the result if you compress the sqlite file through gzip?

Or how big is the TSV file you get if you expand your .gz file ?

The development team for SQLite make a 'Compressed and Encrypted Read-Only Database (CEROD) Extension' which is not free and requires a US$2000 license.  It can be used only if the data you want compressed is going to be read-only.

<https://www.hwaci.com/sw/sqlite/cerod.html>

Re: [EXTERNAL] compressed sqlite3 database file?

Peng Yu
> > Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in any way human readable without uncompressing it first.

But to store the file (and occasionally search the data), I would
prefer 278MB instead of 1.5GB. With a .gz file, at least I can zcat
and zgrep.

> How big is the result if you compress the sqlite file through gzip?

620MB

> Or how big is the TSV file you get if you expand your .gz file ?

1.17GB

Re: compressed sqlite3 database file?

Keith Medcalf
In reply to this post by Peng Yu

The disassembled bird will always require more tar to coat than the assembled bird.  This is because the disassembled bird will have a greater surface area to coat with tar than the assembled bird.  This is a fact of physics which, although you may try as you might, you cannot change (unless of course you are Q in which case you may simply change the gravimetric constant of the universe).

You should probably perform "data normalization" on the contents of your file.  This is likely to result in a greater reduction in the space used than anything else in the multiverse ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Peng Yu
>Sent: Tuesday, 9 April, 2019 23:39
>To: SQLite mailing list
>Subject: [sqlite] compressed sqlite3 database file?
>
>I have some TSV table in .gz format of only 278MB. But the
>corresponding sqlite3 database exceeds 1.58GB (without any index). Is
>there a way to make the database file of a size comparable (at least
>not over 5 times) to the original TSV table in the .gz file? Thanks.
>
>--
>Regards,
>Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Peng Yu
In reply to this post by Peng Yu
Here is the runtime of using "select where like" (with %) on a .sq3 file.

real 0m23.105s
user 0m12.765s
sys 0m2.882s

Here is the runtime of zgrep (roughly equivalent, except that zgrep
searches the whole line).

real 0m33.814s
user 0m40.927s
sys 0m0.660s

Given the much larger disk space required, it seems that it makes no sense
to use sqlite3 for an occasional search of the data if disk space is a
major concern.
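
Something like the following reproduces the comparison (the file, table and
column names here are placeholders, not the actual schema):

# substring search in SQLite vs. zgrep over the compressed TSV
time sqlite3 data.sq3 "SELECT * FROM map WHERE str LIKE '%foo%';"
time zgrep 'foo' data.tsv.gz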

On 4/10/19, Peng Yu <[hidden email]> wrote:

>> > Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in
>> > any way human readable without uncompressing it first.
>
> But to store the file (and occasionally search the data), I would
> prefer 278MB instead of 1.5GB. With a .gz file, at least I can zcat
> and zgrep.
>
>> How big is the result if you compress the sqlite file through gzip?
>
> 620MB
>
>> Or how big is the TSV file you get if you expand your .gz file ?
>
> 1.17GB
>


--
Regards,
Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Hick Gunter
I have the distinct impression that you are attempting to convert a flat file into a naked table and pretending that the result is a (relational) database.

Please rethink your approach. There is a design process called "normalization" that needs to be done first. This will identify "entities" (with "attributes") and "relations" that will greatly reduce data duplication found in flat files.
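
As a minimal made-up illustration (not your data), a value that repeats
across rows of a flat file becomes its own table plus a reference:

sqlite3 example.db <<'SQL'
-- flat-file style: the city name would be repeated on every row
-- CREATE TABLE person_flat(name TEXT, city TEXT);
-- normalized: each city is stored once and referenced by id
CREATE TABLE city(id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE person(name TEXT, city_id INTEGER REFERENCES city(id));
SQL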

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Peng Yu
Sent: Wednesday, 10 April 2019 15:03
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

Here is the runtime of using "select where like" (with %) on a .sq3 file.

real    0m23.105s
user    0m12.765s
sys     0m2.882s
Here is the runtime of zgrep (roughly equivalent, except that zgrep searches the whole line).

real    0m33.814s
user    0m40.927s
sys     0m0.660s

Given the much larger disk space required, it seems that it makes no sense to use sqlite3 for an occasional search of the data if disk space is a major concern.

On 4/10/19, Peng Yu <[hidden email]> wrote:

>> > Your comparison is unfair. A .gz file is COMPRESSED by gzip and not
>> > in any way human readable without uncompressing it first.
>
> But to store the file (and occasionally search the data), I would
> prefer 278MB instead of 1.5GB. With a .gz file, at least I can zcat
> and zgrep.
>
>> How big is the result if you compress the sqlite file through gzip?
>
> 620MB
>
>> Or how big is the TSV file you get if you expand your .gz file ?
>
> 1.17GB
>


--
Regards,
Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Peng Yu
I don't know specifically what you refer to as data normalization. My
guess is something like this. But it is irrelevant to my case.

https://www.studytonight.com/dbms/database-normalization.php

For my specific TSV file, it has about 50 million rows and just two
columns. The first column is of strings and the second column is of
integers. All the strings in the first column are unique (some strings
may be substrings of other strings though).
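
For reference, one plausible way to load such a file (the file and table
names are placeholders); declaring the string as a WITHOUT ROWID primary key
avoids storing it twice (once in the table, once in a separate unique index):

sqlite3 strings.db <<'SQL'
-- optional: page size must be set before the first table is created
PRAGMA page_size = 8192;
-- the string is the key of the table's own b-tree; no separate rowid tree
CREATE TABLE map(str TEXT PRIMARY KEY, n INTEGER) WITHOUT ROWID;
.mode tabs
.import data.tsv map
SQL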

On 4/10/19, Hick Gunter <[hidden email]> wrote:
> I have the distinct impression that you are attempting to convert a flat
> file into a naked table and pretending that the result is a (relational)
> database.
>
> Please rethink your approach. There is a design process called
> "normalization" that needs to be done first. This will identify "entities"
> (with "attributes") and "relations" that will greatly reduce data
> duplication found in flat files.

--
Regards,
Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Keith Medcalf

The first column is of strings ...

Do you mean a single string as in "KerfufledAllaHasbalah"
Or a "bunch of strings with some implied delimiter" such as "Kerfufled/Alla/Hasballah" where "/" is the separator between strings?

If the latter, the data needs to be normalized.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Peng Yu
>Sent: Wednesday, 10 April, 2019 08:01
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?
>
>I don't know specifically what you refer to as data normalization. My
>guess is something like this. But it is irrelevant to my case.
>
>https://www.studytonight.com/dbms/database-normalization.php
>
>For my specific TSV file, it has about 50 million rows and just two
>columns. The first column is of strings and the second column is of
>integers. All the strings in the first column are unique (some
>strings
>may be substrings of other strings though).
>
>On 4/10/19, Hick Gunter <[hidden email]> wrote:
>> I have the distinct impression that you are attempting to convert a
>flat
>> file into a naked table and pretending that the result is a
>(relational)
>> database.
>>
>> Please rethink your approach. There is a design process called
>> "normalization" that needs to be done first. This will identify
>"entities"
>> (with "attributes") and "relations" that will greatly reduce data
>> duplication found in flat files.
>
>--
>Regards,
>Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Keith Medcalf
In reply to this post by Peng Yu

the second column is of integers ...

Do you mean that the second column is AN integer, or that it is a bunch-o-integers separated by some separator?

If the latter, normalization is required.


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Peng Yu
>Sent: Wednesday, 10 April, 2019 08:01
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?
>
>I don't know specifically what you refer to as data normalization. My
>guess is something like this. But it is irrelevant to my case.
>
>https://www.studytonight.com/dbms/database-normalization.php
>
>For my specific TSV file, it has about 50 million rows and just two
>columns. The first column is of strings and the second column is of
>integers. All the strings in the first column are unique (some
>strings
>may be substrings of other strings though).
>
>On 4/10/19, Hick Gunter <[hidden email]> wrote:
>> I have the distinct impression that you are attempting to convert a
>flat
>> file into a naked table and pretending that the result is a
>(relational)
>> database.
>>
>> Please rethink your approach. There is a design process called
>> "normalization" that needs to be done first. This will identify
>"entities"
>> (with "attributes") and "relations" that will greatly reduce data
>> duplication found in flat files.
>
>--
>Regards,
>Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Peng Yu
In reply to this post by Keith Medcalf
On 4/10/19, Keith Medcalf <[hidden email]> wrote:
>
> The first column is of strings ...
>
> Do you mean a single string as in "KerfufledAllaHasbalah"
> Or a "bunch of strings with some implied delimiter" such as
> "Kerfufled/Alla/Hasballah" where "/" is the separator between strings?
>
> If the latter, the data needs to be normalized.

There is only one string in each row, not many strings separated by
some separator. A string can include white space characters. In other
words, a string can be a word or multiple words separated by white
spaces.

Would it be possible to create indexes at a subcolumn level? I think
if indexes can be created at the word level (or prefix of words, as I
frequently search for things like a noun and its plural form
together), then the search can be faster.

--
Regards,
Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Graham Holden
In reply to this post by Peng Yu
Wednesday, April 10, 2019, 2:03:24 PM, Peng Yu <[hidden email]> wrote:

> Given the much larger disk space required, for an occasional search of
> the data, it seems that it makes no sense to use sqlite3 if disk space
> is a major concern.

Whether it "makes sense" to use SQLite or not, probably only you can
decide. If the "zgrep" solution works, and is "fast enough", then
there may not be any point in using SQLite. However, possible
advantages (depending on your "use cases") include being able to
ENFORCE the uniqueness of the string column, and -- depending on the
types of query involved -- benefiting from an index which in many
cases _should_ speed up the search.

And, in response to a couple of your latter emails (where you give
_some_ details about your data): you will have a much better chance
of people on this list being able to help you if you give a more
complete description of the problem: include some sample data from
your TSV and the types of searches you want to do. People can then
give much more informed help.

Re: [EXTERNAL] compressed sqlite3 database file?

Hick Gunter
In reply to this post by Peng Yu
"a string can be ... multiple words separated by white spaces"
"...indexes at subcolumn level..."
"... search for a noun and its plural form together..."

Yes, you do need normalization. You need to divulge "what you are trying to do" instead of asking "how to emulate a non-relational implementation".

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Peng Yu
Sent: Wednesday, 10 April 2019 16:12
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

On 4/10/19, Keith Medcalf <[hidden email]> wrote:
>
> The first column is of strings ...
>
> Do you mean a single string as in "KerfufledAllaHasbalah"
> Or a "bunch of strings with some implied delimiter" such as
> "Kerfufled/Alla/Hasballah" where "/" is the separator between strings?
>
> If the latter, the data needs to be normalized.

There is only one string in each row, not many strings separated by some separator. A string can include white space characters. In other words, a string can be a word or multiple words separated by white spaces.

Would it be possible to create indexes at a subcolumn level? I think if indexes can be created at the word level (or prefix of words, as I frequently search for things like a noun and its plural form together), then the search can be faster.

--
Regards,
Peng

Re: [EXTERNAL] compressed sqlite3 database file?

Keith Medcalf
In reply to this post by Peng Yu

On Wednesday, 10 April, 2019 08:12, Peng Yu <[hidden email]> wrote:

>On 4/10/19, Keith Medcalf <[hidden email]> wrote:

>> The first column is of strings ...

>> Do you mean a single string as in "KerfufledAllaHasbalah"
>> Or a "bunch of strings with some implied delimiter" such as
>> "Kerfufled/Alla/Hasballah" where "/" is the separator between
>> strings?
>>
>> If the latter, the data needs to be normalized.

>There is only one string in each row, not many strings separated by
>some separator. A string can include white space characters. In other
>words, a string can be a word or multiple words separated by white
>spaces.

>Would it be possible to create indexes at a subcolumn level? I think
>if indexes can be created at the word level (or prefix of words, as I
>frequently search for things like a noun and its plural form
>together), then the search can be faster.

There are a number of ways you could do this, including using the built-in FTS module.  The "string" column is your free-form text.  Be aware, however, that this will create and maintain a number of additional indexes on the data and will consume *more* disk space to hold the data and indexes.  Search time, however, will likely decrease from your current 23 seconds to about 23 milliseconds for an appropriately phrased query yielding similar results (i.e., the search will be about 1000x faster, while the space consumed may be only two or three times as much).

You could also build your own "word" list by parsing the strings and linking them to the record to which they pertain, then use appropriately phrased queries to perform lightning-fast searches.  This is basically what FTS does, only you are doing it yourself.
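
A minimal FTS5 sketch of the first approach (assuming a two-column table
map(str, n) like the one sketched earlier in the thread, and an SQLite build
with FTS5 enabled; all names are placeholders):

sqlite3 strings.db <<'SQL'
-- full-text index over the string column; the integer rides along unindexed.
-- prefix='2 3' adds prefix indexes so queries like 'noun*' stay fast.
CREATE VIRTUAL TABLE map_fts USING fts5(str, n UNINDEXED, prefix='2 3');
INSERT INTO map_fts(str, n) SELECT str, n FROM map;
-- matches "noun", "nouns", "nounal", ...
SELECT str, n FROM map_fts WHERE map_fts MATCH 'noun*';
SQL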


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

Re: compressed sqlite3 database file?

Warren Young
In reply to this post by Peng Yu
On Apr 9, 2019, at 11:39 PM, Peng Yu <[hidden email]> wrote:
>
> Is there a way to make the database file of a size comparable (at least
> not over 5 times) to the original TSV table in the .gz file?

Transparent file compression is a feature of several filesystems: NTFS, ZFS, Btrfs, and more:

   https://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies

If you can enable this feature on your existing system or switch to one of the filesystems that do support it, you don’t need a non-default SQLite configuration.

Re: compressed sqlite3 database file?

wmertens
As I said in my previous email, I have a 13GB database that transparently
compresses to 800MB. Not sure if it got through; I didn't get replies to my
last two emails.

Wout.

On Wed., Apr. 10, 2019, 5:04 p.m. Warren Young <[hidden email]> wrote:

> On Apr 9, 2019, at 11:39 PM, Peng Yu <[hidden email]> wrote:
> >
> > Is there a way to make the database file of a size comparable (at least
> > not over 5 times) to the original TSV table in the .gz file?
>
> Transparent file compression is a feature of several filesystems: NTFS,
> ZFS, Btrfs, and more:
>
>
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies
>
> If you can enable this feature on your existing system or switch to one of
> the filesystems that do support it, you don’t need a non-default SQLite
> configuration.

Re: compressed sqlite3 database file?

Peter da Silva-2
In reply to this post by Peng Yu
A database trades off space for performance and functionality. It is
expected that a database with indexes that you can randomly access is going
to take more space than the raw data, let alone a compressed version of the
raw data.

On Wed, Apr 10, 2019 at 12:39 AM Peng Yu <[hidden email]> wrote:

> I have some TSV table in .gz format of only 278MB. But the
> corresponding sqlite3 database exceeds 1.58GB (without any index). Is
> there a way to make the database file of a size comparable (at least
> not over 5 times) to the original TSV table in the .gz file? Thanks.
>
> --
> Regards,
> Peng

Re: compressed sqlite3 database file?

Peng Yu
In reply to this post by wmertens
https://softwarerecs.stackexchange.com/questions/45010/transparent-file-compression-apps-for-macos

I work on a Mac. Would this be worthwhile to try?  Does the transparent
compression work at the file system level or at the directory level? Is
there any chance it could corrupt the existing files on the disk (e.g., a
power outage during compression)?

On Wed, Apr 10, 2019 at 11:02 AM Wout Mertens <[hidden email]>
wrote:

> As I said in my previous email, I have a 13GB database that transparently
> compresses to 800MB. Not sure if it got through, didn't get replies to my
> last two emails.
>
> Wout.
>
> On Wed., Apr. 10, 2019, 5:04 p.m. Warren Young <[hidden email]> wrote:
>
> > On Apr 9, 2019, at 11:39 PM, Peng Yu <[hidden email]> wrote:
> > >
> > > Is there a way to make the database file of a size comparable (at least
> > > not over 5 times) to the original TSV table in the .gz file?
> >
> > Transparent file compression is a feature of several filesystems: NTFS,
> > ZFS, Btrfs, and more:
> >
> >
> >
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies
> >
> > If you can enable this feature on your existing system or switch to one
> of
> > the filesystems that do support it, you don’t need a non-default SQLite
> > configuration.
--
Regards,
Peng

Re: compressed sqlite3 database file?

Warren Young
On Apr 10, 2019, at 12:08 PM, Peng Yu <[hidden email]> wrote:
>
> https://softwarerecs.stackexchange.com/questions/45010/transparent-file-compression-apps-for-macos
>
> I work on Mac. Would this be worthwhile to try?

The first link didn’t work here because it didn’t like the APFS drive I tried it on.  (Symptom: “Expecting f_type of 17, 23 or 24. f_type is 26.”)

I then tried the so-called “Github mirror”, which is no such thing: it’s considerably advanced beyond the last version published at the first link, and one of the improvements is APFS awareness.

Using that improved version, simple tests worked, but attempting to use it on a SQLite DB file uncompressed it and left it uncompressed.  I believe this is because this OS feature relies on the old resource-fork feature, which means it only works with apps that use the Apple-proprietary programming interfaces, not the POSIX interfaces that SQLite uses.

> Does the transparent
> compression work at the file system level or at the directory level?

Neither: it works at the file level.

You can point afsctool at a directory and it will compress the files in that directory, but if you then drop another file in that directory, it won’t automatically be compressed.

The tool will also skip over files it considers “already compressed” unless you give it the -L flag, so giving it a directory name isn’t guaranteed to result in all files in that directory being compressed.

> Would
> it have a slight chance to corrupt the existent files on the disk (e.g.,
> power outrage during compression)?

Between the resource forks issue and the fact that we’re having to use a third party tool to enable it, I wouldn’t put much trust in this feature.

If you want to put your trust in a third-party OS add-on, O3X is worth much more of your attention than afsctool:

    https://openzfsonosx.org/

If you don’t have a spare disk to feed to it, you can create a pool using raw disk images:

   https://openzfsonosx.org/wiki/FAQ#Q.29_Can_I_set_up_a_test_pool_using_files_instead_of_disks.3F

ZFS’s ability to (a) add vdevs to a pool and (b) replace smaller vdevs with larger ones means you can safely set the initial pool size to just barely larger than the initial DB file plus any ancillary space needed (WAL, slack pages between VACUUM calls, etc.).  You can then grow the pool occasionally to keep ahead of the growth of the DB without sacrificing too much on filesystem overhead.  It’d be easy to write an on-demand 20% pool-size growth script, for instance; it’d be maybe half a dozen lines of Bash.
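
A rough sketch of that setup (the pool name and file paths are arbitrary; on
O3X the pool typically mounts under /Volumes):

# create a file-backed vdev and a compressed pool on it
mkfile -n 5g /Volumes/Data/vdev0        # truncate -s 5G ... on non-macOS
sudo zpool create dbpool /Volumes/Data/vdev0
sudo zfs set compression=lz4 dbpool
cp mydb.sqlite3 /Volumes/dbpool/
zfs get compressratio dbpool
# grow the pool later by adding another file-backed vdev
mkfile -n 5g /Volumes/Data/vdev1
sudo zpool add dbpool /Volumes/Data/vdev1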

Lest you go off on an unfortunate tangent from this idea, note that the “compressed disk image” feature of Disk Utility won’t help you here.  Those are read-only.