Compression for ft5

Domingo Alvarez Duarte
Hello !

After looking at how compression is implemented in fts3, and wanting the
same for fts5, I managed to get a working implementation. I'm sharing it
here under the same license as sqlite3, in the hope that it can be useful
to others and maybe be added to sqlite3.

Cheers !


Here is an implementation of optional compression and min_word_size for
columns in fts5:

===

create virtual table if not exists docs_fts using fts5(
     doc_fname unindexed, doc_data compressed,
     compress=compress, uncompress=uncompress,
     tokenize = 'unicode61 min_word_size=3'
);

===

https://gist.github.com/mingodad/7fdec8eebdde70ee388db60855760c72


And here is an implementation of optional compression for columns in fts3/4:

===

create virtual table if not exists docs_fts using fts4(
     doc_fname, doc_data,
     tokenize = 'unicode61',
     notindexed=doc_fname, notcompressed=doc_fname,
     compress=compress, uncompress=uncompress
);

===

https://gist.github.com/mingodad/2f05cd1280d58f93f89133b2a2011a4d
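
Both statements above expect SQL functions named compress and uncompress
to be registered on the connection before the table is used. A minimal
sketch of that wiring follows. It assumes Python's sqlite3 module, zlib
as the codec, and a SQLite build with FTS4 enabled. It uses only the
stock FTS4 compress=/uncompress= options, which apply to every column;
the notcompressed option and the fts5 syntax come from the patches in
the gists and are left out here.

```python
import sqlite3
import zlib


def compress(value):
    # Assumption: zlib is the codec; any reversible pair of functions works.
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return zlib.compress(value)


def uncompress(blob):
    # Inverse of compress(): returns the original text.
    if blob is None:
        return None
    return zlib.decompress(blob).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.create_function("compress", 1, compress)
conn.create_function("uncompress", 1, uncompress)

# Stock FTS4 only: no notcompressed option, so both columns are compressed.
conn.execute("""
    create virtual table if not exists docs_fts using fts4(
        doc_fname, doc_data,
        tokenize=unicode61,
        notindexed=doc_fname,
        compress=compress, uncompress=uncompress
    )
""")

text = "full text search with compression " * 50
conn.execute(
    "insert into docs_fts(doc_fname, doc_data) values (?, ?)",
    ("a.txt", text),
)

# Reads go through uncompress(), so queries see the original text.
row = conn.execute(
    "select doc_fname, doc_data from docs_fts where docs_fts match ?",
    ("compression",),
).fetchone()

# The shadow table holds what compress() returned.
stored = conn.execute("select c1doc_data from docs_fts_content").fetchone()[0]
print(len(text), "characters stored as", len(stored), "bytes")
```

The last query reads the docs_fts_content shadow table directly, which
is a quick way to confirm that the stored value really is the
compressed blob.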

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Compression for ft5

wmertens
This is really cool, thanks for sharing!

I wonder though, is the compression done per field? I read the source but I
couldn't figure it out quickly (not really used to the sqlite codebase).
What are the compression ratios you achieve?


Wout.


On Mon, Sep 24, 2018 at 3:58 PM Domingo Alvarez Duarte <[hidden email]>
wrote:

> [...]

Re: Compression for ft5

Domingo Alvarez Duarte
Hello !

Yes, you are right: compression needs to be declared for each field that
you want compressed. I did it this way because I have some fields whose
typical size does not justify the overhead of compression.
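
The overhead is easy to see on small values. The sketch below assumes
zlib as the codec and uses made-up sample strings (a short file name and
a long, repetitive document body), so the numbers it prints say nothing
about real data or about the ratios of the implementation in the gists.
It only shows how to measure the ratio on your own fields.

```python
import zlib


def ratio(text):
    # Compressed size divided by original size; above 1.0 means it grew.
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


# Hypothetical sample values, for illustration only.
short_name = "report-2018.txt"
long_body = "SQLite full-text search stores a copy of every document. " * 200

short_ratio = ratio(short_name)
long_ratio = ratio(long_body)

print("short field ratio:", round(short_ratio, 2))
print("long field ratio:", round(long_ratio, 2))
```

A short value comes out larger than it went in, because the zlib header
and checksum cost more than the few bytes saved, which is the case for
leaving a column like doc_fname uncompressed.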

Cheers !

On 25/09/2018 14:29, Wout Mertens wrote:

> [...]