Properly bulk-inserting into FTS5 index with external content table

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Properly bulk-inserting into FTS5 index with external content table

Eugene Mirotin
Hi,

I have a table questions with lots of columns. Out of them 6 columns
represent text content and the rest are technical (FKs, update timestamp,
etc)

I need to build the FTS5 search index for this table to only index the
content columns (that 6).
The DB is only built once on the server and later used as read-only in the
client app (which downloads it from the server)
It works fine unless I try to use the external content table feature (which
I need to reduce the table size and also to be able to make JOIN queries to
get the meta info from the original table for each matching row.

The original table (called questions) has the PK column called id. This is
the column I want to use as rowid for the search index and for joining the
tables.

Here's how I'm trying to create and populate the search table:

DROP TABLE IF EXISTS search;
CREATE VIRTUAL TABLE search USING fts5(question, answer, altAnswers,
comments, authors, sources, tokenize = "snowball russian english
unicode61", content=questions, content_rowid=id);
INSERT INTO search SELECT id, question, answer, altAnswers, comments,
authors, sources FROM questions WHERE obsolete IS NULL;

At this point INSERT fails because I'm trying to insert 7 columns into the
table which has 6.

I also tried creating id as UNINDEXED column in the search table and that
passed but the search results were silly, matching completely unrelated
tokens.

So what is the proper way to insert all rows from questions into the search
table?


Thanks in advance
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Properly bulk-inserting into FTS5 index with external content table

Eugene Mirotin
Hmm, I think I've found the solution:

INSERT INTO search (rowid, question, answer, altAnswers, comments, authors,
sources) SELECT id, question, answer, altAnswers, comments, authors,
sources FROM questions WHERE obsolete IS NULL;


On Tue, Oct 31, 2017 at 1:02 PM Eugene Mirotin <[hidden email]> wrote:

> Hi,
>
> I have a table questions with lots of columns. Out of them 6 columns
> represent text content and the rest are technical (FKs, update timestamp,
> etc)
>
> I need to build the FTS5 search index for this table to only index the
> content columns (that 6).
> The DB is only built once on the server and later used as read-only in the
> client app (which downloads it from the server)
> It works fine unless I try to use the external content table feature
> (which I need to reduce the table size and also to be able to make JOIN
> queries to get the meta info from the original table for each matching row.
>
> The original table (called questions) has the PK column called id. This is
> the column I want to use as rowid for the search index and for joining the
> tables.
>
> Here's how I'm trying to create and populate the search table:
>
> DROP TABLE IF EXISTS search;
> CREATE VIRTUAL TABLE search USING fts5(question, answer, altAnswers,
> comments, authors, sources, tokenize = "snowball russian english
> unicode61", content=questions, content_rowid=id);
> INSERT INTO search SELECT id, question, answer, altAnswers, comments,
> authors, sources FROM questions WHERE obsolete IS NULL;
>
> At this point INSERT fails because I'm trying to insert 7 columns into the
> table which has 6.
>
> I also tried creating id as UNINDEXED column in the search table and that
> passed but the search results were silly, matching completely unrelated
> tokens.
>
> So what is the proper way to insert all rows from questions into the
> search table?
>
>
> Thanks in advance
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Properly bulk-inserting into FTS5 index with external content table

Simon Slavin-3


On 31 Oct 2017, at 10:21am, Eugene Mirotin <[hidden email]> wrote:

> Hmm, I think I've found the solution:
>
> INSERT INTO search (rowid, question, answer, altAnswers, comments, authors,
> sources) SELECT id, question, answer, altAnswers, comments, authors,
> sources FROM questions WHERE obsolete IS NULL;

That looks like it should work, and the "INSERT … SELECT" form is very fast.

I do have a suggestion for something else: don’t use intentionally NULL values.  NULL means "unknown" or "no value" in SQL whereas what you mean is more like TRUE and FALSE.

SQLite doesn’t have a BOOLEAN type.  Instead the fastest, most compact way to store booleans is to use 0 and 1.  And if you store those values you can do things like

        … WHERE obsolete — equivalent to WHERE obsolete = 1
        … WHERE NOT obsolete — equivalent to WHERE obsolete = 0

Unfortunately TRUE and FALSE are not reserved words, so you do have to do things like

CREATE TABLE questions (…, obsolete INTEGER DEFAULT 0, …).

Using specific values 0 and 1 means that if you have a fault in your software and end up with NULL values in your fields you know something definitely went wrong.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Properly bulk-inserting into FTS5 index with external content table

Eugene Mirotin
Thanks for the tip.
This field is actually boolean, but also nullable. From other languages
(like Python and JS) I actually assumed the NULL check is faster than value
comparison.
Will improve later (for now it's definitely not a bottleneck).

On Tue, Oct 31, 2017 at 3:50 PM Simon Slavin <[hidden email]> wrote:

>
>
> On 31 Oct 2017, at 10:21am, Eugene Mirotin <[hidden email]> wrote:
>
> > Hmm, I think I've found the solution:
> >
> > INSERT INTO search (rowid, question, answer, altAnswers, comments,
> authors,
> > sources) SELECT id, question, answer, altAnswers, comments, authors,
> > sources FROM questions WHERE obsolete IS NULL;
>
> That looks like it should work, and the "INSERT … SELECT" form is very
> fast.
>
> I do have a suggestion for something else: don’t use intentionally NULL
> values.  NULL means "unknown" or "no value" in SQL whereas what you mean is
> more like TRUE and FALSE.
>
> SQLite doesn’t have a BOOLEAN type.  Instead the fastest, most compact way
> to store booleans is to use 0 and 1.  And if you store those values you can
> do things like
>
>         … WHERE obsolete        — equivalent to WHERE obsolete = 1
>         … WHERE NOT obsolete    — equivalent to WHERE obsolete = 0
>
> Unfortunately TRUE and FALSE are not reserved words, so you do have to do
> things like
>
> CREATE TABLE questions (…, obsolete INTEGER DEFAULT 0, …).
>
> Using specific values 0 and 1 means that if you have a fault in your
> software and end up with NULL values in your fields you know something
> definitely went wrong.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Properly bulk-inserting into FTS5 index with external content table

Simon Slavin-3


On 31 Oct 2017, at 12:54pm, Eugene Mirotin <[hidden email]> wrote:

> This field is actually boolean, but also nullable. From other languages
> (like Python and JS) I actually assumed the NULL check is faster than value
> comparison.

In SQLite, NULL, 0 and 1 are all special cases and take the same amount of filespace/memory to store.

> Will improve later (for now it's definitely not a bottleneck).

Agreed.  If that bit of your program is already in place, and you understand the problems of doing logic with NULL values, then it’s not a priority.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users