Strange behavior with fts4

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Strange behavior with fts4

Philip Bennefall
I hate to be cluttering up the list in this fashion, but I have come across an issue that I cannot seem to find a solution for.

I am using two fts tables, one that uses the normal tokenizer and another that uses the porter stemmer, so that I can search the same dataset with and without porter. For the porter stemmer table, I have set the content option to point to the other fts table. Like this:

CREATE VIRTUAL TABLE main_brain USING fts4(phrase, response, matchinfo=fts3, tokenize=simple, order=desc);

CREATE VIRTUAL TABLE porter_brain USING fts4(content="main_brain", phrase, response, matchinfo=fts3, tokenize=porter, order=desc);

 Then I do the following:

INSERT INTO main_brain(phrase, response) VALUES('hello', 'Hi there!');

So you'd now think that main_brain should have this content in it, but the porter_brain table should be empty. At least, that is what the documentation on fts4 seems to indicate as it points out that it is my own responsibility to make sure the tables are in sync. But:

SELECT * FROM porter_brain;

Produces:

hello|Hi there!

And:

SELECT * FROM main_brain;

Gives the same result. So it seems as though some internal synchronization between these tables is going on. How should I handle this? The documentation suggests a statement like the following in one of its examples:

INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;

Where I guess t3 would correspond to porter_brain and t2 would be main_brain in my case. But I don't seem to need to do this at all.

Can anyone throw some light on this?

Thanks in advance.

Kind regards,

Philip Bennefall
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Strange behavior with fts4

Yuriy M. Kaminskiy
Philip Bennefall wrote:

> I hate to be cluttering up the list in this fashion, but I have come across
> an issue that I cannot seem to find a solution for.
>
> I am using two fts tables, one that uses the normal tokenizer and another
> that uses the porter stemmer, so that I can search the same dataset with and
> without porter. For the porter stemmer table, I have set the content option
> to point to the other fts table. Like this:
>
> CREATE VIRTUAL TABLE main_brain USING fts4(phrase, response, matchinfo=fts3,
> tokenize=simple, order=desc);
>
> CREATE VIRTUAL TABLE porter_brain USING fts4(content="main_brain", phrase,
> response, matchinfo=fts3, tokenize=porter, order=desc);
>
> Then I do the following:
>
> INSERT INTO main_brain(phrase, response) VALUES('hello', 'Hi there!');
>
> So you'd now think that main_brain should have this content in it, but the
> porter_brain table should be empty. At least, that is what the documentation
> on fts4 seems to indicate as it points out that it is my own responsibility
> to make sure the tables are in sync. But:
>
> SELECT * FROM porter_brain;
>
> Produces:
>
> hello|Hi there!

As porter_brain takes all content from main_brain, full scan result looks
"correct" ("SELECT FROM porter_brain" internally replaced with "SELECT FROM
main_brain"). But before you execute INSERT below, all fts indexes are missing,
so any request that utilize them, like

SELECT * FROM porter_brain WHERE phrase MATCH 'hello';

will fail.

> And:
>
> SELECT * FROM main_brain;
>
> Gives the same result. So it seems as though some internal synchronization
> between these tables is going on. How should I handle this? The documentation
>  suggests a statement like the following in one of its examples:
>
> INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;

> Where I guess t3 would correspond to porter_brain and t2 would be main_brain
> in my case. But I don't seem to need to do this at all.

INSERT INTO porter_brain (docid, phrase, response)
                   SELECT docid, phrase, response FROM main_brain;

Before you execute this insert statement, porter_brain state is *inconsistent*
and it cannot be used properly.

[FWIW, most natural place for this insert would be AFTER INSERT trigger ... but
as main_brain is VIRTUAL TABLE, it is impossible; you can consider using
external content table for both virtual tables with trigger:

CREATE TABLE tblContent(phrase TEXT, response TEXT);
CREATE VIRTUAL TABLE main_brain USING fts4(content="tblContent", phrase,
response, matchinfo=fts3, tokenize=simple, order=desc);
CREATE VIRTUAL TABLE porter_brain USING fts4(content="tblContent", phrase,
response, matchinfo=fts3, tokenize=porter, order=desc);
CREATE TRIGGER trgSync AFTER INSERT ON tblContent FOR EACH ROW
 BEGIN
      INSERT INTO main_brain (docid, phrase, response)
           VALUES (NEW.rowid, NEW.phrase, NEW.response);
      INSERT INTO porter_brain (docid, phrase, response)
           VALUES (NEW.rowid, NEW.phrase, NEW.response);
   END;
].

> Can anyone throw some light on this?
>
> Thanks in advance.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users