Will ROWID be required to be unique in contentless Full text search

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Will ROWID be required to be unique in contentless Full text search

paul tracy
My question is whether or not I can rely on my use of non-unique rowids in FTS5 contentless tables to remain unenforced into the future of SQLite or am I exploiting an oversight that may be resolved in the future?
Details ...
I have a key-value pair SQLite table (called props) used for storing properties of objects. It has the following columns: objID, propCode, propVal
objID + propCode is my primary key.

I have created a contentless FTS5 virtual table as follows: create virtual table fti using FTS5(ftiCode, ftiVal, content='');
for every record in props table a corresponding insert is made to the fti table, I insert the props.objID into fti.rowid.
Since any given objID in my props table can have multiple props I wind up making multiple inserts into the fti table with a non-unique rowid (since I'm using rowid to store the objid). This works perfectly fine in all use cases so my app runs perfectly. 
All of the docs on rowid say that it is unique but in the FTS5 contentless table this unique requirement is not enforced. Since I have to provide all 3 of the initial fields whenever a delete is required then this also works to remove only the exact record from FTI. I view the rowid in the FTS contentless table as simply holding whatever value I send to it and it will return that value in a field named rowid but that field isn't really a rowid in the true sense. If this interpretation will remain valid into the future then my implementation is perfect.

For the curious: Why am I doing this instead of something more traditional? Mostly it has to do with elimination of duplicate returns from the FTS5 search. But also contentless to save space. When I query the FT index I am interested in which objects contain given property values. My table has millions of rows so I'm sensitive to space and speed.
If I insert the following rows into fti rowid, ftiCode, ftiValue:
3, 7, A3, 8, B
and then I ... SELECT rowid FROM FTI where FTI MATCH A OR B … I only get one row returned with rowid of 3. This was a surprise but was exactly what I wanted. A more traditional approach to using FTI would have resulted in two rows returned both with the same ID field (object ID in my case). By exploiting the fact that the FTS5 does not enforce unique rowids I save myself the expense of using DISTINCT.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Will ROWID be required to be unique in contentless Full text search

Dan Kennedy-4

On 25/4/62 07:58, paul tracy wrote:

> My question is whether or not I can rely on my use of non-unique rowids in FTS5 contentless tables to remain unenforced into the future of SQLite or am I exploiting an oversight that may be resolved in the future?
> Details ...
> I have a key-value pair SQLite table (called props) used for storing properties of objects. It has the following columns: objID, propCode, propVal
> objID + propCode is my primary key.
>
> I have created a contentless FTS5 virtual table as follows: create virtual table fti using FTS5(ftiCode, ftiVal, content='');
> for every record in props table a corresponding insert is made to the fti table, I insert the props.objID into fti.rowid.
> Since any given objID in my props table can have multiple props I wind up making multiple inserts into the fti table with a non-unique rowid (since I'm using rowid to store the objid). This works perfectly fine in all use cases so my app runs perfectly.
> All of the docs on rowid say that it is unique but in the FTS5 contentless table this unique requirement is not enforced. Since I have to provide all 3 of the initial fields whenever a delete is required then this also works to remove only the exact record from FTI. I view the rowid in the FTS contentless table as simply holding whatever value I send to it and it will return that value in a field named rowid but that field isn't really a rowid in the true sense. If this interpretation will remain valid into the future then my implementation is perfect.
>
> For the curious: Why am I doing this instead of something more traditional? Mostly it has to do with elimination of duplicate returns from the FTS5 search. But also contentless to save space. When I query the FT index I am interested in which objects contain given property values. My table has millions of rows so I'm sensitive to space and speed.
> If I insert the following rows into fti rowid, ftiCode, ftiValue:
> 3, 7, A3, 8, B
> and then I ... SELECT rowid FROM FTI where FTI MATCH A OR B … I only get one row returned with rowid of 3. This was a surprise but was exactly what I wanted. A more traditional approach to using FTI would have resulted in two rows returned both with the same ID field (object ID in my case). By exploiting the fact that the FTS5 does not enforce unique rowids I save myself the expense of using DISTINCT.

Well, I can tell you that a single query on an FTS5 will never return
two rows with the same rowid. Or, more accurately, if it does it
indicates a bug that would affect all users, not just folks doing
unusual things. That's just the nature of the data structure.

Other things:

I think you should add the "columnsize=0" option to the CREATE VIRTUAL
TABLE statement. Without this option FTS5 creates a table that maps from
rowid to the size in tokens of each column in the associated row (in
your case, it will be named "fti_docsize"). This isn't going to work for
you - as the second time you insert a row with rowid=3 the size values
will clobber the values set when you inserted the first row with
rowid=3. It's only apps using specific ranking functions that benefit
from this extra data structure, so you can probably live without it:

https://sqlite.org/fts5.html#the_columnsize_option

You might also add the "detail=none" option, so that entire CREATE
VIRTUAL TABLE statement is:

   CREATE VIRTUAL TABLE fti USING fts5(ftiCode, ftiVal, content='',
columnsize=0, detail=none);

This option means that instead of storing offset information, FTS5 just
stores a list of rowids for each token. This might be better for you as
the offset information isn't going to work quite right either. Say you do:

   INSERT INTO fti VALUES(3, 'B', 'A B');
   INSERT INTO fti VALUSE(3, 3, 'B');

The second insert will clobber the offset information for "rowid=3,
token=B". So FTS5 will think that the only instance of "B" in the
rowid=3 document is token 0 of column "ftiVal". It will forget all about
the instances associated with the first insert of a rowid=3 row. But it
will remember that the rowid=3 document contains an 'A' as the first
token of column 'ftiVal'. If you set detail=none, then FTS5 won't store
any offset information and you won't have a problem. Like columnsize=0,
this option saves disk space too.

Good luck then,

Dan.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users