Problem: FTS5 prefix search returns mismatched results

ze tian
When using FTS5, I have found that in some cases a prefix search returns mismatched results.
For example, with "select * from tbl_fts where tbl_fts match 'lucy*';" I expect records such as "lucya" and "lucyabc", but
records such as "lux" or "lulu", which do not match the prefix, are also returned.
Such problems are not common, but I have built a test case that reproduces the problem easily. Here is how I generate it:
1) Create an FTS5 table and insert some records such as "lucya" and "lucyb".
2) Prepare some base records: a) lusheng; b) lulu; c) lunix; d) luma; e) pengyu. Records a-d share the prefix "lu"; e is an unrelated random case.
3) Before inserting the records from 2) into the FTS table, append some random letters so that each record is different,
   e.g. "lulu", "luluabc", "luluefg", and also "lunix", "lunixabc", etc.
4) Insert in a for-loop, and in each iteration run the query 'lucy*'.
   Checking the results eventually shows mismatched rows: the correct results should be "lucya" and "lucyb", not "luluabc", etc.

When a mismatch happens, I analyzed the prefix search mechanism and found two places that I think are wrong:
1) fts5LeafSeek: when the search fails, it executes "goto search_failed", but in search_failed neither of the two if conditions is usually satisfied. I think it should return at that point,
   but it does not, and the search_success logic is executed instead.
2) fts5SetupPrefixIter: when gathering results, the logic that sets the flag bNewTerm has a flaw: it can set bNewTerm=false
   even though the current record is not one we want.

Together, these two logic problems lead to the mismatched results. I tried removing the bNewTerm logic entirely so that the prefix is compared on every loop iteration,
and the mismatched results disappeared.
Relevant code (in fts5SetupPrefixIter), change:

    if (bNewTerm) {
        if (nTerm < nToken || memcmp(pToken, pTerm, nToken)) break;
    }

to:

    if (nTerm < nToken || memcmp(pToken, pTerm, nToken)) break;

Could you please help recheck the FTS5 prefix search logic? Thank you very much.
xiaojin zheng


sqlite-users mailing list