Need some tips on using FTS5 with SQLite

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Need some tips on using FTS5 with SQLite

John Found

I am using FTS5 for pretty complex search in my application, but recently, trying to make it even more complex I faced some problems that are more general than only FTS5.

I have a forum engine where are several tables for the threads, for the posts, for the users etc. At first I want to be able to search in the posts text, but moreover, this search have to be limiter to some subset of the posts, for example in the posts of a particular thread or posts of some user. Also, there are cases where free-text search is not actually necessary, for example when I am searching for all posts from a particular user.

At first, I tried to create a FTS5 table, containing only the text data that need to be searched and then to access it by queries of the type:

    select
      some,
      fields
    from
      fts
      left join posts p on p.id = fts.rowid
      left join threads t on t.id = p.threadid
      left join users u on u.id = p.userid
    where
      fts match ?1 and u.nick = ?2 and t.id = ?3
    order by ORDER

Such queries are pretty fast when there is only fts match directive in the where clause.
But any additional condition added ruins the performance, especially if the fts match returns big amount of matches.

Additional problem is the order by clause. If the ORDER BY term is "rank" everything works great, but changing it to
other field (for example the post time in order to get first most recent posts) causes huge slow down of the query.

My second attempt was to sacrifice space for speed and to put all searchable data in the fts table - post text, the thread titles and the usernames. This way, building complex fts queries kind of:

   (content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4

I can leave only the fts query in the WHERE clause. This way, the search is pretty fast, but the huge problem remains
the ORDER BY clause. Again everything works fine with "rank", but attempts to use any other field for sorting, causes
huge probems: slow downs up to tens of seconds (usual search time is few milliseconds) and out of memory errors.

Such problems with this second approach are even more serious than on the first approach. i.e. with the second approach everything works fine and quick with "rank" order by, and very, very slow and with errors, on any other "order by" option.

So, he main question follows:

What is the right way to design such complex search systems, based on FTS? How to properly approach the sorting of the search results in order to not have so big slowdowns and out of memory errors.

Any tips are highly welcome!

Regards
--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Need some tips on using FTS5 with SQLite

wmertens
I too am interested in this answer, I still have to start using fts5.

What would be interesting is to see the `EXPLAIN QUERY PLAN [query]` for
each of your queries, so as to see what causes the slowness.

On Thu, Feb 8, 2018, 7:14 PM John Found, <[hidden email]> wrote:

>
> I am using FTS5 for pretty complex search in my application, but recently,
> trying to make it even more complex I faced some problems that are more
> general than only FTS5.
>
> I have a forum engine where are several tables for the threads, for the
> posts, for the users etc. At first I want to be able to search in the posts
> text, but moreover, this search have to be limiter to some subset of the
> posts, for example in the posts of a particular thread or posts of some
> user. Also, there are cases where free-text search is not actually
> necessary, for example when I am searching for all posts from a particular
> user.
>
> At first, I tried to create a FTS5 table, containing only the text data
> that need to be searched and then to access it by queries of the type:
>
>     select
>       some,
>       fields
>     from
>       fts
>       left join posts p on p.id = fts.rowid
>       left join threads t on t.id = p.threadid
>       left join users u on u.id = p.userid
>     where
>       fts match ?1 and u.nick = ?2 and t.id = ?3
>     order by ORDER
>
> Such queries are pretty fast when there is only fts match directive in the
> where clause.
> But any additional condition added ruins the performance, especially if
> the fts match returns big amount of matches.
>
> Additional problem is the order by clause. If the ORDER BY term is "rank"
> everything works great, but changing it to
> other field (for example the post time in order to get first most recent
> posts) causes huge slow down of the query.
>
> My second attempt was to sacrifice space for speed and to put all
> searchable data in the fts table - post text, the thread titles and the
> usernames. This way, building complex fts queries kind of:
>
>    (content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4
>
> I can leave only the fts query in the WHERE clause. This way, the search
> is pretty fast, but the huge problem remains
> the ORDER BY clause. Again everything works fine with "rank", but attempts
> to use any other field for sorting, causes
> huge probems: slow downs up to tens of seconds (usual search time is few
> milliseconds) and out of memory errors.
>
> Such problems with this second approach are even more serious than on the
> first approach. i.e. with the second approach everything works fine and
> quick with "rank" order by, and very, very slow and with errors, on any
> other "order by" option.
>
> So, he main question follows:
>
> What is the right way to design such complex search systems, based on FTS?
> How to properly approach the sorting of the search results in order to not
> have so big slowdowns and out of memory errors.
>
> Any tips are highly welcome!
>
> Regards
> --
> http://fresh.flatassembler.net
> http://asm32.info
> John Found <[hidden email]>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Need some tips on using FTS5 with SQLite

John Found
On Wed, 14 Feb 2018 14:26:21 +0000
Wout Mertens <[hidden email]> wrote:

> I too am interested in this answer, I still have to start using fts5.
>
> What would be interesting is to see the `EXPLAIN QUERY PLAN [query]` for
> each of your queries, so as to see what causes the slowness.
>

It is clear what causes the slowness. For example here is one query:

select
  PostFTS.rowid,
  PostFTS.user as UserName,
  P.userID,
  U.av_time as AVer,
  PostFTS.slug,
  strftime('%d.%m.%Y %H:%M:%S', P.postTime, 'unixepoch') as PostTime,
  P.ReadCount,
  snippet(PostFTS, 0, '*', '*', '...', 16) as Content,
  PostFTS.Caption,
  (select count() from UnreadPosts UP where UP.UserID = 2 and UP.PostID = PostFTS.rowid) as Unread
from
  PostFTS
  left join Posts P on P.id = PostFTS.rowid
  left join Users U on U.id = P.userID
where
  PostFTS match "user: s*"
order by SOME_CLAUSE
limit 20;

1. With SOME_CLAUSE=rank, the execution time is between 28ms and 40ms
2. With SOME_CLAUSE=P.PostTime, the execution time is approximately 500ms!
3. Without "order by" clause at all, the execution time is 1.1ms.

The respective EXPLAIN QUERY PLAN:

1. order by rank (28..40ms)
0 0 0 SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327713:
0 1 1 SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY (rowid=?)
0 2 2 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

2. order by P.PostTime (500ms)
0 0 0 SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327681:
0 1 1 SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY (rowid=?)
0 2 2 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)
0 0 0 USE TEMP B-TREE FOR ORDER BY

3. without order by: (1.1ms)
0 0 0 SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327681:
0 1 1 SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY (rowid=?)
0 2 2 SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY (rowid=?)

Obviously, the slow down is because of the "USE TEMP B-TREE FOR ORDER BY". Order by any field
other than "rank" and "rowid" makes this query very slow.


> On Thu, Feb 8, 2018, 7:14 PM John Found, <[hidden email]> wrote:
>
> >
> > I am using FTS5 for pretty complex search in my application, but recently,
> > trying to make it even more complex I faced some problems that are more
> > general than only FTS5.
> >
> > I have a forum engine where are several tables for the threads, for the
> > posts, for the users etc. At first I want to be able to search in the posts
> > text, but moreover, this search have to be limiter to some subset of the
> > posts, for example in the posts of a particular thread or posts of some
> > user. Also, there are cases where free-text search is not actually
> > necessary, for example when I am searching for all posts from a particular
> > user.
> >
> > At first, I tried to create a FTS5 table, containing only the text data
> > that need to be searched and then to access it by queries of the type:
> >
> >     select
> >       some,
> >       fields
> >     from
> >       fts
> >       left join posts p on p.id = fts.rowid
> >       left join threads t on t.id = p.threadid
> >       left join users u on u.id = p.userid
> >     where
> >       fts match ?1 and u.nick = ?2 and t.id = ?3
> >     order by ORDER
> >
> > Such queries are pretty fast when there is only fts match directive in the
> > where clause.
> > But any additional condition added ruins the performance, especially if
> > the fts match returns big amount of matches.
> >
> > Additional problem is the order by clause. If the ORDER BY term is "rank"
> > everything works great, but changing it to
> > other field (for example the post time in order to get first most recent
> > posts) causes huge slow down of the query.
> >
> > My second attempt was to sacrifice space for speed and to put all
> > searchable data in the fts table - post text, the thread titles and the
> > usernames. This way, building complex fts queries kind of:
> >
> >    (content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4
> >
> > I can leave only the fts query in the WHERE clause. This way, the search
> > is pretty fast, but the huge problem remains
> > the ORDER BY clause. Again everything works fine with "rank", but attempts
> > to use any other field for sorting, causes
> > huge probems: slow downs up to tens of seconds (usual search time is few
> > milliseconds) and out of memory errors.
> >
> > Such problems with this second approach are even more serious than on the
> > first approach. i.e. with the second approach everything works fine and
> > quick with "rank" order by, and very, very slow and with errors, on any
> > other "order by" option.
> >
> > So, he main question follows:
> >
> > What is the right way to design such complex search systems, based on FTS?
> > How to properly approach the sorting of the search results in order to not
> > have so big slowdowns and out of memory errors.
> >
> > Any tips are highly welcome!
> >
> > Regards
> > --

--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users