I am creating a virtual table with FTS3 to query tokens:
CREATE VIRTUAL TABLE tok1 USING fts3tokenize(unicode61);
The documentation says: „By default, "unicode61" also removes all diacritics from Latin script characters.”
When I run a query to select tokens:
SELECT token FROM tok1 WHERE input='ęóąłżźćńĘÓĄŁŻŹĆŃlŁ*';
The result is:
It seems the diacritics from the letters „ł” and „Ł” were not removed. Is this an SQLite bug?
Re: FTS3 tokenize unicode61 does not remove diacritics correctly?
On 2017-02-16 10:53, [hidden email] wrote:
> The result is:
> It seems diacritics from letter „ł” and „Ł” was not removed. Is it a sqlite bug?
In general, overlays (slashes, crossbars, etc.) are considered
diacritics; however, Unicode does not provide a decomposition mapping
for ``ł'' or ``Ł''. Even if it is a bug, it concerns the
Unicode standard rather than SQLite FTS3 itself, as the latter uses
the character database provided by the Unicode standard.
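
The missing decomposition mapping can be checked outside SQLite. Below is a minimal sketch using Python's standard unicodedata module. It is only an illustration of what the Unicode character database contains, not SQLite's actual implementation, and the helper name strip_diacritics is made up for this example. It assumes diacritics are removed by decomposing each character (NFD) and dropping the combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Print the decomposition mapping recorded in the Unicode character database.
# An empty string means the character has no decomposition mapping.
for ch in "ęóąłżźćńŁ":
    print(ch, repr(unicodedata.decomposition(ch)))

# Characters with a mapping lose their marks; "ł" and "Ł" pass through as-is.
print(strip_diacritics("ęóąłżźćńĘÓĄŁŻŹĆŃ"))
```

Under this approach "ę" decomposes to "e" plus a combining ogonek and is reduced to "e", while "ł" and "Ł" have an empty mapping and survive unchanged, which matches the behaviour reported in the question.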