Newbie Issues with COLLATE

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Newbie Issues with COLLATE

Jim and Carol Ingram
(System environment: Windows 10 using sqlite-tools-win32-x86-3290000.zip,
running SQLite3.exe as administrator)

 

I'm a newbie to SQLite but not to SQL.  Apologies in advance if this has
been covered in the Archives and I just haven't found it after two days of
searching.

 

I'm trying to use SQLite3.exe (the command-line executable) to do some data
mining of a database generated by a commercial software product that
incorporates the .dll version (I can see sqlite3.dll in its Program Files
folder).  That commercial software has a menu option to make a backup copy
of the database, and that's what I'm attempting to mine.  I'm only using
SELECT against it.

 

Many of my attempted queries, especially those with JOINs on TEXT columns
fail with an "Error: no such collation sequence: LOCALE_NOCASE" message.  A
specific example:

 

sqlite> select k.Keyword, count(ka.ImageID) from Keywords k

   ...> inner join KeywordAssignments ka on ka.KeywordID = k.KeywordID

   ...> group by k.Keyword;

Error: no such collation sequence: LOCALE_NOCASE

 

Using .schema to expose the structure of the Keywords table and the
KeywordAssignments table shows:

 

sqlite> .schema Keywords

CREATE TABLE IF NOT EXISTS 'Keywords'(KeywordID INTEGER PRIMARY KEY, Keyword
TEXT NOT NULL UNIQUE COLLATE LOCALE_NOCASE);

sqlite> .schema KeywordAssignments

CREATE TABLE IF NOT EXISTS 'KeywordAssignments'(ImageID NUMERIC NOT NULL,
KeywordID NUMERIC NOT NULL, PRIMARY KEY(ImageID, KeyWordID));

 

The "LOCALE_NOCASE" error message is obviously caused by the Keyword column
in the Keywords table that is declared as COLLATE LOCALE_NOCASE.  Searching
the SQLite and third-party websites for "LOCALE_NOCASE" suggests that this
particular collation is not built-in to SQLite, but may have been developed
by the commercial software programmers.  Since I am accessing the database
from the SQLite3 command line executable (and not the software's .dll), I
seem to have no access to collations that begin with "LOCALE_".  QUESTION
ONE:  Are these "LOCALE_..." collations custom collations developed by the
software programmers, or are they something I just haven't discovered yet in
the SQLite documentation?

 

I've developed a workaround.  Starting with the Keywords table schema, I
CREATE TABLE a "myKeywords" table with the same columns but omitting the
COLLATE LOCALE_NOCASE declaration.  Then I INSERT INTO the new "myKeywords"
table all of the data from the original Keywords table.  My original query
then runs without error against the new "myKeywords" table.  But this is a
pretty onerous approach.

 

I have noted by trial and error that modifying the original query's last
line to include COLLATE NOCASE ("group by k.Keyword COLLATE NOCASE;") makes
the query work against the original Keywords table without error.  I suspect
this is the correct way to address the error message rather than my
admittedly messy and time-consuming workaround.  QUESTION TWO:  Can anyone
verify that this is indeed the acceptable way to address the error message,
or provide a more acceptable way to do it?

 

And finally, QUESTION THREE: Can anyone point me to a thorough and detailed
tutorial of the whole SQLite COLLATE subject, including creation of custom
collations and loading them into the SQLite command-line executable if
possible?  The documentation seems really thin in this area!

 

TIA for any suggestions anyone can provide.

Jim

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Newbie Issues with COLLATE

Richard Hipp-3
On 10/1/19, Jim and Carol Ingram <[hidden email]> wrote:
> QUESTION
> ONE:  Are these "LOCALE_..." collations custom collations developed by the
> software programmers,

Yes.  The application is using the sqlite3_create_collation() API
(https://www.sqlite.org/c3ref/create_collation.html) to create a new
collating sequence that they are calling "LOCALE_NOCASE".  We can
guess at what that collating sequence does from its name, but without
seeing the code, we don't really know.

>
> I have noted by trial and error that modifying the original query's last
> line to include COLLATE NOCASE ("group by k.Keyword COLLATE NOCASE;") makes
> the query work against the original Keywords table without error.  I suspect
> this is the correct way to address the error message rather than my
> admittedly messy and time-consuming workaround.  QUESTION TWO:  Can anyone
> verify that this is indeed the acceptable way to address the error message,
> or provide a more acceptable way to do it?

That seems like a reasonable approach to me!

Another thing to consider, depending on how much data there is, is to
run the ".dump" command to convert the whole database into a big pile
of SQL.  Then edit the SQL to change LOCALE_NOCASE into just NOCASE,
and reimport it into a new SQLite database.  Then all of your queries
will work correctly.

Or, you could set "PRAGMA writable_schema=ON" and then do an UPDATE
statement on the sqlite_master table to actually change the text of
the CREATE TABLE statement:

    PRAGMA writable_schema=on;
    UPDATE sqlite_master SET sql=replace(sql,'LOCALE_NOCASE','NOCASE')
      WHERE name LIKE 'keywords';

Then exit the command-line tool and reopen and type "REINDEX".  Then
you should be good to go.  Warning:  Make a backup copy first, as
things might to wrong.  In particular, the change from LOCALE_NOCASE
to just NOCASE might possibly cause the UNIQUE constraint to start
failing in one or more cases.  (Unlikely, but possible.)  So be
prepared to work around such difficulties.

> And finally, QUESTION THREE: Can anyone point me to a thorough and detailed
> tutorial of the whole SQLite COLLATE subject, including creation of custom
> collations and loading them into the SQLite command-line executable if
> possible?  The documentation seems really thin in this area!
>

The https://www.sqlite.org/c3ref/create_collation.html document is
about all we have.  There might be more information in some of the
books about SQLite.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Newbie Issues with COLLATE

Keith Medcalf
In reply to this post by Jim and Carol Ingram

On Tuesday, 1 October, 2019 11:22, Jim and Carol Ingram <[hidden email]> wrote:

>I'm trying to use SQLite3.exe (the command-line executable) to do some
>data mining of a database generated by a commercial software product that
>incorporates the .dll version (I can see sqlite3.dll in its Program Files
>folder).  That commercial software has a menu option to make a backup
>copy of the database, and that's what I'm attempting to mine.  I'm only using
>SELECT against it.

>Many of my attempted queries, especially those with JOINs on TEXT columns
>fail with an "Error: no such collation sequence: LOCALE_NOCASE" message.

This means that the column is using a collation sequence named LOCALE_NOCASE and that you do not have that collation sequence loaded, ie, that "there is no such collation sequence".

>The "LOCALE_NOCASE" error message is obviously caused by the Keyword
>column in the Keywords table that is declared as COLLATE LOCALE_NOCASE.

Yes.  The column is also indexed.  The index is also unuseable without the collation sequence.

>Searching the SQLite and third-party websites for "LOCALE_NOCASE" suggests that
>this particular collation is not built-in to SQLite, but may have been
>developed by the commercial software programmers.  Since I am accessing the
>database from the SQLite3 command line executable (and not the software's .dll), I
>seem to have no access to collations that begin with "LOCALE_".  

>QUESTION ONE:  Are these "LOCALE_..." collations custom collations developed by
>the software programmers, or are they something I just haven't discovered yet
>in the SQLite documentation?

They are custom collation names.  Whether or not it is a custom collation that is any different from the built-in NOCASE collation can only be answered by the programmer who used that collation name.

>I've developed a workaround.  Starting with the Keywords table schema, I
>CREATE TABLE a "myKeywords" table with the same columns but omitting the
>COLLATE LOCALE_NOCASE declaration.  Then I INSERT INTO the new
>"myKeywords" table all of the data from the original Keywords table.  
>My original query then runs without error against the new "myKeywords" table.  
>But this is a pretty onerous approach.

>I have noted by trial and error that modifying the original query's last
>line to include COLLATE NOCASE ("group by k.Keyword COLLATE NOCASE;")
>makes the query work against the original Keywords table without error.  I
>suspect this is the correct way to address the error message rather than my
>admittedly messy and time-consuming workaround.  

Yes, this will work.  You are overriding the collation sequence defined for the column in the table definition and using the builtin NOCASE (ASCII only) case insensitive collation.  This means that the index on k.keyword will not be used since it is the wrong collation.

>QUESTION TWO:  Can anyone verify that this is indeed the acceptable way to address the error
>message, or provide a more acceptable way to do it?

It will work however the indexes created using the original collation will be unuseable.

>And finally, QUESTION THREE: Can anyone point me to a thorough and
>detailed tutorial of the whole SQLite COLLATE subject, including creation of
>custom collations and loading them into the SQLite command-line executable if
>possible?  The documentation seems really thin in this area!

https://sqlite.org/c3ref/create_collation.html
https://sqlite.org/datatype3.html#collation

A collation is simply a sorting function.  It receives two character strings A and B and returns their sort order.  If A < B then -1 is returned.  If A == B then 0 is returned.  If A > B then 1 is returned.  The default collation is BINARY which is nothing more than strcmp(A,B) (ie, sort in binary order).

Since this is on Windows I would suspect that LOCALE_NOCASE is simply a collation which uses the builtin Win32 CompareStringW function (with some combination of flags) to allow handling of characters other than just ASII.

>TIA for any suggestions anyone can provide.

There are a couple of simple options that I can see:

(1) dump the database, modify the dump to get rid of the custom collation (perhaps use the builtin NOCASE collation, perhaps the data is not multilingual but is plain ASCII), and then reload the .dump file into a new database, and use that (the new database will be using an ASCII-only NOCASE collation, so if you have non-ASCII characters this may produce different results than your application)

sqlite3 broken_database.db .dump > dumpfile.sql
edit dumpfile.sql and change LOCALE_NOCASE to NOCASE
sqlite3 newdatabase.db < dumpfile.sql

(2) look in the application directories for another DLL that implements the collation sequence, and load it into your command shell using .load <dllname>

(3) use the version of the command shell that loads sqlite3.dll rather than having it statically linked into the executable.  This probably requires the most fiddling and actually getting the collation loaded depends on if it has been builtin to a custom sqlite3.dll or part of the application, or is somewhere else (like in another DLL).

Generally (2) is the most likely to work if you can find the extension that the application loads in the application directories.  Otherwise, you will have to dump/modify/reload the database to get rid of the custom collation.  Alternatively, you could update the database schema (sqlite_master) table and change the collation name, then re-index the database (because the indexes will contain data from the old collation, not the one you changed it to).

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Newbie Issues with COLLATE

Jim and Carol Ingram
Thanks, Richard and Keith for your very helpful information.  I suspected that programmers could "hide" custom collations away from prying eyes using the API.  I've scanned all files in the software's Program folder using a content search for those collation names (Super Finder XT) and found nothing, neither in the sqlite3.dll nor any other files.  Apparently once compiled they become undetectable.  Since nothing in the database seems to require them once the error message is dealt with, this is a non-problem.

I'm going to stick with my method of simply overriding the collation with a built-in one in my scripts.  Since the content of the database changes almost daily, the other methods you've so kindly mentioned seem to this novice like they would need to be done with every new backup copy of the real database (to which I have no access), and so would be as burdensome as my original workaround.  Just putting the override in the script is a one-time change, no matter how many times I export a backup from the software to work on.  Feel free to correct me if I'm wrong.  Still, your suggestions were part of my SQLite learning process and are greatly appreciated.

Cautions about the effects of overriding collations on key columns are duly noted, thanks for explaining that, Keith.

Finally, thanks to both of you for pointing me to some additional documentation.

I consider this thread resolved, but as a newbie to the mailing list I don't know if I need to do something to close it...

Jim

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Newbie Issues with COLLATE

Hick Gunter
In reply to this post by Richard Hipp-3
I'm guessing that LOCALE_NOCASE will probably be causing things that collate distinct in NOCASE to collate equal, so the risk of breaking UNIQUE constraints seems rather small

-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Richard Hipp
Gesendet: Dienstag, 01. Oktober 2019 19:41
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] Re: [sqlite] Newbie Issues with COLLATE

On 10/1/19, Jim and Carol Ingram <[hidden email]> wrote:
> QUESTION
> ONE:  Are these "LOCALE_..." collations custom collations developed by
> the software programmers,

Yes.  The application is using the sqlite3_create_collation() API
(https://www.sqlite.org/c3ref/create_collation.html) to create a new collating sequence that they are calling "LOCALE_NOCASE".  We can guess at what that collating sequence does from its name, but without seeing the code, we don't really know.

>
> I have noted by trial and error that modifying the original query's
> last line to include COLLATE NOCASE ("group by k.Keyword COLLATE
> NOCASE;") makes the query work against the original Keywords table
> without error.  I suspect this is the correct way to address the error
> message rather than my admittedly messy and time-consuming workaround.
> QUESTION TWO:  Can anyone verify that this is indeed the acceptable
> way to address the error message, or provide a more acceptable way to do it?

That seems like a reasonable approach to me!

Another thing to consider, depending on how much data there is, is to run the ".dump" command to convert the whole database into a big pile of SQL.  Then edit the SQL to change LOCALE_NOCASE into just NOCASE, and reimport it into a new SQLite database.  Then all of your queries will work correctly.

Or, you could set "PRAGMA writable_schema=ON" and then do an UPDATE statement on the sqlite_master table to actually change the text of the CREATE TABLE statement:

    PRAGMA writable_schema=on;
    UPDATE sqlite_master SET sql=replace(sql,'LOCALE_NOCASE','NOCASE')
      WHERE name LIKE 'keywords';

Then exit the command-line tool and reopen and type "REINDEX".  Then you should be good to go.  Warning:  Make a backup copy first, as things might to wrong.  In particular, the change from LOCALE_NOCASE to just NOCASE might possibly cause the UNIQUE constraint to start failing in one or more cases.  (Unlikely, but possible.)  So be prepared to work around such difficulties.

> And finally, QUESTION THREE: Can anyone point me to a thorough and
> detailed tutorial of the whole SQLite COLLATE subject, including
> creation of custom collations and loading them into the SQLite
> command-line executable if possible?  The documentation seems really thin in this area!
>

The https://www.sqlite.org/c3ref/create_collation.html document is about all we have.  There might be more information in some of the books about SQLite.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users