How to search for fields with accents in UTF-8 data?

Winfried
Hello

I imported a CSV file where data are encoded in UTF-8.

Some of the characters (like Î) are not available in the ASCII table, so
I can't use the CLI sqlite3.exe to search.

As an alternative, I tried SQLite Studio, but it fails:

-- Returns no record
SELECT COUNT(*) FROM MyTable WHERE REGION="Île-de-France";

-- Returns the expected records
SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";

I found nothing in SQLite Studio's menus that could be related to
encoding so that I could tell it the DB contains UTF-8 instead of ANSI.

Is there another Windows application I could try that is more likely to
work with UTF-8 data?

Thank you.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: How to search for fields with accents in UTF-8 data?

Clemens Ladisch
CC wrote:
> I imported a CSV file where data are encoded in UTF-8.
>
> Some of the characters (like Î) are not available in the ASCII table, so I can't use the CLI sqlite3.exe to search.

The latest version of sqlite3.exe might work.

Anyway, to check that whatever tool you're using uses Unicode correctly, execute this:

  SELECT char(206), unicode('Î');

This should output Î and 206.
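
As a cross-check outside any console, the same statement can be run from a client that is known to pass Unicode through unchanged. What follows is only a minimal sketch, assuming Python's standard sqlite3 module and a throwaway in-memory database (neither appears in this thread):

```python
import sqlite3

# In-memory database: no file is touched, only the SQL functions are exercised.
conn = sqlite3.connect(":memory:")

# char(206) builds the character from its code point;
# unicode('Î') goes the other way and returns the code point.
row = conn.execute("SELECT char(206), unicode('Î')").fetchone()
conn.close()

print(row)
```

If another tool returns something different for the same statement, the problem most likely sits between the keyboard or display and SQLite, not in the database itself.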

> -- Returns the expected records
> SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";

Does "SELECT unicode(Libreg) FROM MyTable WHERE Libreg LIKE '%le-de-France' LIMIT 1"
return the correct code?

> I found nothing in SQLite Studio's menus that could be related to encoding so that I could tell it the DB contains UTF-8 instead of ANSI.

The database API always uses Unicode.


Regards,
Clemens

Re: How to search for fields with accents in UTF-8 data?

Héctor Fiandor
In reply to this post by Winfried
Dear CC:
I work in Spanish, which uses accented letters.

When I tried to import from a .csv file, the data that ended up in the table looked strange, and I had to write some routines to convert what the program read into the letters I wanted to write. That solved the problem, though I think in a rather primitive way.

If you want, I can send you these routines, and maybe you can suggest a better way.

Yours,
Hfiandor.

Re: How to search for fields with accents in UTF-8 data?

KlaasV
In reply to this post by Winfried
You can even make UTF-8 the default encoding in Windows, as it is in SQLite:

https://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7

CC <[hidden email]> wrote on Sun, 18 Jun 2017 12:52:33 +0200:

> As an alternative, I tried SQLite Studio, but it fails:
>
> -- Returns no record
> SELECT COUNT(*) FROM MyTable WHERE REGION="Île-de-France";
>
> -- Returns the expected records
> SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";
>
> I found nothing in SQLite Studio's menus that could be related to
> encoding so that I could tell it the DB contains UTF-8 instead of ANSI.
>
> Is there another Windows application I could try that is more likely to
> work with UTF-8 data?

Kind regards | Vriendelijke groeten | Cordiali saluti,
Klaas `Z4us` van Buiten V, Experienced Freelance ICT-Guy
https://www.linkedin.com/in/klaas-van-buiten-0325b2102

Re: How to search for fields with accents in UTF-8 data?

KlaasV
In reply to this post by Winfried
For some applications it is, for others not in all cases. For "just" accented characters it should be no problem following these instructions.
General advice: download OpenOffice or a similar open-source package. They are completely free and support almost all operating systems.

Kind regards | Vriendelijke groeten | Cordiali saluti,
Klaas `Z4us` van Buiten V, Experienced Freelance ICT-Guy
https://www.linkedin.com/in/klaas-van-buiten-0325b2102

Re: How to search for fields with accents in UTF-8 data?

Hick Gunter
In reply to this post by Winfried
What do the following statements return, when run in sqlite3.exe (Please note that single quotes are SQLite3 string delimiters):

SELECT hex('Île-de-France');

SELECT hex(region) FROM MyTable WHERE LIBREG like '%le-de-France' LIMIT 1;

I expect one of them is in an ISO 8859 encoding (a single lead byte above 0x7F) and the other in UTF-8 (a two-byte sequence), so they can never match.
Alternatively, I have also seen "double conversion", where the data was already UTF-8 but an ISO -> UTF-8 conversion was performed on it anyway.
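
To make the three cases concrete, here is a small sketch of what hex() would show for each. It assumes the single-byte encoding is CP1252 (an assumption; any ISO 8859-1 style encoding gives the same byte for Î) and uses plain Python rather than SQLite:

```python
text = "Île-de-France"

# Single-byte encoding: Î is the one byte 0xCE.
as_cp1252 = text.encode("cp1252").hex().upper()

# UTF-8 needs two bytes for Î: 0xC3 0x8E.
as_utf8 = text.encode("utf-8").hex().upper()

# "Double conversion": UTF-8 bytes misread as CP1252, then encoded to UTF-8 again.
double_converted = (
    text.encode("utf-8").decode("cp1252").encode("utf-8").hex().upper()
)

print(as_cp1252)
print(as_utf8)
print(double_converted)
```

A stored value whose hex starts with CE is single-byte data, one starting with C38E is proper UTF-8, and one starting with C383C5BD has been converted twice.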

___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

Re: How to search for fields with accents in UTF-8 data?

Winfried
In reply to this post by KlaasV
Thanks everyone.

It looks like running sqlite3.exe in a terminal window (CMD) in Windows 7
doesn't work: apparently, it doesn't support UTF-8.

And when using DB Browser for SQLite, the query works only if I copy/paste
the output, with the "?" where an accented character lives:
https://s15.postimg.org/e05v2q09n/SQLite.UTF8.accents.query.DB.Browser.for.SQLite.png

Could the problem be with fonts not supporting UTF-8?

I'd rather not mess with Windows encoding, especially since one of the
answers on SuperUser says that Windows only partially supports Unicode.

Here's the output of the commands:

sqlite> SELECT unicode(Libreg) FROM MyTable WHERE Libreg LIKE '%le-de-France' LIMIT 1;
65533
sqlite> SELECT char(206), unicode('I');
I;73
sqlite> SELECT hex('Ile-de-France');
496C652D64652D4672616E6365
sqlite> SELECT hex(region) FROM MyTable WHERE LIBREG like '%le-de-France' LIMIT 1;
Error: no such column: region
sqlite> SELECT hex(libreg) FROM MyTable WHERE LIBREG like '%le-de-France' LIMIT 1;
CE6C652D64652D4672616E6365

PS: I might be breaking the thread in the mailing list. For some reason,
the SQLite mailing list refuses my posts from Nabble, although I used the
same email address to register 1) with Nabble and 2) with the SQLite
mailing list: http://sqlite.1065341.n5.nabble.com/

Re: How to search for fields with accents in UTF-8 data?

Simon Slavin-3


On 19 Jun 2017, at 11:13am, Gilles <[hidden email]> wrote:

> It looks running sqlite3.exe in a terminal window (CMD) in Windows 7 doesn't work: Apparently, it doesn't support UTF-8.

Correct.  And the "it" that doesn’t support UTF-8 is the Windows console.  SQLite works fine and handles everything as Unicode internally.  The Windows console won’t process multibyte characters internally and can’t display them correctly.

<https://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx>

"many legacy applications continue to use character sets based on code pages. Even new applications sometimes have to work with code pages, often for one of the following reasons: […]
        • To communicate with the Windows Console, which does not support Unicode."


Some people have found ways to hack around this, but they simulate compliance for a certain codepage rather than implement UTF-8 globally.

<https://www.curlybrace.com/words/2014/10/03/windows-console-and-doublemulti-byte-character-set/>

"The Windows Console doesn’t support Unicode."

<https://social.technet.microsoft.com/Forums/sharepoint/en-US/c42a0300-1803-475d-9438-d39e6672cc69/unicode-characters-in-powershell>

Simon.

Re: How to search for fields with accents in UTF-8 data?

Winfried
In reply to this post by KlaasV
Found the problem: Turns out the CSV file isn't in UTF8 but in CP1252 :-/

iconv.exe can be used to convert a file before importing it into SQLite.
https://dbaportal.eu/2012/10/24/iconv-for-windows/
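
Where iconv is not at hand, the same conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the file names in the usage comment are hypothetical, and the source file is assumed to really be CP1252.

```python
from pathlib import Path


def convert_cp1252_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a CP1252 text file as UTF-8."""
    # newline="" keeps the CSV's line endings exactly as they are in the source.
    with src.open("r", encoding="cp1252", newline="") as fin, \
         dst.open("w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


# Hypothetical usage:
# convert_cp1252_to_utf8(Path("regions_cp1252.csv"), Path("regions_utf8.csv"))
```

The converted file can then be imported as usual; the original is left untouched so the conversion can be repeated if the guess about the source encoding turns out to be wrong.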

Thanks everyone for the help.

Re: How to search for fields with accents in UTF-8 data?

Hick Gunter
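CP1252 = Windows-1252, which matches ISO 8859-1 (aka Latin-1) except in the 0x80-0x9F range, where it assigns printable characters instead of control codes. Both are extensions of ASCII.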
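
A quick way to see how the two encodings relate, sketched with Python's built-in codecs (the codec names "cp1252" and "latin-1" are Python's, not something used in this thread):

```python
# Î (byte 0xCE) sits in the range where the two encodings agree.
same = b"\xce".decode("cp1252") == b"\xce".decode("latin-1")

# Byte 0x80 is one place where they part ways: the euro sign in CP1252,
# a C1 control character in ISO 8859-1.
euro_cp1252 = b"\x80".decode("cp1252")
ctrl_latin1 = b"\x80".decode("latin-1")

print(same, repr(euro_cp1252), repr(ctrl_latin1))
```

For accented letters such as Î the distinction does not matter, but it does for characters like the euro sign or curly quotes.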

Re: How to search for fields with accents in UTF-8 data?

Olivier Mascia
In reply to this post by Simon Slavin-3
> On 19 June 2017, at 15:20, Simon Slavin <[hidden email]> wrote:
>
> On 19 Jun 2017, at 11:13am, Gilles <[hidden email]> wrote:
>
>> It looks running sqlite3.exe in a terminal window (CMD) in Windows 7 doesn't work: Apparently, it doesn't support UTF-8.
>
> Correct.  And the "it" that doesn’t support UTF-8 is the Windows console.  SQLite works fine and handles everything as Unicode internally.  The Windows console won’t process multibyte characters internally and can’t display them correctly.
>
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx>
>
> "many legacy applications continue to use character sets based on code pages. Even new applications sometimes have to work with code pages, often for one of the following reasons: […]
> • To communicate with the Windows Console, which does not support Unicode."
>
>
> Some people have found ways to hack around this, but they simulate compliance for a certain codepage rather than implement UTF-8 globally.
>
> <https://www.curlybrace.com/words/2014/10/03/windows-console-and-doublemulti-byte-character-set/>
>
> "The Windows Console doesn’t support Unicode."
>
> <https://social.technet.microsoft.com/Forums/sharepoint/en-US/c42a0300-1803-475d-9438-d39e6672cc69/unicode-characters-in-powershell>
>
> Simon.

Switch the console I/O (Windows only, of course) of the sqlite3 shell.c to use WriteConsoleW and ReadConsoleW, and there you go: you can forget about CHCP, code pages... I learned it the hard way last year, well after thinking at some point that DBCS would be enough. No.

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software


Re: How to search for fields with accents in UTF-8 data?

David Raymond
In reply to this post by Winfried
The Windows command prompt and Unicode have never played well with each other. SQLite itself works perfectly with data on disk or in the database; there are just translation and display problems when going to and from the command prompt.

If you write out your query in, say, Notepad++ and save it in UTF-8, then you can do ".read queryFile.txt" from the CLI and be sure that it's reading it OK. (Assuming, of course, that your DB isn't using one of the UTF-16 options.) The output may still look weird if it includes accented characters, but anything like count(*) or unicode(something) that returns numbers, or anything that's ASCII, will always look OK.


foo.txt: Saved in UTF-8

.bail on
.echo on
create table if not exists foo (foo text collate nocase);
insert or ignore into foo values ('Île-de-France');
select * from foo;
select char(206), unicode('Î');
select count(*) from foo where foo = 'Île-de-France';

end foo.txt



D:\Temp>sqlite3
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> .read foo.txt
create table if not exists foo (foo text collate nocase);
Run Time: real 0.001 user 0.000000 sys 0.015600
insert or ignore into foo values ('Île-de-France');
Run Time: real 0.000 user 0.000000 sys 0.000000
select * from foo;
--EQP-- 0,0,0,SCAN TABLE foo
foo
Île-de-France
Run Time: real 0.001 user 0.000000 sys 0.000000
select char(206), unicode('Î');
char(206)|unicode('Î')
Î|206
Run Time: real 0.000 user 0.000000 sys 0.000000
select count(*) from foo where foo = 'Île-de-France';
--EQP-- 0,0,0,SCAN TABLE foo
count(*)
1
Run Time: real 0.000 user 0.000000 sys 0.000000

sqlite>

Re: How to search for fields with accents in UTF-8 data?

Winfried [via SQLite]
Thanks for the info.

Re: How to search for fields with accents in UTF-8 data?

Winfried
In reply to this post by Winfried
David Raymond wrote on Jun 19, 2017; 5:22pm:
> The Windows command prompt and Unicode have never played well with each
> other. SQLite itself works perfectly with data on disk or in the
> database; there are just translation and display problems when going to
> and from the command prompt.

Thanks very much for the info.

Lessons I learned:

1. In CSV files, double-check how the data are encoded.

2. Do not use the sqlite3.exe CLI if the data use anything more than the
basic Latin alphabet. Instead, use a GUI application (e.g. for Windows,
SQLite Studio, SQLitespeed, etc.).
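
For the first lesson, a rough check can be scripted. The sketch below (Python; the function name is made up for illustration) only tells whether a file decodes cleanly as UTF-8. It cannot say which other encoding a failing file uses, and a pure-ASCII file passes as well.

```python
from pathlib import Path


def is_valid_utf8(path: Path) -> bool:
    """Return True if the whole file decodes as UTF-8 without errors."""
    try:
        # decode() is strict by default and raises on invalid byte sequences.
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

CP1252 text with accented letters almost always fails this test, because a byte such as 0xCE followed by a plain ASCII letter is not a valid UTF-8 sequence.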

Thank you all.

Re: How to search for fields with accents in UTF-8 data?

R Smith

On 2017/06/20 2:34 PM, Gilles wrote:
>
> Lessons I learned:
>
> 1. In CSV files, double-check how data are encoded
>
> 2. Do not use the sqlite3.exe CLI if the data use anything more than
> the basic latin alphabet. Instead, use a GUI application (eg. for
> Windows, SQLite Studio, SQLitespeed, etc.)

Every lesson is valuable!  Just to be clear: there is nothing wrong
with using the CLI. When pointing it to a file that is correctly encoded,
the import must work correctly (if not, it's a bug). It's just
difficult to enter weird and wonderful Unicode characters outside the
Basic Latin block (the first 128 code points) via the console, or to run
queries using them, all because the Windows console specifically is not
Unicode-enabled.

As an aside: I never understood the reasons for that. I get that
Windows has a less "techy" clientèle than Linux, for instance, that
backwards compatibility is paramount, and that no console command
ever needs to fall outside the 7-bit ASCII range of characters... but geez,
how much effort can it be to make it Unicode-friendly? It's not like the
Windows API lacks any Unicode functionality; even Notepad can handle it
masterfully.


Re: How to search for fields with accents in UTF-8 data?

J. King-3
Indeed. Technically-minded Windows users do exist (Hi, Microsoft, I'm right here!), and I have neither the time nor the inclination to learn PowerShell when the Windows terminal is already adequate---with a set of ports of GNU tools, anyway. :)

Re: How to search for fields with accents in UTF-8 data?

Winfried
In reply to this post by Winfried
R Smith wrote:

>> 2. Do not use the sqlite3.exe CLI if the data use anything more than
>> the basic latin alphabet. Instead, use a GUI application (eg. for
>> Windows, SQLite Studio, SQLitespeed, etc.)
>
> Every lesson is valuable!  Just to be clear - there is nothing wrong
> with using the CLI. When pointing it to a file that is correctly encoded
> the import must work correctly (if not, it's a bug) - It's just
> difficult to enter weird and wonderful Unicode characters outside the
> Basic Latin block (the first 128 code points) via the console, or do
> queries using them, all because the Windows console specifically is not
> Unicode-enabled.

Yes, I should have been more precise: using the CLI for importing data
works fine; it's when typing accented characters that it fails.

 > As an aside - I never understood the reasons for that.

Beats me. Maybe there is some legacy code somewhere deep in Windows'
bowels that explains why the console (cmd.exe) isn't yet Unicode-capable.

Re: How to search for fields with accents in UTF-8 data?

Simon Slavin-3
In reply to this post by R Smith


On 20 Jun 2017, at 2:24pm, R Smith <[hidden email]> wrote:

> Every lesson is valuable!  Just to be clear - there is nothing wrong with using the CLI. When pointing it to a file that is correctly encoded the import must work correctly (if not, it's a bug) - It's just difficult to enter weird and wonderful Unicode characters outside the Basic Latin block (the first 128 code points) via the console, or do queries using them, all because the Windows console specifically is not Unicode-enabled.

To clarify the clarification, you can use the SQLite shell tool just fine as long as you use it to process files, rather than expect characters which are entered through the keyboard or shown on the display to work.  So use ".read" or ".output" or ".once", and then use a non-console program to view the results.  Don’t type your text and view the results on the display.
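
The same idea, keeping the console out of the data path, can also be sketched from a script instead of the shell tool. The following is an illustration only: it assumes Python's standard sqlite3 module, and the helper name and '|' separator are choices made for the example.

```python
import sqlite3
from pathlib import Path


def dump_query_to_file(conn: sqlite3.Connection, sql: str, out_path: Path) -> int:
    """Run a query and write one '|'-separated line per row to a UTF-8 file."""
    rows = conn.execute(sql).fetchall()
    # The file is always written as UTF-8, whatever the console code page is.
    with out_path.open("w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            out.write("|".join(str(value) for value in row) + "\n")
    return len(rows)
```

The resulting file can then be opened in any editor that understands UTF-8, so neither the query text nor the results ever pass through the console.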

I don’t know the technical details of how windows does piping within the console.  It may or may not work to use command lines with ">" or "|" in.

> As an aside - I never understood the reasons for that. I get that Windows has a less "techy" clientèle than Linux for instance, and that the backwards compatibility is paramount, and that no console command ever need fall outside the 7-bit ASCII range of characters... but geez, how much effort can it be to make it Unicode-friendly? It's not like the Windows API lacks any Unicode functionality - even Notepad can handle it masterfully.

The console you see is pretty much the one which was in Windows 3.1.  It does not use the modern API written post-Unicode; it calls the old single-character Windows routines, which are still in Windows so that old programs don’t suddenly stop working.  It has numerous parts which assume

one keypress == one character == one octet == one space on the display

These assumptions are not only in the code for the console itself but also in the Windows routines it calls to do the work.  Rewriting the console to use the newer API calls, and to deal with the above assumptions not being true, would be so major that Microsoft might as well start again from scratch.  And I’d expect the resulting code to be two or three times the size, and to be slower in execution.

This affects Powershell too, since Powershell runs inside the console.  But Powershell might do piping correctly, or be an improvement on the original shell in some other way.  I don’t use Windows at work so I don’t know.

Simon.

Re: How to search for fields with accents in UTF-8 data?

Winfried
In reply to this post by Winfried
Simon Slavin-3 wrote:
> To clarify the clarification, you can use the SQLite
> shell tool just fine as long as you use it to process files, rather than
> expect characters which are entered through the keyboard or shown on the
> display to work.  So use ".read" or ".output" or ".once", and then use a
> non-console program to view the results.  Don’t type your text and view
> the results on the display.

Looks like using an alternative shell solves the problem:

http://conemu.github.io/
