sqlite3 and Unicode

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

sqlite3 and Unicode

phaworth
When using the .dump command with .output to a filename, what encoding does
sqlite3 for the file? Same as the database encoding?  Is it possible to
change whatever encoding is used?

Similarly, when using the .import command, does sqlite3 assume the input
file is the same encoding as the database?

Thanks,

Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Richard Hipp-3
On 5/4/15, Peter Haworth <[hidden email]> wrote:
> When using the .dump command with .output to a filename, what encoding does
> sqlite3 for the file? Same as the database encoding?  Is it possible to
> change whatever encoding is used?
>
> Similarly, when using the .import command, does sqlite3 assume the input
> file is the same encoding as the database?
>

I think it assumes UTF8 in both cases.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Luuk
On 4-5-2015 20:54, Richard Hipp wrote:

> On 5/4/15, Peter Haworth <[hidden email]> wrote:
>> When using the .dump command with .output to a filename, what encoding does
>> sqlite3 for the file? Same as the database encoding?  Is it possible to
>> change whatever encoding is used?
>>
>> Similarly, when using the .import command, does sqlite3 assume the input
>> file is the same encoding as the database?
>>
>
> I think it assumes UTF8 in both cases.
>

when using '.dump', it does not create a UTF8 file on Windows 7 (sQlite
version 3.8.8.3)
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Luuk
On 5-5-2015 09:22, Luuk wrote:

> On 4-5-2015 20:54, Richard Hipp wrote:
>> On 5/4/15, Peter Haworth <[hidden email]> wrote:
>>> When using the .dump command with .output to a filename, what
>>> encoding does
>>> sqlite3 for the file? Same as the database encoding?  Is it possible to
>>> change whatever encoding is used?
>>>
>>> Similarly, when using the .import command, does sqlite3 assume the input
>>> file is the same encoding as the database?
>>>
>>
>> I think it assumes UTF8 in both cases.
>>
>
> when using '.dump', it does not create a UTF8 file on Windows 7 (sQlite
> version 3.8.8.3)

or am i doing something wrong?

on Windows 7:
C:\temp>sqlite3.exe encoding.sqlite
SQLite version 3.8.8.3 2015-02-25 13:29:11
Enter ".help" for usage hints.
sqlite> select * from test;
een
één
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE test (x varchar(20));
INSERT INTO "test" VALUES('een');
INSERT INTO "test" VALUES('één');
COMMIT;
sqlite> select x, length(x), hex(x) from test;
een|3|65656E
één|3|82826E
sqlite> PRAGMA encoding;
UTF-8
sqlite>

The '82' is not a UTF-8 representation of the 'é' (length=3 is correct)
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Jean-Christophe Deschamps-3
In reply to this post by Luuk
At 09:22 05/05/2015, you wrote:
´¯¯¯

>On 4-5-2015 20:54, Richard Hipp wrote:
>>On 5/4/15, Peter Haworth <[hidden email]> wrote:
>>>When using the .dump command with .output to a filename, what
>>>encoding does
>>>sqlite3 for the file? Same as the database encoding?  Is it possible to
>>>change whatever encoding is used?
>>>
>>>Similarly, when using the .import command, does sqlite3 assume the input
>>>file is the same encoding as the database?
>>
>>I think it assumes UTF8 in both cases.
>
>when using '.dump', it does not create a UTF8 file on Windows 7
>(sQlite version 3.8.8.3)

It's UTF16-LE data.


--
<mailto:[hidden email]>[hidden email]  

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Clemens Ladisch
In reply to this post by Luuk
Luuk wrote:
> on Windows 7:
> C:\temp>sqlite3.exe encoding.sqlite
> sqlite> select * from test;
> één

The Windows console does not support UTF-8 with the default settings and
the C stdio functions.  Any data you entered in the console is not
encoded correctly.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Kees Nuyt
On Tue, 05 May 2015 10:50:00 +0200, Clemens Ladisch <[hidden email]> wrote:

>Luuk wrote:
>> on Windows 7:
>> C:\temp>sqlite3.exe encoding.sqlite
>> sqlite> select * from test;
>> één
>
>The Windows console does not support UTF-8 with the default settings and
>the C stdio functions.  Any data you entered in the console is not
>encoded correctly.

I totally agree. It depends on the shell what representation is used,
some MS Windows codepage or UTF-8.

The example below is done in bash on MS Windows 7 (using Mobaxterm),
with a (outdated) Windows sqlite3 executable.
The result of .dump contains UTF-8.

> sqlite3 t.sqlite
SQLite version 3.7.13 2012-06-11 02:05:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
sqlite> insert into t1 values (1,'één');
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
INSERT INTO "t1" VALUES(1,'één');
COMMIT;
sqlite> .q

> sqlite3 t.sqlite .dump| grep INSERT | od -t x1
0000000 49 4e 53 45 52 54 20 49 4e 54 4f 20 22 74 31 22
0000020 20 56 41 4c 55 45 53 28 31 2c 27 c3 a9 c3 a9 6e
0000040 27 29 3b 0a
0000044

--
Regards,
Kees Nuyt

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Staffan Tylen
In reply to this post by Jean-Christophe Deschamps-3
>
>
>>> I think it assumes UTF8 in both cases.
>>>
>>
>> when using '.dump', it does not create a UTF8 file on Windows 7 (sQlite
>> version 3.8.8.3)
>>
>
> It's UTF16-LE data.
>
>
>
I must admit that I'm a bit confused here. If I'm not wrong UTF-8 differs
from ascii when the value is higher than '7f'x, but storing data in sqlite
as text with character values beteen 'x80'x and 'ff'x seems to be no
problem. I previously thought that this could only be done in blob format.

create table t (a);
insert into t values(cast(x'ff' as text));
select a,length(a),hex(a) from t;
 |1|FF

My conclusion is that storing single-byte characters of any value is
allowed, is this true?

Trying to find more info in the sqlite doc I found this page:
https://www.sqlite.org/c3ref/open.html

At the bottom it says: *Note to Windows users:* The encoding used for the
filename argument of sqlite3_open() and sqlite3_open_v2() must be UTF-8,
not whatever codepage is currently defined. Filenames containing
international characters must be converted to UTF-8 prior to passing them
into sqlite3_open() or sqlite3_open_v2().

So what does "must be converted" mean? I don't know how sqlite3.exe works
here but if I do

sqlite3  é.db

where 'é' as we've seen is '82'x it happily creates a file with a
non-displayable character in the first position that seems to be 1 byte
long.

Still confused.
Staffan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Clemens Ladisch
Staffan Tylen wrote:

> I must admit that I'm a bit confused here. If I'm not wrong UTF-8 differs
> from ascii when the value is higher than '7f'x, but storing data in sqlite
> as text with character values beteen 'x80'x and 'ff'x seems to be no
> problem. I previously thought that this could only be done in blob format.
>
> create table t (a);
> insert into t values(cast(x'ff' as text));
> select a,length(a),hex(a) from t;
>  |1|FF
>
> My conclusion is that storing single-byte characters of any value is
> allowed, is this true?

SQLite assumes that all strings you give it are encoded in UTF-8, and
does not actually check the encoding.  It gives you the same string
back, so you could, in theory, use a different encoding, as long as you
do not use any database string processing functions.  In practice, I
would not recommend this.

> At the bottom it says: *Note to Windows users:* The encoding used for the
> filename argument of sqlite3_open() and sqlite3_open_v2() must be UTF-8,
> not whatever codepage is currently defined. Filenames containing
> international characters must be converted to UTF-8 prior to passing them
> into sqlite3_open() or sqlite3_open_v2().
>
> So what does "must be converted" mean? I don't know how sqlite3.exe works
> here but if I do
>
> sqlite3  é.db
>
> where 'é' as we've seen is '82'x it happily creates a file with a
> non-displayable character in the first position that seems to be 1 byte
> long.

The shell assumes that its arguments are UTF-8, and gives the filename
unchanged to sqlite3_open*().  When you've entered the filename in the
Windows console with the default settings, it is not encoded in UTF-8.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Luuk
In reply to this post by Kees Nuyt
On 5-5-2015 11:25, Kees Nuyt wrote:

> On Tue, 05 May 2015 10:50:00 +0200, Clemens Ladisch <[hidden email]> wrote:
>
>> Luuk wrote:
>>> on Windows 7:
>>> C:\temp>sqlite3.exe encoding.sqlite
>>> sqlite> select * from test;
>>> één
>>
>> The Windows console does not support UTF-8 with the default settings and
>> the C stdio functions.  Any data you entered in the console is not
>> encoded correctly.
>
> I totally agree. It depends on the shell what representation is used,
> some MS Windows codepage or UTF-8.
>
> The example below is done in bash on MS Windows 7 (using Mobaxterm),
> with a (outdated) Windows sqlite3 executable.
> The result of .dump contains UTF-8.
>
>> sqlite3 t.sqlite
> SQLite version 3.7.13 2012-06-11 02:05:22
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> create table t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
> sqlite> insert into t1 values (1,'één');
> sqlite> .dump
> PRAGMA foreign_keys=OFF;
> BEGIN TRANSACTION;
> CREATE TABLE t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
> INSERT INTO "t1" VALUES(1,'één');
> COMMIT;
> sqlite> .q
>
>> sqlite3 t.sqlite .dump| grep INSERT | od -t x1
> 0000000 49 4e 53 45 52 54 20 49 4e 54 4f 20 22 74 31 22
> 0000020 20 56 41 4c 55 45 53 28 31 2c 27 c3 a9 c3 a9 6e
> 0000040 27 29 3b 0a
> 0000044
>

OK, thanks....
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Luuk
In reply to this post by Clemens Ladisch
On 5-5-2015 12:25, Clemens Ladisch wrote:

> Staffan Tylen wrote:
>> I must admit that I'm a bit confused here. If I'm not wrong UTF-8 differs
>> from ascii when the value is higher than '7f'x, but storing data in sqlite
>> as text with character values beteen 'x80'x and 'ff'x seems to be no
>> problem. I previously thought that this could only be done in blob format.
>>
>> create table t (a);
>> insert into t values(cast(x'ff' as text));
>> select a,length(a),hex(a) from t;
>>   |1|FF
>>
>> My conclusion is that storing single-byte characters of any value is
>> allowed, is this true?
>
> SQLite assumes that all strings you give it are encoded in UTF-8, and
> does not actually check the encoding.  It gives you the same string
> back, so you could, in theory, use a different encoding, as long as you
> do not use any database string processing functions.  In practice, I
> would not recommend this.
>
>> At the bottom it says: *Note to Windows users:* The encoding used for the
>> filename argument of sqlite3_open() and sqlite3_open_v2() must be UTF-8,
>> not whatever codepage is currently defined. Filenames containing
>> international characters must be converted to UTF-8 prior to passing them
>> into sqlite3_open() or sqlite3_open_v2().
>>
>> So what does "must be converted" mean? I don't know how sqlite3.exe works
>> here but if I do
>>
>> sqlite3  é.db
>>
>> where 'é' as we've seen is '82'x it happily creates a file with a
>> non-displayable character in the first position that seems to be 1 byte
>> long.
>
> The shell assumes that its arguments are UTF-8, and gives the filename
> unchanged to sqlite3_open*().  When you've entered the filename in the
> Windows console with the default settings, it is not encoded in UTF-8.
>
>

now it becomes time that windows will do some things with UTF-8....
(sigh) ;-)

/me currently watching
"Your PC will restart serveral times. Sit back and relax"
while installing updates for technical preview to Windows 10


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Windows 10 and UTF8 [Re: sqlite3 and Unicode]

Luuk
On 5-5-2015 12:40, Luuk wrote:

>
> now it becomes time that windows will do some things with UTF-8....
> (sigh) ;-)
>
> /me currently watching
> "Your PC will restart serveral times. Sit back and relax"
> while installing updates for technical preview to Windows 10
>
>



C:\temp>sqlite3.exe
SQLite version 3.8.9 2015-04-08 12:16:33
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select hex('één');
82826E
sqlite> .quit

C:\temp>chcp 65001
Active code page: 65001

C:\temp>sqlite3.exe
SQLite version 3.8.9 2015-04-08 12:16:33
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select hex('één');


C:\temp>ver

Microsoft Windows [Version 10.0.10074]

C:\temp>

YES, IT CRASHED (silently.....)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

phaworth
In reply to this post by phaworth
Thanks for all the replies.  I'm still confused on this.  It sounds like
maybe UTF8 is the answer on OSX but UTF-16LE on Windows.

Whatever the answer to that, it seems there isn't a way to control the
encoding within sqlite3 so I guess I won't worry about it.

Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 and Unicode

Teg-3
Hello Peter,

I use UTF-8 exclusively in the engine parts my windows applications. I
just translate to UTF-16 for the GUI layer. In that way, no matter
what platform I support, the DB's remain the same.

C



Tuesday, May 5, 2015, 3:29:54 PM, you wrote:

PH> Thanks for all the replies.  I'm still confused on this.  It sounds like
PH> maybe UTF8 is the answer on OSX but UTF-16LE on Windows.

PH> Whatever the answer to that, it seems there isn't a way to control the
PH> encoding within sqlite3 so I guess I won't worry about it.

PH> Pete
PH> lcSQL Software <http://www.lcsql.com>
PH> Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
PH> SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
PH> _______________________________________________
PH> sqlite-users mailing list
PH> [hidden email]
PH> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



--
 Teg                            mailto:[hidden email]

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users