UTF support


UTF support

J Decker
I saw a few things go by about Unicode, and I understand that it should
just work to store the data as characters.

I'm getting an unrecognized token error, and I think this page isn't right.
I was playing with a Greek translation of 'Mary Had a Little Lamb':

http://www.sqlite.org/tokenreq.html

-----------
"MySQL allows identifiers to be quoted using the grave accent character.
SQLite supports this for interoperability.

H41160: SQLite shall recognize as an ID token any sequence of characters
that begins with a grave accent (u0060), is followed by zero or more
non-zero characters and/or pairs ofgrave accents (u0060) and terminates
with a grave accent (u0022) that is not part of a pair."

----------


20:57:51.729|[hidden email](472):Result of prepare failed?
unrecognized token: "'Μαίρη είχε " at char 0[replace into option4_values
(`option_id`,`string`,`segment` ) values
('8b377a68-4358-11e4-ace4-3085a9903449','Μαίρη είχε ένα μικρό αρνί',0)] in
[replace into option4_values (`option_id`,`string`,`segment` ) values
('8b377a68-4358-11e4-ace4-3085a9903449','Μαίρη είχε ένα μικρό αρνί',0)]


-----------

The actual data isn't quite as it looks here: during the Unicode-to-char
conversion of the statement, I fix up the wide characters that don't convert
by turning them into UTF-8 representations.

This is the memory of the command that is being passed to prepare.
Between the ' ' there is no zero byte and no other ' character.

0x00000000029844E8  72 65 70 6c 61 63 65 20 69 6e 74 6f 20 6f 70 74 69 6f 6e 34 5f 76 61 6c 75 65 73 20 28 60 6f 70 74  replace into option4_values (`opt
0x0000000002984509  69 6f 6e 5f 69 64 60 2c 60 73 74 72 69 6e 67 60 2c 60 73 65 67 6d 65 6e 74 60 20 29 20 76 61 6c 75  ion_id`,`string`,`segment` ) valu
0x000000000298452A  65 73 20 28 27 38 62 33 37 37 61 36 38 2d 34 33 35 38 2d 31 31 65 34 2d 61 63 65 34 2d 33 30 38 35  es ('8b377a68-4358-11e4-ace4-3085
0x000000000298454B  61 39 39 30 33 34 34 39 27 2c 27 e0 8e 9c e0 8e b1 e0 8e af e0 8f 81 e0 8e b7 20 e0 8e b5 e0 8e af  a9903449','àŽœàŽ±àŽ¯à..àŽ· àŽµàŽ¯
0x000000000298456C  e0 8f 87 e0 8e b5 20 e0 8e ad e0 8e bd e0 8e b1 20 e0 8e bc e0 8e b9 e0 8e ba e0 8f 81 e0 8f 8c 20  à..àŽµ àŽ.àŽ.àŽ± àŽ.àŽ.àŽºà..à.Œ
0x000000000298458D  e0 8e b1 e0 8f 81 e0 8e bd e0 8e af 27 2c 30 29 00 fe ca ef be fe ca ef be cd fd fd fd fd ab ab ab  àŽ±à..àŽ.àŽ¯',0).þÊï.þÊï.Íýýýý«««
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: UTF support

Richard Hipp-3
On Tue, Oct 7, 2014 at 12:06 AM, J Decker <[hidden email]> wrote:

> I saw a few things go by about unicode... and understand that it should
> just work to store the data as characters...
>
> I'm getting a unrecognized token... and think this page isn't right...
> I was playing with greek translation of 'mary had a little lamb'
>
> http://www.sqlite.org/tokenreq.html
>


Wait.  Stop right there.  Where did you find that page?  That page is many
years out of date and is not maintained.  The older banner across the top
should be a clue.

Did you get there via a link?  Can you tell me what the link is so that I
can delete it?






--
D. Richard Hipp
[hidden email]

Re: UTF support

jose isaias cabrera
In reply to this post by J Decker

I have been using UTF-8 in my programs and have been able to store and
retrieve Chinese, Japanese, Thai, Korean, Arabic, Hebrew, Russian, Greek,
etc., and I have never had a problem.  Perhaps it's your Unicode wrapper.
I am using the D language and it works just fine.

josé


Re: UTF support

Richard Hipp-3
In reply to this post by J Decker
On Tue, Oct 7, 2014 at 12:06 AM, J Decker <[hidden email]> wrote:

> I saw a few things go by about unicode... and understand that it should
> just work to store the data as characters...
>
> I'm getting a unrecognized token... and think this page isn't right...
> I was playing with greek translation of 'mary had a little lamb'
>
>
I ran the following script through the sqlite3 command-line shell and it
works fine:

CREATE TABLE option4_values(option_id, string, segment);
REPLACE INTO option4_values(`option_id`,`string`,`segment`)
 VALUES('8b377a68-4358-11e4-ace4-3085a9903449','Μαίρη είχε ένα μικρό αρνί',0);
SELECT * FROM option4_values;

I suggest that the problem is in your programming language, or in the
wrapper that links your programming language to SQLite, not in SQLite
itself.  Can you tell us what programming language and what operating
system you are using?


--
D. Richard Hipp
[hidden email]

Re: UTF support

J Decker
In reply to this post by Richard Hipp-3
On Tue, Oct 7, 2014 at 5:02 AM, Richard Hipp <[hidden email]> wrote:

> Wait.  Stop right there.  Where did you find that page?  That page is many
> years obsolete and out of date and is not maintained.  The older banner
> across the top should be a clue.
>
> Did you get there via a link?  Can you tell me what the link is so that I
> can delete it?
>
I did a Google search for 'sqlite unicode unrecognized token replace'; it
was the first search result.


https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1ASUM_enUS603US603&ion=1&espv=2&ie=UTF-8#q=sqlite+unicode+unrecognized+token+replace



Re: UTF support

Richard Hipp-3
On Tue, Oct 7, 2014 at 8:50 AM, J Decker <[hidden email]> wrote:

> I did a Google search for 'sqlite unicode unrecognized token replace'; it
> was the first search result.
>
>
>
> https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1ASUM_enUS603US603&ion=1&espv=2&ie=UTF-8#q=sqlite+unicode+unrecognized+token+replace
>
>

Thank you.  I don't know where Google came up with that obsolete page, but
it is gone now and shouldn't give us any more trouble.

See follow-up emails about your original concern.
--
D. Richard Hipp
[hidden email]

Re: UTF support

J Decker
In reply to this post by Richard Hipp-3
On Tue, Oct 7, 2014 at 5:39 AM, Richard Hipp <[hidden email]> wrote:

> On Tue, Oct 7, 2014 at 12:06 AM, J Decker <[hidden email]> wrote:
>
> > I saw a few things go by about unicode... and understand that it should
> > just work to store the data as characters...
> >
> > I'm getting a unrecognized token... and think this page isn't right...
> > I was playing with greek translation of 'mary had a little lamb'
> >
> >
> I ran the following script through the sqlite3 command-line shell and it
> works fine:
>
> CREATE TABLE option4_values(option_id, string, segment);
> REPLACE INTO option4_values(`option_id`,`string`,`segment`)
>  VALUES('8b377a68-4358-11e4-ace4-3085a9903449','Μαίρη είχε ένα μικρό
> αρνί',0);
> SELECT * FROM option4_values;
>
Hmm... I wonder what it's getting....


> I suggest that the problem is in your programming language, or in the
> wrapper that links your programming language to SQLite, not in SQLite
> itself.  Can you tell us what programming language and what operating
> system you are using?
>
C, built with Visual Studio 2012, on Windows.
It is built with UNICODE enabled instead of the multi-byte character set.
It could be my conversion routine. I'm using wcstombs_s when _MSC_VER is
set. Before, it was just failing, because wcstombs_s doesn't convert
anything with a high bit set, so I added a handler that replaces such a
character with a UTF-8 encoding of the 16-bit character (it expands to 3
bytes, as described at http://en.wikipedia.org/wiki/UTF-8#Description ):

if( err == 42 )
{
    (*ch++) = 0xE0 | ((unsigned char*)wch)[1] >> 4;
    (*ch++) = 0x80 | ( ( ((unsigned char*)wch)[1] & 0xF ) << 2 ) | ( ( ((unsigned char*)wch)[0] ) >> 6 );
    (*ch++) = 0x80 | ( ((unsigned char*)wch)[0] & 0x3F );
}

This works: if I mouse over the char * string in the debugger, it shows the
right Unicode characters.
The logging that I included in the first message was converted from
wchar_t * to char *, and then the sqlite3_strerror() text is expanded from
char * back to wchar_t *, and it still shows the right characters.

I just cannot identify the unrecognized token. It's obviously not at
character 0 (that offset is obtained by comparing against the pzTail result
of sqlite3_prepare_v2).
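
To make the handler's output easier to follow, here is a minimal sketch that applies the same three expressions to a single 16-bit value. The function name encode3_always is hypothetical, and the high and low bytes are taken with shifts rather than through the (unsigned char*) cast, which assumes the cast in the original runs on a little-endian machine.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): applies the same three
 * expressions as the handler above to one 16-bit value. It always
 * writes exactly three bytes, whatever the size of the value. */
static void encode3_always(uint16_t unit, unsigned char out[3])
{
    unsigned char hi = (unsigned char)(unit >> 8);    /* ((unsigned char*)wch)[1] on little-endian */
    unsigned char lo = (unsigned char)(unit & 0xFF);  /* ((unsigned char*)wch)[0] on little-endian */

    out[0] = (unsigned char)(0xE0 | (hi >> 4));
    out[1] = (unsigned char)(0x80 | ((hi & 0x0F) << 2) | (lo >> 6));
    out[2] = (unsigned char)(0x80 | (lo & 0x3F));
}
```

For U+039C (the first Greek letter of the test string) this yields the bytes e0 8e 9c, which is what the memory dump in the first message shows right after the opening quote.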





Re: UTF support

Teg-3
Hello J,

    string_t sTest;
    int nLengthNeeded = WideCharToMultiByte(CP_UTF8, 0, pszWide, nLength, 0, 0, 0, 0);
    if( !nLengthNeeded )
    {
        ASSERT(0);
        return(E_ABORT);
    }

    sTest.resize(nLengthNeeded + 16);
    nLength = WideCharToMultiByte(CP_UTF8, 0, pszWide, nLength,
                                  reinterpret_cast<char*>(&sTest[0]), (uint32_t)sTest.size(), 0, 0);
    sTest[nLength] = 0;
    ASSERT(!strcmp(sTest.c_str(),(char*)(*this)));


This is what I used to use to convert from UTF-16 to UTF-8 on Windows.
There are similar functions for converting in the opposite direction.
Internally my program is 100% UTF-8. I do translations to UTF-16 right
at the point where I display the strings in Windows.

This code is actually some test code I use today to compare the
conversions I do manually against what Windows generates. In debug mode, it
does both conversions and compares the two.


--
Best regards,
 Teg                            mailto:[hidden email]


Re: UTF support

J Decker
In reply to this post by J Decker
I did find sqlite3_prepare16_v2; this allows the REPLACE to work without
error, but the result of the SELECT that reads it back isn't the same as
what I put in.
I'm still trying to figure out the differences. I'm going to implement
WideCharToMultiByte as a conversion too, to see what differences there are.

Re: UTF support

J Decker
So, I guess it is technically not allowed to encode 11-bit Unicode
characters with the 16-bit (3-byte) UTF-8 form.
The Greek characters are 0x3XX, which is 10 bits. I checked what
WideCharToMultiByte was doing and found it was using the 11-bit (2-byte)
encodings. I fixed my encoder to use the appropriate size for what's
required and added 11-bit decoding, and now in and out works for that and
for some Chinese characters, which are more than 11 bits.
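
A minimal sketch of what a shortest-form encoder might look like, assuming the input is a full Unicode code point rather than a raw wchar_t. The name utf8_encode is hypothetical and this is not the code used in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): encode one code point in
 * the shortest UTF-8 form. Returns the number of bytes written to out
 * (1 to 4), or 0 for a surrogate or a value above 0x10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* up to 7 bits: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* up to 11 bits: 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* surrogate halves are rejected */
    if (cp < 0x10000) {                   /* up to 16 bits: 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* up to 21 bits: 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, U+039C comes out as the two bytes ce 9c, and a Chinese character such as U+4E2D comes out as the three bytes e4 b8 ad.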

Is the 'unrecognized token' the 0xE0 byte? A value could legitimately need
exactly 12 bits, though, so is it checking
( char[0] == 0xe0 ) && ( ( char[1] & 0xE0 ) == 0x80 )?

As a side note, when I use Visual Studio to mouse over the resulting char *
string with the 11-bit encodings, it shows bad characters; if it is encoded
as (valid but illegal) 16-bit, it displays correctly.
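
For reference, the expression in the question above does match the 3-byte overlong case: a lead byte of 0xE0 followed by a continuation byte in the range 0x80 to 0x9F carries a value below 0x800, which fits in two bytes. The following is a minimal sketch of a decoder that rejects such sequences. The name utf8_decode_strict is hypothetical, and the thread does not establish whether SQLite itself performs any check of this kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): decode one UTF-8 sequence
 * from s, where n bytes are available. On success, stores the code
 * point in *cp and returns the number of bytes consumed. Returns 0 for
 * a malformed, truncated, or overlong sequence. */
static size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest value that genuinely needs a sequence of each length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t v;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (n < len) return 0;          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (v < min_for_len[len]) return 0;        /* overlong encoding */
    if (v > 0x10FFFF) return 0;                /* beyond the Unicode range */
    if (v >= 0xD800 && v <= 0xDFFF) return 0;  /* surrogate half */

    *cp = v;
    return len;
}
```

Given the bytes e0 8e 9c from the dump in the first message, this sketch returns 0, while ce 9c decodes to U+039C.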

Re: UTF support

jose isaias cabrera

J,

My suggestion is for you to read about ANSI, ASCII, UTF-7, UTF-8, UTF-16 and
UTF-32 and understand the ins and outs of the various encodings. You may need
to create your own wrapper to get things to work correctly.

What happens if you create a text file using Notepad, make sure that you
save it as UTF-8, and then read the content that you want from that file?
Then write it to SQLite, get it back, and write it to another file?
Does that work?


Re: UTF support

J Decker
On Tue, Oct 7, 2014 at 2:20 PM, jose isaias cabrera <[hidden email]> wrote:

> J,
>
> My suggestion is for you to read about ANSI, ASCII, UTF7, UTF8, UTF16 and
> UTF32 and understand the ins and outs of the various encoding. You may need
> to create your own wrapper to get things to work correctly.
>

Right; I did, and have, but I missed the part that says 'must be encoded in
the fewest bits' (I'm not sure it is there, and Visual Studio treats using an
encoding larger than the number of bits required as a valid thing to do).
Also, Unicode only uses 20 bits max, so the extended 5- and 6-byte UTF-8
encodings never get used. I have a custom wrapper for systems that are not
Windows, and now it's more robust.

I still think it's something of a bug, but it has been worked around, so for
me it won't be an issue again.




Re: UTF support

Christopher Vance-8
Actually, Unicode / ISO 10646 is a 21-bit encoding, with values from 0 to
0x10FFFF.
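
The bit counts discussed in this thread can be checked with a small sketch. The names bit_width and utf8_length are hypothetical, and surrogate values are not treated specially here.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent cp
 * (0 for cp == 0). */
static unsigned bit_width(uint32_t cp)
{
    unsigned n = 0;
    while (cp != 0) {
        n++;
        cp >>= 1;
    }
    return n;
}

/* Hypothetical helper: length in bytes of the shortest UTF-8 form of
 * cp, or 0 if cp lies outside the range 0 to 0x10FFFF. Surrogates are
 * not treated specially in this sketch. */
static unsigned utf8_length(uint32_t cp)
{
    if (cp < 0x80)      return 1;   /* up to 7 bits  */
    if (cp < 0x800)     return 2;   /* up to 11 bits */
    if (cp < 0x10000)   return 3;   /* up to 16 bits */
    if (cp <= 0x10FFFF) return 4;   /* up to 21 bits */
    return 0;
}
```

Here bit_width(0x10FFFF) is 21 and bit_width(0x39C) is 10, so the largest code point needs the 4-byte form while a Greek letter fits in the 2-byte form.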




--
Christopher Vance