UNICODE Support


UNICODE Support

ajay-7
Hello there,

Does SQLite support UNICODE? Can I store Arabic or Chinese text in the
database?

If it does not support UNICODE, is there any workaround?

 

Regards,

Ajay Sonawane

 


Re: UNICODE Support

Martin Engelschalk
Hi,

See http://www.sqlite.org/pragma.html, search for 'PRAGMA encoding'

/Martin
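
To make the pointer above concrete, here is a minimal sketch in Python (using the standard sqlite3 module rather than the C API discussed in this thread). The table and column names are made up for illustration. It sets the encoding before any table exists, then inserts, updates, and reads back Arabic and Chinese text.

```python
import sqlite3

# In-memory database; a file path would behave the same way.
conn = sqlite3.connect(":memory:")

# PRAGMA encoding only has an effect before the database is created,
# i.e. before the first table is written. UTF-8 is the default anyway.
conn.execute("PRAGMA encoding = 'UTF-8'")
conn.execute("CREATE TABLE greeting (lang TEXT PRIMARY KEY, body TEXT)")

# Bound parameters carry the text to SQLite without manual escaping.
rows = [("ar", "مرحبا بالعالم"), ("zh", "你好，世界")]
conn.executemany("INSERT INTO greeting VALUES (?, ?)", rows)

# Update one row, then read everything back.
conn.execute("UPDATE greeting SET body = ? WHERE lang = ?", ("世界，你好", "zh"))

encoding = conn.execute("PRAGMA encoding").fetchone()[0]
stored = dict(conn.execute("SELECT lang, body FROM greeting"))
print(encoding, stored)
```

The same pattern should carry over to the C API, where text is bound with sqlite3_bind_text() (UTF-8) or sqlite3_bind_text16() (UTF-16).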


RE: UNICODE Support

ajay-7

But what about the SQLite functions whose parameters are of type LPSTR?
How do I pass wide-character strings to them?

Regards,
Ajay Sonawane


-----Original Message-----
From: Martin Engelschalk [mailto:[hidden email]]
Sent: Wednesday, June 08, 2005 6:48 PM
To: [hidden email]
Subject: Re: [sqlite] UNICODE Support

Hi,

See http://www.sqlite.org/pragma.html, search for 'PRAGMA encoding'

/Martin


RE: UNICODE Support

Dennis Volodomanov-5
In reply to this post by ajay-7
You can convert your text using A2W() and W2A() functions (or others)
before passing it to SQLite and after retrieving it back from SQLite.
That's what we do (it's a Japanese application).
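
A rough sketch of the conversion step, in Python for brevity. It assumes the wide string is UTF-16LE, as on Windows, and that the target is UTF-8, which is what SQLite's narrow-string (char*) functions expect. The helper names are hypothetical. As far as I know, A2W()/W2A() go through the ANSI code page rather than UTF-8, so the last lines show that a code-page encoding such as cp932 produces different bytes.

```python
def wide_to_utf8(wide: bytes) -> bytes:
    """Convert a UTF-16LE buffer (assumed Win32 wide string) to UTF-8."""
    return wide.decode("utf-16-le").encode("utf-8")


def utf8_to_wide(narrow: bytes) -> bytes:
    """Convert UTF-8 bytes read from SQLite back to UTF-16LE."""
    return narrow.decode("utf-8").encode("utf-16-le")


text = "日本語"
wide = text.encode("utf-16-le")   # what the application holds
utf8 = wide_to_utf8(wide)         # what gets passed to SQLite
round_trip = utf8_to_wide(utf8)   # what the application gets back

# A Japanese ANSI code page yields different bytes than UTF-8.
ansi = text.encode("cp932")
print(len(wide), len(utf8), len(ansi), ansi == utf8)
```

If the conversion targets the ANSI code page instead of UTF-8, the bytes stored in the database would not be valid UTF-8, which matters as soon as another tool or platform reads the file.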

   Dennis

-----Original Message-----
From: Ajay [mailto:[hidden email]]
Sent: Thursday, June 09, 2005 12:12 AM
To: [hidden email]
Subject: RE: [sqlite] UNICODE Support


But what about the SQLite functions whose parameters are of type LPSTR?
How do I pass wide-character strings to them?

Regards,
Ajay Sonawane



RE: UNICODE Support

RohitPatel9999
Hi Dennis Volodomanov

I am using SQLite 3.3.4.

My Win32 application needs international language support (Chinese, Japanese).
I need to build it with MBCS defined for Windows 98/ME and with
UNICODE (and _UNICODE) defined for Windows NT/2000/2003/XP.

Could you give me some sample code that inserts into, selects from, and updates a SQLite db (UTF-8)?

Could you also share some guidelines from your experience with using a SQLite db (UTF-8) for international languages?

Thank you for any help.
Rohit

Re: RE: UNICODE Support

Cory Nelson
On 8/3/06, RohitPatel9999 <[hidden email]> wrote:

>
> Hi Dennis Volodomanov
>
> I am using SQLite 3.3.4.
>
> My Win32 Application needs international language support (Chinese,
> Japanese).
> I need my Win32 Application to build such that,
> MBCS defined for Windows 98/ME and
> UNICODE (and _UNICODE) defined for Windows NT/2000/2003/XP.
>
> Can you help me by giving some sample code which inserts/selects/updates
> SQLite db (UTF-8) ?
>
> Also if you can help me with some guidelines from your experience regarding
> using SQLite db (UTF-8) for international languages ?

I recommend using utf-16 in the database - sqlite doesn't fully
support utf-8, and some things may give unexpected results if you use
it.



--
Cory Nelson
http://www.int64.org

Re: RE: UNICODE Support

Nuno Lucas-2
On 8/4/06, Cory Nelson <[hidden email]> wrote:
> I recommend using utf-16 in the database - sqlite doesn't fully
> support utf-8, and some things may give unexpected results if you use
> it.

Could you give some example of unexpected result with UTF-8?

In my experience the only unexpected results with UTF-8 were bugs in
my program (like passing a non-UTF-8 string).

Any other unexpected result should be considered a bug in SQLite and
reported as such.


Regards,
~Nuno Lucas


Re: UNICODE Support

D. Richard Hipp
In reply to this post by Cory Nelson
"Cory Nelson" <[hidden email]> wrote:
> I recommend using utf-16 in the database - sqlite doesn't fully
> support utf-8, and some things may give unexpected results if you use
> it.
>

Oh really?  What exactly is missing from SQLite's UTF-8 support?
--
D. Richard Hipp   <[hidden email]>


Re: RE: UNICODE Support

Martin Jenkins
In reply to this post by Cory Nelson
Cory Nelson wrote:

> I recommend using utf-16 in the database - sqlite doesn't fully
> support utf-8, and some things may give unexpected results if you use
> it.

Could you expand a bit on this please?

I haven't seen any bugs in sqlite as such, but I did have a few
problems storing "foreign" characters through the Python wrappers to
sqlite, where the wrappers barfed converting the "foreign" character.

In one case it was because the source (Windows app) lied about the
encoding - it claimed the text was UTF-8 when it was windows-1252.
In the other case the text came from a Unix box and was supposed to be
7-bit ASCII, but I suspect it was generated by a Windows app, as above.

I think I've got this all sorted in my mind but if you say sqlite has
issues handling UTF-8 then I need to look at it again.

Martin

Re: RE: UNICODE Support

Will Leshner-3
In reply to this post by Cory Nelson
On 8/3/06, Cory Nelson <[hidden email]> wrote:

> I recommend using utf-16 in the database - sqlite doesn't fully
> support utf-8, and some things may give unexpected results if you use
> it.

As with others who have replied, I have not had a problem working with
UTF8 in a SQLite database.

Re: UNICODE Support

Cory Nelson
In reply to this post by D. Richard Hipp
On 8/4/06, [hidden email] <[hidden email]> wrote:
> "Cory Nelson" <[hidden email]> wrote:
> > I recommend using utf-16 in the database - sqlite doesn't fully
> > support utf-8, and some things may give unexpected results if you use
> > it.
> >
>
> Oh really?  What exactly is missing from SQLite's UTF-8 support?

Correct me if I'm wrong but from what I understand SQLite supports
storing and converting between UTF-8 and UTF-16, but that is where the
support stops.  It is wrong (in my opinion) to claim UTF-8 support, at
least without a clear upfront warning, when that's all it offers.

For example, it uses memcmp() to compare strings.  I've been bitten by this
before, with SQLite producing unexpected results when using UTF-8.
Using UTF-16 has worked more reliably in my experience.



--
Cory Nelson
http://www.int64.org

Re: UNICODE Support

Nuno Lucas-2
On 8/4/06, Cory Nelson <[hidden email]> wrote:
> For example, it uses memcmp() to compare strings.  I've been bitten by this
> before, with SQLite producing unexpected results when using UTF-8.
> Using UTF-16 has worked more reliably in my experience.

SQLite only knows how to sort ASCII, so memcmp does that right (be it
UTF-8 or UTF-16).

If you think about it, the only way sorting will work 100% is by
having some form of localization (because for each language different
sorting rules apply, _even_ for words composed only of ASCII
characters).

Adding localization to SQLite is out of the question (it would
probably need a library as big as SQLite itself), so it's up to the
user to define their own localization functions and integrate them with
sqlite (all the necessary hooks are there for that).
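
The kind of hook meant here is a user-defined collation. A minimal sketch with Python's sqlite3 module (the C equivalent is sqlite3_create_collation()); the collation name and the folding rule are invented for illustration and are not a real locale-aware ordering:

```python
import sqlite3
import unicodedata


def fold_key(text: str) -> str:
    # Hypothetical rule: ignore case and strip combining accents.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive(a: str, b: str) -> int:
    # A collation callback returns a negative, zero, or positive integer.
    ka, kb = fold_key(a), fold_key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_insensitive", accent_insensitive)
conn.execute("CREATE TABLE word (w TEXT)")
words = [("zebra",), ("\u00c9mile",), ("apple",), ("eagle",)]
conn.executemany("INSERT INTO word VALUES (?)", words)

# Default BINARY order compares the encoded bytes.
binary_order = [r[0] for r in conn.execute("SELECT w FROM word ORDER BY w")]
# The custom collation applies the application's own rule.
custom_order = [
    r[0]
    for r in conn.execute("SELECT w FROM word ORDER BY w COLLATE accent_insensitive")
]
print(binary_order)
print(custom_order)
```

With the default order, "Émile" lands after "zebra" because its first byte in UTF-8 is above the ASCII range; the custom collation places it between "eagle" and "zebra".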


Regards,
~Nuno Lucas

Re: UNICODE Support

Cory Nelson
On 8/4/06, Nuno Lucas <[hidden email]> wrote:

> SQLite only knows how to sort ASCII, so memcmp does that right (be it
> UTF-8 or UTF-16).
>
> If you think about it, the only way sorting will work 100% is by
> having some form of localization (because for each language different
> sorting rules apply, _even_ for words composed only of ASCII
> characters).
>
> Adding localization to SQLite is out of the question (it would
> probably need a library as big as SQLite itself), so it's up to the
> user to define their own localization functions and integrate them with
> sqlite (all the necessary hooks are there for that).

I was not talking about sorting in my post - I've had simple = index
comparisons fail in UTF-8.

But, since you brought it up - I have no expectations of SQLite
integrating a full Unicode locale library, however it would be a great
improvement if it would respect the current locale and use wcs*
functions when available, or at least order by standard Unicode order
instead of completely mangling things on UTF-8 codes.



--
Cory Nelson
http://www.int64.org

Re: UNICODE Support

Nuno Lucas-2
On 8/5/06, Cory Nelson <[hidden email]> wrote:

> I was not talking about sorting in my post - I've had simple = index
> comparisons fail in UTF-8.

You should have reported it. If it's true, it's a bug that needs to be
corrected.
But again I would say I never found a bug like that in sqlite.

> But, since you brought it up - I have no expectations of SQLite
> integrating a full Unicode locale library, however it would be a great
> improvement if it would respect the current locale and use wcs*
> functions when available, or at least order by standard Unicode order
> instead of completely mangling things on UTF-8 codes.

If it respected the current locale, the database would become invalid
after being moved to or used in another locale (the affected indexes
would need to be rebuilt). With the COLLATE mechanism (which I have
never used, precisely because of the problem above) you can define
your own sort function that does what you want.

On the second point, you may be right, and it could be considered a
bug. A sorted table should have exactly the same order whether the
database uses UTF-8 or UTF-16 internally (even if it doesn't follow
the Unicode order). At the very least, query results should be
consistent in this respect.

Maybe others have another point of view...


Regards,
~Nuno Lucas

Re: UNICODE Support

Trevor Talbot-2
In reply to this post by Cory Nelson
On 8/4/06, Cory Nelson <[hidden email]> wrote:

> But, since you brought it up - I have no expectations of SQLite
> integrating a full Unicode locale library, however it would be a great
> improvement if it would respect the current locale and use wcs*
> functions when available, or at least order by standard Unicode order
> instead of completely mangling things on UTF-8 codes.

What do you mean by "standard Unicode order" in this context?

Re: UNICODE Support

Cory Nelson
On 8/4/06, Trevor Talbot <[hidden email]> wrote:

> What do you mean by "standard Unicode order" in this context?
>

Convert UTF-8 to UTF-16 (or both to UCS-4 if you want to be entirely
correct) while sorting, to at least make them follow the same pattern.

--
Cory Nelson
http://www.int64.org

Re: UNICODE Support

Trevor Talbot-2
On 8/4/06, Cory Nelson <[hidden email]> wrote:
> Convert UTF-8 to UTF-16 (or both to UCS-4 if you want to be entirely
> correct) while sorting, to at least make them follow the same pattern.

Ah, so Unicode codepoint order.  Unfortunately this isn't accurate:
UTF-8 and UTF-32/UCS-4 are both naturally in codepoint order (UTF-8
because of the MSB-first style format), but UTF-16 isn't due to the
way surrogate pairs are constructed.  UTF-16 is actually the oddball
here :P
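
A small Python check of this claim, comparing the raw byte order of each encoding against code point order. The two characters are arbitrary picks: one BMP character above the surrogate range and one character beyond the BMP.

```python
bmp_char = "\uff21"         # U+FF21, inside the BMP, above the surrogates
astral_char = "\U00010000"  # U+10000, first code point beyond the BMP

chars = [astral_char, bmp_char]

# Reference order: plain code point values.
by_codepoint = sorted(chars, key=ord)
# memcmp-style order on the encoded bytes.
by_utf8 = sorted(chars, key=lambda c: c.encode("utf-8"))
by_utf16 = sorted(chars, key=lambda c: c.encode("utf-16-be"))

print(by_utf8 == by_codepoint)   # UTF-8 bytes follow code point order
print(by_utf16 == by_codepoint)  # UTF-16 puts the surrogate pair first
```

U+10000 is encoded in UTF-16 as the surrogate pair D800 DC00, which compares lower than FF21 even though the code point itself is higher.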

Re: UNICODE Support

Nathaniel Smith
In reply to this post by Cory Nelson
On Fri, Aug 04, 2006 at 10:02:58PM -0700, Cory Nelson wrote:

> Convert UTF-8 to UTF-16 (or both to UCS-4 if you want to be entirely
> correct) while sorting, to at least make them follow the same pattern.

Huh?

UTF-8 handled in the naive way (using "memcmp", like sqlite does) will
automagically give you sorting by unicode codepoint (probably the only
useful meaning of "standard Unicode order" here).

UTF-16 handled in the naive way (either using "memcmp" or
lexicographically on 2-byte integers) will sort things by codepoint,
mostly, sort of, and otherwise by a weird order that falls out of
details of the UTF-16 standard accidentally.[1]

Perhaps you're using a legacy system that standardized on UTF-16
before the BMP ran out, and want to be compatible with its
idiosyncratic sorting -- then converting things to UTF-16 before
comparing makes sense.  But that's not really appropriate to make as a
general recommendation... better to convert UTF-16 to UTF-8, if you
want to be entirely correct :-).

[1] see e.g. http://icu.sourceforge.net/docs/papers/utf16_code_point_order.html

-- Nathaniel

--
Details are all that matters; God dwells there, and you never get to
see Him if you don't struggle to get them right. -- Stephen Jay Gould

Re: UNICODE Support

Jens Miltner
In reply to this post by Cory Nelson

On 04.08.2006 at 19:23, Cory Nelson wrote:

> I was not talking about sorting in my post - I've had simple = index
> comparisons fail in UTF-8.

I'm pretty sure you can get the same kind of 'failure' when using  
UTF-16, e.g. when comparing decomposed against composed forms of  
unicode strings. Since sqlite only really does a 'binary' comparison,  
this may easily fail for non-ASCII strings.
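
A minimal illustration in Python, assuming the default BINARY comparison: the same visible string in composed (NFC) and decomposed (NFD) form compares unequal until the application normalizes both sides.

```python
import sqlite3
import unicodedata

composed = unicodedata.normalize("NFC", "caf\u00e9")  # é as one code point
decomposed = unicodedata.normalize("NFD", composed)   # e followed by U+0301

conn = sqlite3.connect(":memory:")

# SQLite compares the stored code units, so the two forms differ.
raw_equal = conn.execute("SELECT ? = ?", (composed, decomposed)).fetchone()[0]

# Normalizing in the application before binding makes them match.
renormalized = unicodedata.normalize("NFC", decomposed)
normalized_equal = conn.execute(
    "SELECT ? = ?", (composed, renormalized)
).fetchone()[0]

print(raw_equal, normalized_equal)
```

Picking one normalization form and applying it to everything on the way into the database is one way to avoid this class of mismatch, whichever encoding the database uses.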

Also, there's a prominent warning in the documentation about working  
with case-insensitive comparison (since it only does it right for  
ASCII characters). Maybe this is where some more complete unicode  
support is most sorely missing, but it's probably beyond sqlite's  
scope to do proper unicode-savvy case shifting...?
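
A short sketch of that limitation with Python's sqlite3 module. It uses the built-in NOCASE collation; the application-side comparison via str.casefold() is just one possible workaround, not something SQLite does for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOCASE folds the 26 ASCII letters, so this pair matches.
ascii_match = conn.execute(
    "SELECT 'ABC' = 'abc' COLLATE NOCASE"
).fetchone()[0]

# U+00C4 (Ä) and U+00E4 (ä) are outside ASCII, so NOCASE does not fold them.
non_ascii_match = conn.execute(
    "SELECT ? = ? COLLATE NOCASE", ("\u00c4", "\u00e4")
).fetchone()[0]

# Folding in the application handles the non-ASCII pair.
app_side_match = "\u00c4".casefold() == "\u00e4".casefold()

print(ascii_match, non_ascii_match, app_side_match)
```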

</jum>

Re: RE: UNICODE Support

Pablo Santacruz
In reply to this post by RohitPatel9999
Consider upgrading to 3.3.6. The following are some changes related to the
UTF stuff.

2006 June 6 (3.3.6)

   - Fix an obscure segfault in UTF-8 to UTF-16 conversions

2006 April 5 (3.3.5)

   - The sqlite3_create_collation() function honors the
     SQLITE_UTF16_ALIGNED flag.





--
Pablo