Help with sqlite3_value_text

classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Help with sqlite3_value_text

curmudgeon
I’m still confused by utf strings. For simplicity, suppose I set up an sqlite function that takes a single string parameter and I want to scan the string to count the number of occurrences of a certain character . If I knew the string was made up entirely of ascii chars I’d do this

char *c = &sqlite3_value_text(0)[0];
int count=0;
while (*c) if (*c++ == SearchChar) count++;

How do I do the same thing if the string param is a utf-8 or utf-16 string and the SearchChar is a Unicode character?

I’m confused by the fact that Unicode characters are not a fixed number of bytes so if I do this e.g.

wchar_t *c = (wchar_t*) sqlite3_value_text(0);

does this mean a complete temporary copy of the value of sqlite3_value_text(0) has to be constructed by the compiler such that all characters of the newly constructed string are fixed width? If so, I’m just wanting to check if there’s a way  of avoiding this overhead.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Igor Tandetnik-2
On 4/12/2019 10:51 AM, x wrote:
> I’m still confused by utf strings. For simplicity, suppose I set up an sqlite function that takes a single string parameter and I want to scan the string to count the number of occurrences of a certain character . If I knew the string was made up entirely of ascii chars I’d do this
>
> char *c = &sqlite3_value_text(0)[0];
> int count=0;
> while (*c) if (*c++ == SearchChar) count++;
>
> How do I do the same thing if the string param is a utf-8 or utf-16 string and the SearchChar is a Unicode character?

The problem you need to solve is "count occurrences of a substring in a string". The substring in question could consist of one byte representing a single ASCII character, or a sequence of bytes comprising a UTF-8 encoding of one Unicode character. This really has nothing to do with SQLite.

> I’m confused by the fact that Unicode characters are not a fixed number of bytes so if I do this e.g.
>
> wchar_t *c = (wchar_t*) sqlite3_value_text(0);

That's just wrong. sqlite3_value_text does *not* return a pointer to a sequence of wchar_t. Any attempt to actually use `c` pointer would exhibit undefined behavior.

> does this mean a complete temporary copy of the value of sqlite3_value_text(0) has to be constructed by the compiler such that all characters of the newly constructed string are fixed width? If so, I’m just wanting to check if there’s a way  of avoiding this overhead.

You seem to ascribe some magical properties to a cast. Nothing is "constructed" by it - it simply tells the compiler "take this pointer to a memory block, and believe that it contains something different than what the type of the original pointer suggests; trust me, I know what I'm doing".

If you prefer UTF-16 encoding over UTF-8, there's sqlite3_value_text16 for that.

If you are unsure what UTF-8 and UTF-16 mean, see

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

--
Igor Tandetnik


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Dominique Devienne
In reply to this post by curmudgeon
On Fri, Apr 12, 2019 at 4:51 PM x <[hidden email]> wrote:

> I’m still confused by utf strings. [...  I want to scan the string to
> count the number of occurrences of a certain character. [...]
> How do I do the same thing if the string param is a utf-8 or utf-16 string
> and the SearchChar is a Unicode character?
>
> I’m confused by the fact that Unicode characters are not a fixed number of
> bytes so if I do this e.g.
>
> wchar_t *c = (wchar_t*) sqlite3_value_text(0);


That's very wrong. _text() always returns UTF8. the _text16*() variants
return UTF16.

As to how many bytes a UTF8-encoded "codepoint" takes, it's well documented
on Wikipedia.
Based on the leading bits, one can know unambiguously whether this is the
1st, 2nd, 3rd, or 4th
byte of a 1 to 4 multi-byte sequence.

Even UTF16 can lead to "surrogate pairs" for codepoints beyond the
so-called "CMP".

And that's not even getting into the fact the encoding may not be "unique",
and Unicode "normalization".
This is not an easy subject...

You can play with the char() built-in SQL function to see how different
code point values are encoded in UTF8. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

J Decker
In reply to this post by curmudgeon
http://utf8everywhere.org/


On Fri, Apr 12, 2019 at 7:51 AM x <[hidden email]> wrote:

> I’m still confused by utf strings. For simplicity, suppose I set up an
> sqlite function that takes a single string parameter and I want to scan the
> string to count the number of occurrences of a certain character . If I
> knew the string was made up entirely of ascii chars I’d do this
>
> char *c = &sqlite3_value_text(0)[0];
> int count=0;
> while (*c) if (*c++ == SearchChar) count++;
>
> How do I do the same thing if the string param is a utf-8 or utf-16 string
> and the SearchChar is a Unicode character?
>

You'd decide on the common operating mode of your system for 1... there are
of course middleware libraries that need to cater to both and consume
databases instead of just writing it.


>
> I’m confused by the fact that Unicode characters are not a fixed number of
> bytes so if I do this e.g.
>
> wchar_t *c = (wchar_t*) sqlite3_value_text(0);
>
>
https://www.sqlite.org/c3ref/value_blob.html

const void *sqlite3_value_text16(

returns wchar data, converting if nessecary from utf8 if it was stored as
utf8.

value_text only returns utf8 (char* really, but suffices for storage and
retreival of utf8 phrases)

for utf8 it would just be matter of matching 1,2,3, or 4 bytes (depending
on the codepoint you were matching).



does this mean a complete temporary copy of the value of
> sqlite3_value_text(0) has to be constructed by the compiler such that all
> characters of the newly constructed string are fixed width? If so, I’m just
> wanting to check if there’s a way  of avoiding this overhead.
>
>
If it is converted, a temporary value is allocated by the library (at
runtime, well beyond the compiler), ...


> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Shawn Wagner
In reply to this post by curmudgeon
Welcome to the wonderful world of multibyte encodings, and Unicode in
particular.

Unless you're looking for an ASCII character, you're looking for a
substring, not a character. And you're really looking for what's called a
codepoint (The entire concept of character gets kind of fuzzy with
Unicode). If you're not careful, looking for 'a' (U+0061 LATIN SMALL LETTER
A) will match the start of á, which is actually a two codepoint grapheme
(U+0061 and U+0301 COMBINING ACUTE ACCENT) that renders as a single entity.
And if you're okay with matching that, what about á (U+00E1 LATIN SMALL
LETTER A WITH ACUTE), the single codepoint composed version?

Unicode is hard. There are libraries like ICU and libunistring which help a
bit. I have a bunch of sqlite extensions at
https://github.com/shawnw/useful_sqlite_extensions (That I really need to
polish up for an actual release) including a string library that expands a
lot on the build in ICU extension to make working with graphemes and
unicode in general in sqlite a lot easier.


On Fri, Apr 12, 2019 at 7:51 AM x <[hidden email]> wrote:

> I’m still confused by utf strings. For simplicity, suppose I set up an
> sqlite function that takes a single string parameter and I want to scan the
> string to count the number of occurrences of a certain character . If I
> knew the string was made up entirely of ascii chars I’d do this
>
> char *c = &sqlite3_value_text(0)[0];
> int count=0;
> while (*c) if (*c++ == SearchChar) count++;
>
> How do I do the same thing if the string param is a utf-8 or utf-16 string
> and the SearchChar is a Unicode character?
>
> I’m confused by the fact that Unicode characters are not a fixed number of
> bytes so if I do this e.g.
>
> wchar_t *c = (wchar_t*) sqlite3_value_text(0);
>
> does this mean a complete temporary copy of the value of
> sqlite3_value_text(0) has to be constructed by the compiler such that all
> characters of the newly constructed string are fixed width? If so, I’m just
> wanting to check if there’s a way  of avoiding this overhead.
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Warren Young
In reply to this post by curmudgeon
On Apr 12, 2019, at 8:51 AM, x <[hidden email]> wrote:
>
> How do I do the same thing if the string param is a utf-8 or utf-16 string and the SearchChar is a Unicode character?

Convert the characters to 32-bit wide characters first, then iterate over the array of uint32_t or similar.

One method is shown by the SQLite-internal function sqlite3Utf8Read().  It’s static in the amalgamation build, but since SQLite is public domain, you can just copy that function’s text out into your program and use it there or modify it to suit your purposes.

Your platform libraries may have UTF-8 to UTF-32 or similar mechanisms.

On POSIX platforms, the most common of these is iconv(3).

On Windows, the legacy of UCS-2 and UTF-16 makes this difficult, but if you can stick to the Basic Multilingual Plane, converting UTF-8 to UCS-2 gives the same effect.  See MultiByteToWideChar(…, CP_UTF8, …):

    https://docs.microsoft.com/windows/desktop/api/stringapiset/nf-stringapiset-multibytetowidechar

For all platforms, there’s ICU.  That’s of particular interest with SQLite since there’s an included ICU extension you can include to get more Unicode power in SQLite:

   https://www.sqlite.org/compile.html#enable_icu

There are complications that your questions don’t push into, but beware that if you start getting beyond simple “character in string” questions, you’ll eventually have to confront them: combining characters, etc.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

curmudgeon
Thanks for the replies. There’s plenty for me to look at there.

I’ve been in poor health the last 5 years and after almost a year’s break I’m trying to get back into sqlite to preserve my sanity. I’m so rusty my opening post is riddled with errors.

I’ve just realised that, before my break, I was dealing with this as follows

#define VALTXT(i) String((wchar_t*)sqlite3_value_text16(values[i]))

and inside the function I’d access the parameter with

String S = VALTXT(0);

The String type is an embarcadero c++ builder typedef which I think is a wide string. I then accessed the i’th character using S[I]. The above involves copying the parameter to another string.

In an old post I made on this forum someone told me I should be using sqlite3_value_text( as that was sqlite’s default storage and would save sqlite having to convert it to utf16.

I’ve been asking myself if I could have done the above more efficiently as sqlite’s converting the original string then I’m converting it and copying it. While thinking about that I started to wonder how c++ handled utf8/16. E.g. To access the i’th character does it have to rattle through all previous I-1 characters to find the start of character i, how pointer arithmetic was handled when pointing to utf8/16 chars etc.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Richard Damon


> On Apr 12, 2019, at 12:58 PM, x <[hidden email]> wrote:
>
> I’ve been asking myself if I could have done the above more efficiently as sqlite’s converting the original string then I’m converting it and copying it. While thinking about that I started to wonder how c++ handled utf8/16. E.g. To access the i’th character does it have to rattle through all previous I-1 characters to find the start of character i, how pointer arithmetic was handled when pointing to utf8/16 chars etc.
>

Basically, if you are dealing with a variable width encoding (UTF-8/UTF-16), then finding the nth character requires scanning the string counting beginning of characters. If this is an important operation, you pay the cost of conversion and work in UCS-4. On the other hand, UTF-8 has a lot of nice properties such that it can be a fairly seamless upgrade for processing plain ASCII text, and if reasonably efficient for typical text. (There are a number of complications if you try to support ALL of Unicode, like the composed characters, where you use several code-point together to define a single character), where you need to decide how you want to normalize and need some big character tables for the instructions of how to do this.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Keith Medcalf
In reply to this post by Warren Young

Actually you would have to convert the strings to UCS-4.  UTF-16 is a variable-length encoding.  An actual "unicode character" is (at this present moment in time, though perhaps not tomorrow) 4 bytes (64-bits).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Warren Young
>Sent: Friday, 12 April, 2019 09:45
>To: SQLite mailing list
>Subject: Re: [sqlite] Help with sqlite3_value_text
>
>On Apr 12, 2019, at 8:51 AM, x <[hidden email]> wrote:
>>
>> How do I do the same thing if the string param is a utf-8 or utf-16
>string and the SearchChar is a Unicode character?
>
>Convert the characters to 32-bit wide characters first, then iterate
>over the array of uint32_t or similar.
>
>One method is shown by the SQLite-internal function
>sqlite3Utf8Read().  It’s static in the amalgamation build, but since
>SQLite is public domain, you can just copy that function’s text out
>into your program and use it there or modify it to suit your
>purposes.
>
>Your platform libraries may have UTF-8 to UTF-32 or similar
>mechanisms.
>
>On POSIX platforms, the most common of these is iconv(3).
>
>On Windows, the legacy of UCS-2 and UTF-16 makes this difficult, but
>if you can stick to the Basic Multilingual Plane, converting UTF-8 to
>UCS-2 gives the same effect.  See MultiByteToWideChar(…, CP_UTF8, …):
>
>    https://docs.microsoft.com/windows/desktop/api/stringapiset/nf-
>stringapiset-multibytetowidechar
>
>For all platforms, there’s ICU.  That’s of particular interest with
>SQLite since there’s an included ICU extension you can include to get
>more Unicode power in SQLite:
>
>   https://www.sqlite.org/compile.html#enable_icu
>
>There are complications that your questions don’t push into, but
>beware that if you start getting beyond simple “character in string”
>questions, you’ll eventually have to confront them: combining
>characters, etc.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Scott Robison-2
On Fri, Apr 12, 2019, 1:06 PM Keith Medcalf <[hidden email]> wrote:

>
> Actually you would have to convert the strings to UCS-4.  UTF-16 is a
> variable-length encoding.  An actual "unicode character" is (at this
> present moment in time, though perhaps not tomorrow) 4 bytes (64-bits).
>

That is some impressive compression! :)

Regardless, even if you use UCS-4, you still have the issue of combining
characters. Unicode is complex as had been observed.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

lhelgerson


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Scott Robison
Sent: Friday, April 12, 2019 2:40 PM
To: SQLite mailing list
Subject: Re: [sqlite] Help with sqlite3_value_text

On Fri, Apr 12, 2019, 1:06 PM Keith Medcalf <[hidden email]> wrote:

>
> Actually you would have to convert the strings to UCS-4.  UTF-16 is a
> variable-length encoding.  An actual "unicode character" is (at this
> present moment in time, though perhaps not tomorrow) 4 bytes (64-bits).
>

That is some impressive compression! :)

Regardless, even if you use UCS-4, you still have the issue of combining characters. Unicode is complex as had been observed.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Warren Young
In reply to this post by Keith Medcalf
On Apr 12, 2019, at 1:06 PM, Keith Medcalf <[hidden email]> wrote:
>
> Actually you would have to convert the strings to UCS-4.

UTF-32 is the new name of that standard:

   https://en.wikipedia.org/wiki/UTF-32#History

> UTF-16 is a variable-length encoding.

Only if you’re outside the BMP, which is why I restricted my answer that way.

And if you add in combining characters and such, *all* Unicode encodings are variable-length.

We’re not going to replace the whole books necessary to fully cover Unicode here.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

curmudgeon
Thanks for all the help. Things are much clearer now.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

curmudgeon
In reply to this post by Scott Robison-2
This seems to work OK as a sqlite function.



// assume values[0] & [1] are supplied and not null

// find Count of values[1] in values[0]



char *c = (char *)sqlite3_value_text(values[0]);

char *Sep = (char *)sqlite3_value_text(values[1]);

int Byte1, Count=0, NrBytes, NrSepBytes = strlen(Sep);



while (*c)

{



       Byte1 = (*c) >> 4;

       if ((Byte1 & 8) == 0) NrBytes = 1;

       else if (Byte1 & 1) NrBytes = 4;

       else if (Byte1 & 2) NrBytes = 3;

       else NrBytes = 2; // (Byte1 & 4) == 4



       if (NrBytes == NrSepBytes && memcmp(c, Sep, NrBytes) == 0) Count++; // at first byte of Sep

       c += NrBytes;

}

sqlite3_result_int(ctx, Count);



________________________________
From: sqlite-users <[hidden email]> on behalf of Scott Robison <[hidden email]>
Sent: Friday, April 12, 2019 8:40:19 PM
To: SQLite mailing list
Subject: Re: [sqlite] Help with sqlite3_value_text

On Fri, Apr 12, 2019, 1:06 PM Keith Medcalf <[hidden email]> wrote:

>
> Actually you would have to convert the strings to UCS-4.  UTF-16 is a
> variable-length encoding.  An actual "unicode character" is (at this
> present moment in time, though perhaps not tomorrow) 4 bytes (64-bits).
>

That is some impressive compression! :)

Regardless, even if you use UCS-4, you still have the issue of combining
characters. Unicode is complex as had been observed.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

J Decker
On Sat, Apr 13, 2019 at 12:04 PM x <[hidden email]> wrote:

> This seems to work OK as a sqlite function.
>
>
>
> // assume values[0] & [1] are supplied and not null
>
> // find Count of values[1] in values[0]
>
>
>
> char *c = (char *)sqlite3_value_text(values[0]);
>
> char *Sep = (char *)sqlite3_value_text(values[1]);
>
> int Byte1, Count=0, NrBytes, NrSepBytes = strlen(Sep);
>
you could use sqlite3_value_bytes(values[1]); instead.

>
>
>
> while (*c)
>
> {
>
>
>
>        Byte1 = (*c) >> 4;
>
>        if ((Byte1 & 8) == 0) NrBytes = 1;
>
>        else if (Byte1 & 1) NrBytes = 4;
>
>        else if (Byte1 & 2) NrBytes = 3;
>
>        else NrBytes = 2; // (Byte1 & 4) == 4
>
>
>
>        if (NrBytes == NrSepBytes && memcmp(c, Sep, NrBytes) == 0) Count++;
> // at first byte of Sep
>
>        c += NrBytes;
>
> }
>
> sqlite3_result_int(ctx, Count);
>
>
>
> ________________________________
> From: sqlite-users <[hidden email]> on
> behalf of Scott Robison <[hidden email]>
> Sent: Friday, April 12, 2019 8:40:19 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] Help with sqlite3_value_text
>
> On Fri, Apr 12, 2019, 1:06 PM Keith Medcalf <[hidden email]> wrote:
>
> >
> > Actually you would have to convert the strings to UCS-4.  UTF-16 is a
> > variable-length encoding.  An actual "unicode character" is (at this
> > present moment in time, though perhaps not tomorrow) 4 bytes (64-bits).
> >
>
> That is some impressive compression! :)
>
> Regardless, even if you use UCS-4, you still have the issue of combining
> characters. Unicode is complex as had been observed.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

curmudgeon


From: J Decker<mailto:[hidden email]>
Sent: 13 April 2019 20:05
To: SQLite mailing list<mailto:[hidden email]>
Subject: Re: [sqlite] Help with sqlite3_value_text

>> char *c = (char *)sqlite3_value_text(values[0]);
>> char *Sep = (char *)sqlite3_value_text(values[1]);
>> int Byte1, Count=0, NrBytes, NrSepBytes = strlen(Sep);

>you could use sqlite3_value_bytes(values[1]); instead.

Hi JD, Would I have to worry about this

“Please pay particular attention to the fact that the pointer returned from sqlite3_value_blob()<https://sqlite.org/c3ref/value_blob.html>, sqlite3_value_text()<https://sqlite.org/c3ref/value_blob.html>, or sqlite3_value_text16()<https://sqlite.org/c3ref/value_blob.html> can be invalidated by a subsequent call to sqlite3_value_bytes()<https://sqlite.org/c3ref/value_blob.html>, sqlite3_value_bytes16()<https://sqlite.org/c3ref/value_blob.html>, sqlite3_value_text()<https://sqlite.org/c3ref/value_blob.html>, or sqlite3_value_text16()<https://sqlite.org/c3ref/value_blob.html>.”

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

curmudgeon
On second thoughts JD, can’t use strlen or sqlite3_value_bytes in case values(1) contains more than a single unicode character. This looks OK.

# define CHARLEN(x) !(x & 128) ? 1 : (x & 16 ? 4 : (x & 32 ? 3 : 2))

char *c = (char *)sqlite3_value_text(values[0]);
char *Sep = (char *)sqlite3_value_text(values[1]);
int Count=0, Len, SepLen = CHARLEN(*Sep);

while (*c)
{
       if ((Len = CHARLEN(*c)) == SepLen && memcmp(c, Sep, Len)==0) Count++; // at start of Sep
       c += Len;
}
sqlite3_result_int(ctx, Count);

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

J Decker
On Sun, Apr 14, 2019 at 5:40 AM x <[hidden email]> wrote:

> On second thoughts JD, can’t use strlen or sqlite3_value_bytes in case
> values(1) contains more than a single unicode character. This looks OK.
>
> Bytes are what you need though; it doesn't matter how big the buffer is,
as long as you have all of it.

As long as you use _value_bytes after _text you're fine... so if any
conversion did take place the value will be right of the last returned
string type.


> # define CHARLEN(x) !(x & 128) ? 1 : (x & 16 ? 4 : (x & 32 ? 3 : 2))
>
> char *c = (char *)sqlite3_value_text(values[0]);
> char *Sep = (char *)sqlite3_value_text(values[1]);
> int Count=0, Len, SepLen = CHARLEN(*Sep);
>
> while (*c)
> {
>        if ((Len = CHARLEN(*c)) == SepLen && memcmp(c, Sep, Len)==0)
> Count++; // at start of Sep
>        c += Len;
> }
> sqlite3_result_int(ctx, Count);
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

curmudgeon
>As long as you use _value_bytes after _text you're fine... so if any
>conversion did take place the value will be right of the last returned
>string type.

JD, Could you explain that to me? I’m not sure why any conversion takes place and, on reading the text below, I would’ve thought it would be better to call sqlite3_value_bytes first (if it’s called “subsequently” the pointer returned by sqlite3_value_text “can be invalidated”).

Please pay particular attention to the fact that the pointer returned from sqlite3_value_blob()<https://sqlite.org/c3ref/value_blob.html>, sqlite3_value_text()<https://sqlite.org/c3ref/value_blob.html>, or sqlite3_value_text16()<https://sqlite.org/c3ref/value_blob.html> can be invalidated by a subsequent call to sqlite3_value_bytes()<https://sqlite.org/c3ref/value_blob.html>, sqlite3_value_bytes16()<https://sqlite.org/c3ref/value_blob.html>, sqlite3_value_text()<https://sqlite.org/c3ref/value_blob.html>, or sqlite3_value_text16()<https://sqlite.org/c3ref/value_blob.html>.”


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Help with sqlite3_value_text

Clemens Ladisch
x wrote:
>> As long as you use _value_bytes after _text you're fine... so if any
>> conversion did take place the value will be right of the last returned
>> string type.
>
> Could you explain that to me? I’m not sure why any conversion takes place
> and, on reading the text below, I would’ve thought it would be better to
> call sqlite3_value_bytes first

As shown in the table, conversion from TEXT to BLOB does not change anything.
However, conversion from BLOB to TEXT might require appending a zero terminator.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12