Things you shouldn't assume when you store names

Simon Slavin-3
Since I don't see many posts yet this weekend, please excuse one of mine which isn't exactly on charter.  Feel free to argue me out of posting in personal (offlist) email.

In a previous job I got to see databases made up by all sorts of other people and organisations.  Every time I saw a field called 'firstname' or 'second name' or 'surname' or 'familyname' I groaned.  So I was nodding along as I read this:

<https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/>

I think this one is unusually well-written.

In case you want to know how best to handle personal names, the current consensus seems to be to use a single field containing the whole name, which can be searched by substring.  Computer systems for places with non-Roman character sets sometimes use two fields: name in local characters (Chinese, Devanagari, etc.) and name in Roman characters.

Also note that current privacy legislation in the US and EU means you are not allowed to ask for anything like 'full legal name' unless you cannot run your business without it.  Ask them for their name, and store what they tell you, with the words in the order they gave them.  If you need to sort people in name order (think very hard about why, first), create a field called 'sort order' and populate it yourself.  Sorting is your problem, not that of the people you're sorting.
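A minimal sketch of the layout described above, using Python's built-in sqlite3 module. The table name person, the columns name, name_roman and sort_key, and the sample rows are all made up for illustration; they are assumptions, not a recommended standard. The point is only that the name is stored as given, the sort key is filled in by the application, and lookups go by substring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,  -- exactly what the person gave, in their word order
        name_roman TEXT,           -- optional Roman-character form, if the system needs one
        sort_key   TEXT            -- filled in by the application, never asked of the person
    )
    """
)

# Hypothetical sample rows; the sort keys were chosen by hand for this sketch.
people = [
    ("Warren Young II", None, "young warren 2"),
    ("王小明", "Wang Xiaoming", "wang xiaoming"),
    ("Prince", None, "prince"),
]
conn.executemany(
    "INSERT INTO person (name, name_roman, sort_key) VALUES (?, ?, ?)", people
)


def find_by_substring(fragment):
    # instr() is a plain, case-sensitive substring test, so there are
    # no LIKE wildcards in the fragment to escape.
    rows = conn.execute(
        "SELECT name FROM person WHERE instr(name, ?) > 0 ORDER BY sort_key",
        (fragment,),
    )
    return [row[0] for row in rows]


def names_in_sort_order():
    rows = conn.execute("SELECT name FROM person ORDER BY sort_key")
    return [row[0] for row in rows]
```

With these rows, find_by_substring("Young") finds the one matching person without caring where the suffix sits, and names_in_sort_order() follows the hand-written keys rather than any guess about which word is the family name.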

Part of a continuing series including falsehoods about dates, times, places, street addresses, gender, relations, phone numbers, taxes, and amounts of money.

Good luck, and watch your back.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Things you shouldn't assume when you store names

skywalk
Ok, I'll bite.
The 'current consensus' in any system is tenuous and not an arbiter of its
effectiveness.
In this case, data modelers hoping to save a column. arrggg.
It flies in the face of data normalization and pushes the problem down the
line.
Forgive my simple linear thinking on the immensely complex topic of 'your
name here and here and here'.
Sincerely,
alias  ;)


Re: Things you shouldn't assume when you store names

Warren Young
In reply to this post by Simon Slavin-3
On Nov 9, 2019, at 12:25 PM, Simon Slavin <[hidden email]> wrote:
>
> Every time I saw a field called 'firstname' or 'second name' or 'surname' or 'familyname' I groaned.

I just had a fight with my insurance company who had me sign up for their new web portal, which only asked for first and last name, but it kept telling me I wasn’t a customer.  They’d been happily accepting my credit card payments for years, but I’m not a customer?!

We eventually figured out what went wrong: on the paper sign-up form, they demanded my full legal name, which has a suffix. There was no spot on the form for my name’s suffix, so I put it after the last name, and their data entry drone put it in the database’s last-name field as “Young II”, so to their DBMS, there was indeed no “Warren”, “Young” row!
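The failure described above can be reproduced in a few lines. This is only a sketch with Python's sqlite3 module; the customer table and its first_name and last_name columns are hypothetical stand-ins, since the insurer's real schema is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")

# What data entry recorded from the paper form: the suffix went into last_name.
conn.execute("INSERT INTO customer VALUES (?, ?)", ("Warren", "Young II"))


def exact_match(first, last):
    # Counts rows where both name parts match exactly, as a naive portal might.
    row = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE first_name = ? AND last_name = ?",
        (first, last),
    ).fetchone()
    return row[0]


portal_lookup = exact_match("Warren", "Young")    # what the web portal asked for
paper_lookup = exact_match("Warren", "Young II")  # what the paper form produced
```

Here portal_lookup is 0 and paper_lookup is 1: the same person exists or not depending on which form's idea of a "last name" is used for the comparison.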

Re: Things you shouldn't assume when you store names

Jens Alfke-2
In reply to this post by skywalk
On Nov 9, 2019, at 1:09 PM, [hidden email] wrote:
>
> In this case, data modelers hoping to save a column. arrggg.
> It flies in the face of data normalization and pushes the problem down the
> line.

But you _cannot_ normalize people’s names; that’s the exact point of that article. Anything you assume about the structure of a name will be wrong in some culture.

-Jens

Re: Things you shouldn't assume when you store names

Doug
Au Contraire, Jens! In many local contexts you can normalize people's names. I was born in Kansas, USA. My parents filled out a birth certificate for me. It had a place on the form for first name, middle name, last name, and a suffix like II or III.

That birth certificate form determined that everyone born in Kansas (at that time), had a first, middle, and last name. There was no discussion of the matter. That's the way it was. The form led the way; people never thought about whether it was effective or not. Each newly-born child was given a first, middle, and last name.

Effective was irrelevant for that system. There was no option, no alternative. It simply was.

All systems are like that at each moment in time. They are what they are at any moment in time, and they force the users to behave the way the system wants them to behave. If you want to change the system and momentum is on your side, then immediately you have a new system - at that moment in time. It is composed of the old system and the momentum.

Back to names: just like the birth certificate, a system which assigns a name to you, actually coerces you to have that name, because within that system, you exist as that name. The "names" article is totally wrong when it says that each assumption is wrong. Each of those assumptions is correct, and I can find at least one system which makes each one correct. Within each system, the assumption works, and is valid.

My two cents...
Doug


Re: Things you shouldn't assume when you store names

Richard Damon
But the main point of the document is that just because you know how
things 'must' be where you are, doesn't mean that every name you need to
handle will be built on those same rules. I can think of cases which
show problems with most of those rules. The key is that each of the
rules sounds like a rule that someone has assumed, and for each of them
the author of the article knows of (or at least can think of) a case
where it doesn't hold. A rule that holds only 99.999% of the time is
not always true.

--
Richard Damon


Re: Things you shouldn't assume when you store names

Gary R. Schmidt
In reply to this post by Doug
On 10/11/2019 13:44, Doug wrote:

> Au Contraire, Jens! In many local contexts you can normalize people's names. I was born in Kansas, USA. My parents filled out a birth certificate for me. It had a place on the form for first name, middle name, last name, and a suffix like II or III.
>
> That birth certificate form determined that everyone born in Kansas (at that time), had a first, middle, and last name. There was no discussion of the matter. That's the way it was. The form led the way; people never thought about whether it was effective or not. Each newly-born child was given a first, middle, and last name.
>
> Effective was irrelevant for that system. There was no option, no alternative. It simply was.
>
> All systems are like that at each moment in time. They are what they are at any moment in time, and they force the users to behave the way the system wants them to behave. If you want to change the system and momentum is on your side, then immediately you have a new system - at that moment in time. It is composed of the old system and the momentum.
>
> Back to names: just like the birth certificate, a system which assigns a name to you, actually coerces you to have that name, because within that system, you exist as that name. The "names" article is totally wrong when it says that each assumption is wrong. Each of those assumptions is correct, and I can find at least one system which makes each one correct. Within each system, the assumption works, and is valid.
>
> My two cents...
Is not worth the paper it is written on!

So what happens when someone from a family who only uses first- and
last-names moves to Kansas?

Do they have to make up a middle-name so that the idiots can fill out the
forms?

Well, in the case of the US Navy back in the late 1980's, when a friend
of mine from here in Australia, who only has a first and last-name
married a USN pilot and moved to the USA, she was told that, "Yes, you
have a middle name."  No amount of arguing, or producing of official
documents, (well, it's the USA, most people there don't know what a
passport is), could prevail.  In the end she conceded defeat and became
<Jane> Doe <Smith>, for the duration.

Names are impossible.  Unless you use a free-form, infinite-length field,
you won't be safe, and even then, someone will turn up whose name is 'n'
recurring to an infinite number of characters or something!

        Cheers,
                Gary B-)

Re: Things you shouldn't assume when you store names

Simon Slavin-3
On 10 Nov 2019, at 6:21am, Gary R. Schmidt <[hidden email]> wrote:

> So what happens when someone from a family who only uses first- and last-names moves to Kansas?

In my time with databases, I encountered several USAsians with a middle name of 'Nmn'.  I know many USAsian people but nobody with this name.  It puzzled me until, months later, I was reading a database which preserved case, and encountered it as 'NMN'.  From this I figured out it signified "No Middle Name".

Re: Things you shouldn't assume when you store names

Richard Damon
In reply to this post by Gary R. Schmidt
On 11/10/19 1:21 AM, Gary R. Schmidt wrote:

> On 10/11/2019 13:44, Doug wrote:
>> Au Contraire, Jens! In many local contexts you can normalize people's
>> names. I was born in Kansas, USA. My parents filled out a birth
>> certificate for me. It had a place on the form for first name, middle
>> name, last name, and a suffix like II or III.
>>
>> That birth certificate form determined that everyone born in Kansas
>> (at that time), had a first, middle, and last name. There was no
>> discussion of the matter. That's the way it was. The form led the
>> way; people never thought about whether it was effective or not. Each
>> newly-born child was given a first, middle, and last name.
>>
>> Effective was irrelevant for that system. There was no option, no
>> alternative. It simply was.
>>
>> All systems are like that at each moment in time. They are what they
>> are at any moment in time, and they force the users to behave the way
>> the system wants them to behave. If you want to change the system and
>> momentum is on your side, then immediately you have a new system - at
>> that moment in time. It is composed of the old system and the momentum.
>>
>> Back to names: just like the birth certificate, a system which
>> assigns a name to you, actually coerces you to have that name,
>> because within that system, you exist as that name. The "names"
>> article is totally wrong when it says that each assumption is wrong.
>> Each of those assumptions is correct, and I can find at least one
>> system which makes each one correct. Within each system, the
>> assumption works, and is valid.
>>
>> My two cents...
> Is not worth the paper it is written on!
>
> So what happens when someone from a family who only uses first- and
> last-names moves to Kansas?
>
> Do they have to make up a middle-name so that the idiots can fill out
> the forms?
>
> Well, in the case of the US Navy back in the late 1980's, when a
> friend of mine from here in Australia, who only has a first and
> last-name married a USN pilot and moved to the USA, she was told that,
> "Yes, you have a middle name."  No amount of arguing, or producing of
> official documents, (well, it's the USA, most people there don't know
> what a passport is), could prevail.  In the end she conceded defeat
> and became <Jane> Doe <Smith>, for the duration.
>
> Names are impossible, unless you use a free-form, infinite-length
> field, you won't be safe, and even then, someone will turn up whose
> name is 'n' recurring to an infinite number of characters or something!
>
>     Cheers,
>         Gary    B-)
Actually, 'The Artist whose name formerly was Prince' (which wasn't his
name, his legal name was an unpronounceable pictograph), breaks every
computer system I know.

--
Richard Damon


Re: Things you shouldn't assume when you store names

Kevin O'Gorman
And "full legal name"????   How about my dad, whose full name was Dr. John
Michael Patrick Dennis Emmet O'Gorman, PhD.  How many rules does that
break?  I've fought many companies over that apostrophe in my life.
Governments tend to throw it away, but it's on my old passport and birth
certificate.

---
Dictionary.com's word of the year: *misinformation*
Merriam-Webster word of the year: *justice*



Re: Things you shouldn't assume when you store names

Jens Alfke-2
In reply to this post by Richard Damon

> On Nov 10, 2019, at 4:03 AM, Richard Damon <[hidden email]> wrote:
>
> Actually, 'The Artist whose name formerly was Prince' (which wasn't his
> name, his legal name was an unpronounceable pictograph), breaks every
> computer system I know.

Unicode Character PRINCE (U+1F934)
https://www.fileformat.info/info/unicode/char/1f934/index.htm

Oh wait, wrong Prince...

There’s always this:
https://parkerhiggins.net/2013/01/writing-the-prince-symbol-in-unicode/

But all kidding aside, databases are created to serve people, not the other way around. To declare that a database schema is the truth is absurd.

—Jens

Re: Things you shouldn't assume when you store names

Jose Isaias Cabrera-4
In reply to this post by Richard Damon

Richard Damon, on Sunday, November 10, 2019 07:01 AM, wrote...

> Actually, 'The Artist whose name formerly was Prince' (which wasn't his
> name, his legal name was an unpronounceable pictograph), breaks every
> computer system I know.

Not if the system uses UTF32. :-) You could put the pictograph in that textbox, and it'll work.

josé

Re: Things you shouldn't assume when you store names

Simon Slavin-3
On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera <[hidden email]> wrote:

> Not if the system uses UTF32. :-) You could put the pictograph in that textbox, and it'll work.

Can you point to some description of this and how it works ?  I've never heard of it.

Re: Things you shouldn't assume when you store names

Jose Isaias Cabrera-4

Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>
> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>
> > Not if the system uses UTF32. :-) You could put the pictograph in that textbox, and it'll work.
>
> Can you point to some description of this and how it works ?  I've never heard of it.

My point was that one could define the UTF32 [1] code for that specific pictograph or glyph, and it'll work.

josé

[1] https://en.wikipedia.org/wiki/UTF-32

Re: Things you shouldn't assume when you store names

Richard Damon
On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:

> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>
>>> Not if the system uses UTF32. :-) You could put the pictograph in that textbox, and it'll work.
>> Can you point to some description of this and how it works ?  I've never heard of it.
> My point was that one could define the UTF32 [1] code for that specific pictograph or glyph, and it'll work.
>
> josé
>
> [1] https://en.wikipedia.org/wiki/UTF-32

UTF-32 gives no encoding advantage over other Unicode formats, as all
allow expressing all the Unicode code points.

There is no code point assigned to the pictogram for his name (as far as
I know), so there is no value you can put in to represent it.

There are a number of code points reserved for user definition, but many
of those have been informally reserved for characters not yet put into
Unicode.

It would be possible to include in the application some way to add user
defined glyphs to the system fonts for user defined code points, and
then reconcile these when transferring data from one system to another.

Another option would be to define some user defined code point pair as a
graphics escape, and put within it an encoding of a graphics file
containing the glyph, but at that point you are really outside of being
'Unicode'
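A small sketch of the first two points, in Python. It picks U+E000, the first code point of the Basic Multilingual Plane's Private Use Area, as a stand-in for a user-defined glyph; that choice is an assumption made for the example, not an agreed assignment for any real symbol.

```python
import unicodedata

pua = "\ue000"  # a private-use code point, chosen arbitrarily for this sketch

# General category "Co" means "Other, private use": Unicode assigns it no meaning.
category = unicodedata.category(pua)

# Every Unicode encoding form can carry the same code point;
# only the byte layout differs.
encoded = {
    "utf-8": pua.encode("utf-8"),
    "utf-16-be": pua.encode("utf-16-be"),
    "utf-32-be": pua.encode("utf-32-be"),
}

# Decoding each byte string with its own codec gives back the same character.
round_trips = all(data.decode(codec) == pua for codec, data in encoded.items())
```

The code point survives a round trip through all three encodings, but what it looks like still depends entirely on a font and a private agreement between the systems exchanging it.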

--
Richard Damon


Re: Things you shouldn't assume when you store names

Jose Isaias Cabrera-4


Richard Damon, on Monday, November 11, 2019 09:47 AM, wrote...

>
> On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
> > Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
> >> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
> >>
> >>> Not if the system uses UTF32. :-) You could put the pictograph in that textbox, and it'll work.
> >> Can you point to some description of this and how it works ?  I've never heard of it.
> > My point was that one could define the UTF32 [1] code for that specific pictograph or glyph, and it'll work.
> >
> > josé
> >
> > [1] https://en.wikipedia.org/wiki/UTF-32
>
> UTF-32 gives no encoding advantage over other Unicode formats, as all
> allow expressing all the Unicode code points.

I disagree.  I believe that the future is UTF32.  I will give you that it's bulky, for example, here is the letter a written to a file in Windows-1252, UTF8 signed, UTF16be signed, and UTF32be signed:

bytes filename
1     0_Windows-1252.txt
4     1_UTF8signed.txt
4     2_UTF16BEsigned.txt
8     3_UTF32signed.txt

So, yes, it's bulky, but, if you want to count characters in languages such as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that string to UTF32, and do a string count of that UTF32 variable.  Most people have to figure out what Unicode they are using, count the bytes, divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, convert it to UTF32, and do a count.
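The byte counts and the counting trick above can be checked with a short Python sketch. It assumes that "signed" means the file starts with a byte order mark, which is consistent with the sizes listed; the function name and the sample word are made up for the example.

```python
import codecs

text = "a"

# Size in bytes of the single letter in each encoding, with a BOM where noted.
sizes = {
    "Windows-1252": len(text.encode("cp1252")),
    "UTF-8 with BOM": len(text.encode("utf-8-sig")),
    "UTF-16BE with BOM": len(codecs.BOM_UTF16_BE + text.encode("utf-16-be")),
    "UTF-32BE with BOM": len(codecs.BOM_UTF32_BE + text.encode("utf-32-be")),
}


def count_code_points_via_utf32(s):
    # Every code point takes exactly four bytes in UTF-32,
    # so the count is the encoded length divided by four.
    return len(s.encode("utf-32-be")) // 4


greeting = "\u05e9\u05dc\u05d5\u05dd"  # a four-letter Hebrew word, written with escapes
```

The sizes come out as 1, 4, 4 and 8 bytes, and the Hebrew word counts as 4 code points even though it is 8 bytes in UTF-8. Note that this counts code points only; in Python, len(greeting) gives the same number without any conversion.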

> There is no code-point assigned to the Pictogram for his name (As far as
> I know), so there is no value you can put in to represent it.

You're right, but not that many people are changing their name to an image.  However, if two or three or more folks want to, there are enough empty UTF32 characters, that it can be accomplished.


> It would be possible to include in the application some way to add user
> defined glyphs to the system fonts for user defined code points, and
> then reconcile these when transferring data from one system to another.

We have done this for special customer requirements and have assigned our own UTF32 characters a specific design with our software.  But, yes, it's only our software, but what if... a reconciliation can happen?

josé

Re: Things you shouldn't assume when you store names

Igor Tandetnik-2
On 11/11/2019 10:49 AM, Jose Isaias Cabrera wrote:
> So, yes, it's bulky, but, if you want to count characters in languages such as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that string to UTF32, and do a string count of that UTF32 variable.

Between ligatures and combining diacritics, the number of Unicode codepoints in a string has little practical meaning. E.g. it is not necessarily correlated with the width of the string as displayed on the screen or on paper; or with the number of graphemes a human would say the string contains, if asked.
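A short Python illustration of the combining-diacritic case. The two spellings of the same name are standard Unicode; the helper approx_grapheme_count is a hypothetical, deliberately rough stand-in, since real grapheme segmentation follows Unicode Standard Annex #29 and handles many more cases.

```python
import unicodedata

composed = "Jos\u00e9"     # "José" with one precomposed é (U+00E9)
decomposed = "Jose\u0301"  # "José" as e followed by a combining acute accent (U+0301)


def approx_grapheme_count(s):
    # Rough approximation: count only code points that are not combining marks.
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


# NFC normalization composes the e and the accent into the single é.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

Both strings display as the same four-letter name, yet len(composed) is 4 and len(decomposed) is 5, so a code point count alone cannot say how many letters a reader would see.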

> Most people have to figure out what Unicode they are using, count the bytes, divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, convert it to UTF32, and do a count.

And then what do you do with that count? What do you use it for?
--
Igor Tandetnik


Re: Things you shouldn't assume when you store names

Richard Damon
In reply to this post by Jose Isaias Cabrera-4
On 11/11/19 10:49 AM, Jose Isaias Cabrera wrote:

>
> Richard Damon, on Monday, November 11, 2019 09:47 AM, wrote...
>> On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
>>> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>>>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>>>
>>>>> Not if the system uses UTF32. :-) You could put the pictograph in that textbox, and it'll work.
>>>> Can you point to some description of this and how it works ?  I've never heard of it.
>>> My point was that one could define the UTF32 [1] code for that specific pictograph or glyph, and it'll work.
>>>
>>> josé
>>>
>>> [1] https://en.wikipedia.org/wiki/UTF-32
>> UTF-32 gives no encoding advantage over other Unicode formats, as all
>> allow expressing all the Unicode code points.
> I disagree.  I believe that the future is UTF32.  I will give you that it's bulky, for example, here is the letter a written to a file in Windows-1252, UTF8 signed, UTF16be signed, and UTF32be signed:
>
> bytes filename
> 1     0_Windows-1252.txt
> 4     1_UTF8signed.txt
> 4     2_UTF16BEsigned.txt
> 8     3_UTF32signed.txt
>
> So, yes, it's bulky, but, if you want to count characters in languages such as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that string to UTF32, and do a string count of that UTF32 variable.  Most people have to figure out what Unicode they are using, count the bytes, divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, convert it to UTF32, and do a count.
UTF-32 is a reasonable internal operation format, if code-point
operations are important. It does not make a good transmission format,
as it usually takes more space than UTF-8 or UTF-16, and for
transmission, the message size is important. The big issue is that
code-point counting is rarely what you want; you generally want glyph
counting, which even UTF-32 doesn't provide.
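
The size comparison can be reproduced with a short Python sketch. It assumes that "signed" in the quoted table means "written with a byte-order mark (BOM) as a signature"; the helper `size_with_signature` is a name invented for this example.

```python
import codecs

def size_with_signature(text: str, encoding: str, bom: bytes) -> int:
    """Bytes needed to store text in the given encoding, preceded by a BOM."""
    return len(bom) + len(text.encode(encoding))

# The letter "a", as in the quoted table (assumption: "signed" == with BOM).
sizes = {
    "Windows-1252": len("a".encode("cp1252")),
    "UTF-8 + BOM": size_with_signature("a", "utf-8", codecs.BOM_UTF8),
    "UTF-16BE + BOM": size_with_signature("a", "utf-16-be", codecs.BOM_UTF16_BE),
    "UTF-32BE + BOM": size_with_signature("a", "utf-32-be", codecs.BOM_UTF32_BE),
}
for name, n in sizes.items():
    print(f"{n:<5} {name}")   # 1, 4, 4 and 8 bytes respectively

# Without the BOM, the gap is easier to see on longer text.
english = "Hello, world"      # 12 code points, all ASCII
japanese = "こんにちは"        # 5 code points, all in the BMP
for sample in (english, japanese):
    print(len(sample),
          len(sample.encode("utf-8")),
          len(sample.encode("utf-16-be")),
          len(sample.encode("utf-32-be")))
# 12 12 24 48
# 5 15 10 20
```

Most of the one-letter file sizes are signature overhead; on real text UTF-32 costs four bytes per code point regardless of script.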
>
>> There is no code-point assigned to the Pictogram for his name (As far as
>> I know), so there is no value you can put in to represent it.
> You're right, but not that many people are changing their name to an image.  However, if two or three or more folks want to, there are enough empty UTF32 characters, that it can be accomplished.
But this shows that 'Unicode' doesn't handle the name as is, which was
the point of the rule: if you design your software just assuming that
Unicode can handle all names, you will very occasionally be wrong.
There are actually many more cases of this. I imagine a lot of
aboriginal people whose own writing systems haven't been
adopted by Unicode have names (as their preferred name) that can't be
expressed in official Unicode. They may have a government-assigned
'official' name (if they have had to interact with the government) that
can be represented, but that really isn't their name (Prince just had
the resources and gall to do it 'officially').
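
The "empty" characters mentioned in the quote above presumably correspond to Unicode's Private Use Areas (U+E000 to U+F8FF, plus planes 15 and 16), which the standard reserves for private agreements. A minimal Python sketch of detecting them follows; the names `is_private_use`, `portable` and `CUSTOM_NAME_SYMBOL` are hypothetical, and the code point chosen for the symbol is an arbitrary assumption.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True if the single code point ch has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

# Hypothetical assignment: it means something only to software (and fonts)
# that share the same private agreement.
CUSTOM_NAME_SYMBOL = "\uE000"

def portable(text: str) -> bool:
    """True if text contains no private-use code points."""
    return not any(is_private_use(ch) for ch in text)

print(portable("Prince"))             # True
print(portable(CUSTOM_NAME_SYMBOL))   # False
```

A check like this is one way to flag values that will not survive a transfer to a system lacking the same private mapping. It says nothing about names that cannot be encoded at all.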

>
>
>> It would be possible to include in the application some way to add user
>> defined glyphs to the system fonts for user defined code points, and
>> then reconcile these when transferring data from one system to another.
> We have done this for special customer requirements and have assigned our own UTF32 characters a specific design with our software.  But, yes, it's only our software, but what if... a reconciliation can happen?
>
> josé


--
Richard Damon


Re: Things you shouldn't assume when you store names

Simon Slavin-3
In reply to this post by Igor Tandetnik-2
On 11 Nov 2019, at 4:02pm, Igor Tandetnik <[hidden email]> wrote:

> And then what do you do with that count? What do you use it for?

This is a key point.  When I started programming I used to do LEFT(A$(I), 14) frequently.  But almost all of them were because I wanted to print the string and had allocated 14 characters of space for it.

Then came variable-width fonts.  The practice should have died out.  But people are still doing it.

There are other reasons to get the beginning of a string.  The first character alone, especially.  There may be other reasons to get its length.  But it was mostly done because the length of the string was its width on the display.  And it isn't any more.
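
Even on a fixed-width display the two numbers diverge. The sketch below is a rough Python estimate only, assuming a monospace terminal that draws East Asian wide characters in two columns and stacks combining marks on the previous character; `terminal_columns` is a name invented here. It ignores emoji sequences, tabs and control characters, and it says nothing about proportional fonts, where only the rendering toolkit can measure width.

```python
import unicodedata

def terminal_columns(text: str) -> int:
    """Rough column count for text on a monospace terminal (an estimate)."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # combining mark: no extra column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2                  # wide or fullwidth character
        else:
            columns += 1
    return columns

for sample in ("abc", "e\u0301", "日本語"):
    print(len(sample), terminal_columns(sample))
# 3 3
# 2 1
# 3 6
```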

Re: Things you shouldn't assume when you store names

Jens Alfke-2
In reply to this post by Jose Isaias Cabrera-4

> On Nov 11, 2019, at 7:49 AM, Jose Isaias Cabrera <[hidden email]> wrote:
>
> if you want to count characters in languages such as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that string to UTF32, and do a string count of that UTF32 variable.

No, the easiest way is to ask your string class/library what the character count is, and let _it_ deal with the fiddly details.

Or to consider why you need the character count in the first place — it’s usually not something that’s useful to know. Usually what you’re really asking is “how many pixels wide will this render?” or “how many bytes will this occupy?” or even “let me iterate over each character”.

At a low level, UTF-8 makes a lot more sense. It’s very compact, which is important for cache efficiency as well as storage space. It’s backward compatible with ASCII, which is extremely convenient for text-based protocols / file formats / languages, and for working with legacy APIs (like <string.h>!)
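
The ASCII compatibility can be shown with a small Python sketch (the function `split_fields` and the sample record are made up for illustration). In UTF-8 every byte of a multi-byte sequence is 0x80 or above, so a byte-oriented parser can look for an ASCII delimiter without decoding anything and without ever hitting a false match inside a non-ASCII character.

```python
def split_fields(raw: bytes) -> list:
    """Split a UTF-8 encoded, comma-separated record at the byte level."""
    return [field.decode("utf-8") for field in raw.split(b",")]

record = "José,名前,Prince".encode("utf-8")
print(split_fields(record))   # ['José', '名前', 'Prince']

# Pure ASCII text is byte-for-byte identical in UTF-8.
assert "plain ascii".encode("utf-8") == "plain ascii".encode("ascii")

# No byte of a multi-byte sequence can be mistaken for an ASCII character.
assert all(b >= 0x80 for b in "名前".encode("utf-8"))
```

The same byte-level split is not safe on UTF-16 or UTF-32 data, where the byte 0x2C can appear as part of an unrelated code unit.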

Modern libraries seem to be moving to UTF-8. For instance, Apple’s been migrating Swift’s string class from a legacy UTF-16 encoding to UTF-8, and playing up the consequent performance and space win. Go has been UTF-8 from the start. I don’t know of a single library that’s gone with UTF-32, except maybe as an option.

—Jens