SQLite character comparisons

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

SQLite character comparisons

Fowler, Jeff
Hello All,
 
I've used SQL Server for over 15 years, Oracle off & on when I have no
choice, but SQLite for a couple weeks. I've just learned (today) that
SQLite respects trailing spaces when comparing two character fields.
I.e. 'SQLITE' <> 'SQLITE '
 
Is this behavior intentional? Neither SQL Server nor Oracle do this.
Just curious as to why it works this way.
 
Thanks,
 
- Jeff
 
Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Zbigniew Baniewski
On Thu, Jan 17, 2008 at 10:43:20AM -0500, Fowler, Jeff wrote:

> I've used SQL Server for over 15 years, Oracle off & on when I have no
> choice, but SQLite for a couple weeks. I've just learned (today) that
> SQLite respects trailing spaces when comparing two character fields.
> I.e. 'SQLITE' <> 'SQLITE '
>  
> Is this behavior intentional? Neither SQL Server nor Oracle do this.
> Just curious as to why it works this way.

PostgreSQL sees the strings that way too:

mydbase=> select 'str' = 'str';
 ?column?
----------
 t
(1 row)

mydbase=> select 'str' = 'str ';
 ?column?
----------
 f
(1 row)


Those are different strings, anyway. Can't recall now, does there any
setting to change this - but you can easily find out at their website.
--
                                pozdrawiam / regards

                                                Zbigniew Baniewski

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Fowler, Jeff
By the way.. I found this snippet. If I read it right, it seems that
IGNORING trailing spaces during string comparisons is ANSI standard.

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2,
<Comparison Predicate>, General rules #3) on how to compare strings with
spaces. The ANSI standard requires padding for the character strings
used in comparisons so that their lengths match before comparing them.
The padding directly affects the semantics of WHERE and HAVING clause
predicates and other Transact-SQL string comparisons. For example,
Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for
most comparison operations.

-----Original Message-----
From: Zbigniew Baniewski [mailto:[hidden email]]
Sent: Thursday, January 17, 2008 5:20 PM
To: [hidden email]
Subject: Re: [sqlite] SQLite character comparisons

On Thu, Jan 17, 2008 at 10:43:20AM -0500, Fowler, Jeff wrote:

> I've used SQL Server for over 15 years, Oracle off & on when I have no

> choice, but SQLite for a couple weeks. I've just learned (today) that
> SQLite respects trailing spaces when comparing two character fields.
> I.e. 'SQLITE' <> 'SQLITE '
>  
> Is this behavior intentional? Neither SQL Server nor Oracle do this.
> Just curious as to why it works this way.

PostgreSQL sees the strings that way too:

mydbase=> select 'str' = 'str';
 ?column?
----------
 t
(1 row)

mydbase=> select 'str' = 'str ';
 ?column?
----------
 f
(1 row)


Those are different strings, anyway. Can't recall now, does there any
setting to change this - but you can easily find out at their website.
--
                                pozdrawiam / regards

                                                Zbigniew Baniewski

------------------------------------------------------------------------
-----
To unsubscribe, send email to [hidden email]
------------------------------------------------------------------------
-----


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Zbigniew Baniewski
On Thu, Jan 17, 2008 at 05:25:30PM -0500, Fowler, Jeff wrote:

> By the way.. I found this snippet. If I read it right, it seems that
> IGNORING trailing spaces during string comparisons is ANSI standard.

I'm not sure. I was always avoiding such problem by "trim"-ming everything
to be inserted; either before, or during insertion (directly in SQL query).

There is remark in Postgres docs, that it does follow SQL-92 - so probably
the above is a recommendation rather than a rule.

In the Postgres docs I've found:

#v+
  The notations varchar(n) and char(n) are aliases for character varying(n)
  and character(n) [..]
  Values of type character are physically padded with spaces to the specified
  width n, and are stored and displayed that way. However, the padding spaces
  are treated as semantically insignificant. Trailing spaces are disregarded
  when comparing two values of type character, and they will be removed when
  converting a character value to one of the other string types. Note that
  trailing spaces are semantically significant in character varying and text
  values.
#v-

So, you can just use "character" type, to have what you need. But I'm still
talking about Postgres ;) - and you were asking about SQLite.

I'm trimming it all anyway... ;)
--
                                pozdrawiam / regards

                                                Zbigniew Baniewski

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Darren Duncan
In reply to this post by Fowler, Jeff
At 10:43 AM -0500 1/17/08, Fowler, Jeff wrote:
>Hello All,
>I've used SQL Server for over 15 years, Oracle off & on when I have no
>choice, but SQLite for a couple weeks. I've just learned (today) that
>SQLite respects trailing spaces when comparing two character fields.
>I.e. 'SQLITE' <> 'SQLITE '
>Is this behavior intentional? Neither SQL Server nor Oracle do this.
>Just curious as to why it works this way.

Because respecting the actual contents of the string is the better
way to do things.

The strings 'SQLITE' and 'SQLITE ' are not the same string.  Just as
the strings 'SQLITE' and 'sqlite' are not the same string.  A
computer language is more logical, predictable, and easy to use when
a test for equality or inequality actually treats every distinct
value as distinct.  If trailing spaces were supposed to be
insignificant for an equality test, then it should not be possible to
define a string value containing trailing spaces at all.

Moreover, treating 'SQLITE' and 'SQLITE ' as not being the same
string also is consistent with the SQLite philosophy, because it
means SQLite has fewer exceptions to be concerned with in a simpler
set of rules, and also not having to check lengths and space pad
before each compare also makes the code simpler, and less buggy, and
it saves CPU cycles.  A value equality test is a very common and
fundamental thing to do in a DBMS, and bloating that will have a big
performance impact.

-- Darren Duncan

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Fowler, Jeff
"Better" depends on who you ask - I'd say it's worse, and I bet most
DBA's would agree. The ANSI standard is to ignore trailing spaces when
comparing character strings in a WHERE clause, a HAVING clause, or a
join. So I can take the exact same data, run the exact same query, yet
get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
found this issue because we DID get a different answer.

Regarding whether by extension it should be impossible to create strings
with trailing spaces; I side with the SQLite developers who say it isn't
the engine's job to trim blanks in data. Most other engines I've used do
not trim spaces either, even if the field is a varchar.

But - whether ANSI compliance is considered "bloat" is not really my
place to comment upon. I guess it's really is up to the SQLite team.
Purely from a business usability standpoint (not a programming one), I
would say there's no question that it's far more useful to do
comparisons the ANSI way. If for some reason I truly want to compare &
respect trailing spaces, I can still do that using a function such as
HEX(A) = HEX(B) or something better.

- Jeff Fowler

 

-----Original Message-----
From: Darren Duncan [mailto:[hidden email]]
Sent: Friday, January 18, 2008 3:33 PM
To: [hidden email]
Subject: Re: [sqlite] SQLite character comparisons

At 10:43 AM -0500 1/17/08, Fowler, Jeff wrote:
>Hello All,
>I've used SQL Server for over 15 years, Oracle off & on when I have no
>choice, but SQLite for a couple weeks. I've just learned (today) that
>SQLite respects trailing spaces when comparing two character fields.
>I.e. 'SQLITE' <> 'SQLITE '
>Is this behavior intentional? Neither SQL Server nor Oracle do this.
>Just curious as to why it works this way.

Because respecting the actual contents of the string is the better way
to do things.

The strings 'SQLITE' and 'SQLITE ' are not the same string.  Just as the
strings 'SQLITE' and 'sqlite' are not the same string.  A computer
language is more logical, predictable, and easy to use when a test for
equality or inequality actually treats every distinct value as distinct.
If trailing spaces were supposed to be insignificant for an equality
test, then it should not be possible to define a string value containing
trailing spaces at all.

Moreover, treating 'SQLITE' and 'SQLITE ' as not being the same string
also is consistent with the SQLite philosophy, because it means SQLite
has fewer exceptions to be concerned with in a simpler set of rules, and
also not having to check lengths and space pad before each compare also
makes the code simpler, and less buggy, and it saves CPU cycles.  A
value equality test is a very common and fundamental thing to do in a
DBMS, and bloating that will have a big performance impact.

-- Darren Duncan

------------------------------------------------------------------------
-----
To unsubscribe, send email to [hidden email]
------------------------------------------------------------------------
-----


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Darren Duncan
At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote:
>"Better" depends on who you ask - I'd say it's worse, and I bet most
>DBA's would agree. The ANSI standard is to ignore trailing spaces when
>comparing character strings in a WHERE clause, a HAVING clause, or a
>join. So I can take the exact same data, run the exact same query, yet
>get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
>found this issue because we DID get a different answer.

And every SQL implementation already has many differences from every
other one, even with fundamentals, making them all generally
incompatible and proprietary, and this example is right in line with
the kinds of differences they have.  Other implementations have
distinct var-char and fixed-char types, while SQLite only has the
former.

Part of the problem here is that the ANSI standard is itself flawed.

(Maybe this particular case of trailing spaces is an exception, but
many parts of ANSI SQL, AFAIK, are more focused on what the syntax of
SQL is, and they say that the semantics, even of some fundamental
operations, is left up to the implementation to decide for itself.
So what good does it do you if SQL of the same syntax will compile on
different DBMSs if it behaves differently in each one?  And the
standard considers this valid.)

How many other programming langauges besides ANSI SQL treat trailing
spaces as insignificant.

>Regarding whether by extension it should be impossible to create strings
>with trailing spaces; I side with the SQLite developers who say it isn't
>the engine's job to trim blanks in data. Most other engines I've used do
>not trim spaces either, even if the field is a varchar.

And rightly so, you should not trim spaces, because spaces are significant.

Herein lies a wider part of the problem.  The ANSI SQL is
inconsistent in how it treats trailing spaces in strings.  On one
hand it wants to preserve them, but on the other hand it wants to
ignore them in its most fundamental operation other than preserving.

(With my "it should not be possible" sentence, I was not saying that
spaces should not be trimmed in the fictional scenario where a
character string does by definition not contain trailing spaces, but
that code specifying them should produce an error rather than
succeed.  The matter is analagous to what would happen if you write
code that tries to treat the character string literal 'foo' as a
number.)

>But - whether ANSI compliance is considered "bloat" is not really my
>place to comment upon.  I guess it's really is up to the SQLite team.

Yes it is up to the developers.  And they have already demonstrated
willingness to do some things differently than ANSI SQL because they
considered the differences to be improvements, or alternately
reasonable feature cutting.

>Purely from a business usability standpoint (not a programming one), I
>would say there's no question that it's far more useful to do
>comparisons the ANSI way.

And why is it more useful to ignore trailing spaces than respect
them.  And if ignoring them is more useful, why do most programming
languages (AFAIK) respect them?

>  If for some reason I truly want to compare &
>respect trailing spaces, I can still do that using a function such as
>HEX(A) = HEX(B) or something better.

I would argue that it is rediculous to do such ugly things in order
to do something that should be fundamental, and is simple and
fundamental in any other language.  Better for basic '=' comparison
to test that the values are the same, and have some other operator or
function like 'equal_when_trimmed( v1, v2 )' when you want various
exceptional comparisons.

-- Darren Duncan

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Zbigniew Baniewski
In reply to this post by Darren Duncan
On Fri, Jan 18, 2008 at 12:32:36PM -0800, Darren Duncan wrote:

> If trailing spaces were supposed to be insignificant for an equality test,
> then it should not be possible to define a string value containing
> trailing spaces at all.

Yes, yes: quite right... the above reminds me something... ;)
--
                                pozdrawiam / regards

                                                Zbigniew Baniewski

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Darren Duncan
At 10:57 PM +0100 1/18/08, Zbigniew Baniewski wrote:
>On Fri, Jan 18, 2008 at 12:32:36PM -0800, Darren Duncan wrote:
>  > If trailing spaces were supposed to be insignificant for an equality test,
>>  then it should not be possible to define a string value containing
>>  trailing spaces at all.
>
>Yes, yes: quite right... the above reminds me something... ;)

Don't read too much into that statement; I'm was not raising the
auto-trim thing.  See my previous email in this thread, which replied
to Jeff Fowler, which addresses his own comment on my statement. --
Darren Duncan

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Zbigniew Baniewski
On Fri, Jan 18, 2008 at 02:13:51PM -0800, Darren Duncan wrote:

> Don't read too much into that statement; I'm was not raising the
> auto-trim thing.

Yes, yes - I know... "Roma locuta"...
--
                                pozdrawiam / regards

                                                Zbigniew Baniewski

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Evans, Mark (Tandem)
Think of NULL as "value is unknown".

With zero length blob/text, value is known:  0-length blob/text

Cheers,
Mark

> -----Original Message-----
> From: Zbigniew Baniewski [mailto:[hidden email]]
> Sent: Friday, January 18, 2008 4:51 PM
> To: [hidden email]
> Subject: Re: [sqlite] SQLite character comparisons
>
> On Fri, Jan 18, 2008 at 02:13:51PM -0800, Darren Duncan wrote:
>
> > Don't read too much into that statement; I'm was not raising the
> > auto-trim thing.
>
> Yes, yes - I know... "Roma locuta"...
> --
>                                 pozdrawiam / regards
>
>                                                 Zbigniew Baniewski
>
> --------------------------------------------------------------
> ---------------
> To unsubscribe, send email to [hidden email]
> --------------------------------------------------------------
> ---------------
>
>

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Zbigniew Baniewski
On Fri, Jan 18, 2008 at 11:04:09PM +0000, Evans, Mark (Tandem) wrote:

> Think of NULL as "value is unknown".
>
> With zero length blob/text, value is known:  0-length blob/text

OK, I will :)
--
                                pozdrawiam / regards

                                                Zbigniew Baniewski

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Evans, Mark (Tandem)
Doh!  Wrong thread.  :) :) :)

> -----Original Message-----
> From: Zbigniew Baniewski [mailto:[hidden email]]
> Sent: Friday, January 18, 2008 5:14 PM
> To: [hidden email]
> Subject: Re: [sqlite] SQLite character comparisons
>
> On Fri, Jan 18, 2008 at 11:04:09PM +0000, Evans, Mark (Tandem) wrote:
>
> > Think of NULL as "value is unknown".
> >
> > With zero length blob/text, value is known:  0-length blob/text
>
> OK, I will :)
> --
>                                 pozdrawiam / regards
>
>                                                 Zbigniew Baniewski
>
> --------------------------------------------------------------
> ---------------
> To unsubscribe, send email to [hidden email]
> --------------------------------------------------------------
> ---------------
>
>

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Fowler, Jeff
In reply to this post by Darren Duncan
Hello All,
 
Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them."
 
Does anyone know of another RDBMS (ANSI or no) that respects trailing spaces when comparing character data? We have both Oracle 10g and SQL Server 2005 in house and they both work "correctly" according to the specification. Has anyone tried it with DB2 or Informix? What about PostGres and MySQL? Although I haven't asked him, I'm guessing Zbigniew's suggestion a while back for an auto-trim feature stemmed from this issue. Other than saving space, would there be a need to trim data if WHERE, HAVING clauses and joins followed the spec? Also, other than performance (which seems to be the primary concern), would anyone would be negatively impacted if the current behavior were changed?
 
Our app creates SQLite tables dynamically based on the output from user-defined queries that run against data warehouses (of practically any "flavor") we have no control over, and we insert the results into SQLite. Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. But to me (and maybe only to me?), it makes sense for SQLite -- where reasonably possible -- to attempt to follow clear ANSI guidelines, allowing developers to override it only in cases where this adherence produces undesirable results. I can't see where this is undesirable from an applications standpoint, although I can see where performance may be a concern.
 
Regards,
 
Jeff Fowler
 
________________________________

From: Darren Duncan [mailto:[hidden email]]
Sent: Fri 1/18/2008 4:47 PM
To: [hidden email]
Subject: RE: [sqlite] SQLite character comparisons



At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote:
>"Better" depends on who you ask - I'd say it's worse, and I bet most
>DBA's would agree. The ANSI standard is to ignore trailing spaces when
>comparing character strings in a WHERE clause, a HAVING clause, or a
>join. So I can take the exact same data, run the exact same query, yet
>get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
>found this issue because we DID get a different answer.

And every SQL implementation already has many differences from every
other one, even with fundamentals, making them all generally
incompatible and proprietary, and this example is right in line with
the kinds of differences they have.  Other implementations have
distinct var-char and fixed-char types, while SQLite only has the
former.

Part of the problem here is that the ANSI standard is itself flawed.

(Maybe this particular case of trailing spaces is an exception, but
many parts of ANSI SQL, AFAIK, are more focused on what the syntax of
SQL is, and they say that the semantics, even of some fundamental
operations, is left up to the implementation to decide for itself.
So what good does it do you if SQL of the same syntax will compile on
different DBMSs if it behaves differently in each one?  And the
standard considers this valid.)

How many other programming langauges besides ANSI SQL treat trailing
spaces as insignificant.

>Regarding whether by extension it should be impossible to create strings
>with trailing spaces; I side with the SQLite developers who say it isn't
>the engine's job to trim blanks in data. Most other engines I've used do
>not trim spaces either, even if the field is a varchar.

And rightly so, you should not trim spaces, because spaces are significant.

Herein lies a wider part of the problem.  The ANSI SQL is
inconsistent in how it treats trailing spaces in strings.  On one
hand it wants to preserve them, but on the other hand it wants to
ignore them in its most fundamental operation other than preserving.

(With my "it should not be possible" sentence, I was not saying that
spaces should not be trimmed in the fictional scenario where a
character string does by definition not contain trailing spaces, but
that code specifying them should produce an error rather than
succeed.  The matter is analagous to what would happen if you write
code that tries to treat the character string literal 'foo' as a
number.)

>But - whether ANSI compliance is considered "bloat" is not really my
>place to comment upon.  I guess it's really is up to the SQLite team.

Yes it is up to the developers.  And they have already demonstrated
willingness to do some things differently than ANSI SQL because they
considered the differences to be improvements, or alternately
reasonable feature cutting.

>Purely from a business usability standpoint (not a programming one), I
>would say there's no question that it's far more useful to do
>comparisons the ANSI way.

And why is it more useful to ignore trailing spaces than respect
them.  And if ignoring them is more useful, why do most programming
languages (AFAIK) respect them?

>  If for some reason I truly want to compare &
>respect trailing spaces, I can still do that using a function such as
>HEX(A) = HEX(B) or something better.

I would argue that it is rediculous to do such ugly things in order
to do something that should be fundamental, and is simple and
fundamental in any other language.  Better for basic '=' comparison
to test that the values are the same, and have some other operator or
function like 'equal_when_trimmed( v1, v2 )' when you want various
exceptional comparisons.

-- Darren Duncan

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------



Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Doug Currie-2
On Sunday, January 20, 2008 Fowler, Jeff wrote:

> briefly, ANSI SQL-92 specifies that when comparing two character
> fields, trailing spaces should be ignored.

From SQL-92 (draft July 1992) section 4.6

> When values of unequal length are compared, if the collating
> sequence for the comparison has the NO PAD attribute and the shorter
> value is equal to a prefix of the longer value, then the shorter
> value is considered less than the longer value. If the collating
> sequence for the comparison has the PAD SPACE attribute, for the
> purposes of the comparison, the shorter value is effectively
> extended to the length of the longer by concatenation of <space>s on
> the right.

section 8.2 also says

> 3) The comparison of two character strings is determined as fol-
>    lows:
>
>    a) If the length in characters of X is not equal to the length
>      in characters of Y, then the shorter string is effectively
>      replaced, for the purposes of comparison, with a copy of
>      itself that has been extended to the length of the longer
>      string by concatenation on the right of one or more pad char-
>      acters, where the pad character is chosen based on CS. If
>      CS has the NO PAD attribute, then the pad character is an
>      implementation-dependent character different from any char-
>      acter in the character set of X and Y that collates less
>      than any string under CS. Otherwise, the pad character is a
>      <space>.

So, using this terminology, the SQLite default collating sequence has
the NO PAD attribute, and the pad character is NUL.

Jeff, can you solve your problem with a custom collating sequence?

e

--
Doug Currie
Londonderry, NH, USA


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

Jay Sprenkle
In reply to this post by Fowler, Jeff
On Jan 20, 2008 10:19 AM, Fowler, Jeff <[hidden email]> wrote:
>
> Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue.

Sqlite is DRH's project. Voting doesn't come into the picture
anywhere. If you convince him
to fix it then he might, if not you're wasting your time asking if
people agree with you.

>Sure - we can handle this situation by writing more code looking for
spaces everywhere they might occur.

The source for the database engine is provided. It would be more
efficient to change it there.
You could also provide the code back to the community as an option for
others to use.

I'm not trying to be unpleasant, I just believe where you're headed
isn't going to be as valuable as other options.

--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

John Stanton-3
In reply to this post by Fowler, Jeff
I would suggest that Sqlite is correct having regard to its universal
variable length feature and single TEXT type.  Compatibility would be
achieved by not storing redundant trailing spaces if a string field is
required.  More efficient storage would be a bonus.

Fowler, Jeff wrote:

> Hello All,
>  
> Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them."
>  
> Does anyone know of another RDBMS (ANSI or no) that respects trailing spaces when comparing character data? We have both Oracle 10g and SQL Server 2005 in house and they both work "correctly" according to the specification. Has anyone tried it with DB2 or Informix? What about PostGres and MySQL? Although I haven't asked him, I'm guessing Zbigniew's suggestion a while back for an auto-trim feature stemmed from this issue. Other than saving space, would there be a need to trim data if WHERE, HAVING clauses and joins followed the spec? Also, other than performance (which seems to be the primary concern), would anyone would be negatively impacted if the current behavior were changed?
>  
> Our app creates SQLite tables dynamically based on the output from user-defined queries that run against data warehouses (of practically any "flavor") we have no control over, and we insert the results into SQLite. Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. But to me (and maybe only to me?), it makes sense for SQLite -- where reasonably possible -- to attempt to follow clear ANSI guidelines, allowing developers to override it only in cases where this adherence produces undesirable results. I can't see where this is undesirable from an applications standpoint, although I can see where performance may be a concern.
>  
> Regards,
>  
> Jeff Fowler
>  
> ________________________________
>
> From: Darren Duncan [mailto:[hidden email]]
> Sent: Fri 1/18/2008 4:47 PM
> To: [hidden email]
> Subject: RE: [sqlite] SQLite character comparisons
>
>
>
> At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote:
>> "Better" depends on who you ask - I'd say it's worse, and I bet most
>> DBA's would agree. The ANSI standard is to ignore trailing spaces when
>> comparing character strings in a WHERE clause, a HAVING clause, or a
>> join. So I can take the exact same data, run the exact same query, yet
>> get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
>> found this issue because we DID get a different answer.
>
> And every SQL implementation already has many differences from every
> other one, even with fundamentals, making them all generally
> incompatible and proprietary, and this example is right in line with
> the kinds of differences they have.  Other implementations have
> distinct var-char and fixed-char types, while SQLite only has the
> former.
>
> Part of the problem here is that the ANSI standard is itself flawed.
>
> (Maybe this particular case of trailing spaces is an exception, but
> many parts of ANSI SQL, AFAIK, are more focused on what the syntax of
> SQL is, and they say that the semantics, even of some fundamental
> operations, is left up to the implementation to decide for itself.
> So what good does it do you if SQL of the same syntax will compile on
> different DBMSs if it behaves differently in each one?  And the
> standard considers this valid.)
>
> How many other programming langauges besides ANSI SQL treat trailing
> spaces as insignificant.
>
>> Regarding whether by extension it should be impossible to create strings
>> with trailing spaces; I side with the SQLite developers who say it isn't
>> the engine's job to trim blanks in data. Most other engines I've used do
>> not trim spaces either, even if the field is a varchar.
>
> And rightly so, you should not trim spaces, because spaces are significant.
>
> Herein lies a wider part of the problem.  The ANSI SQL is
> inconsistent in how it treats trailing spaces in strings.  On one
> hand it wants to preserve them, but on the other hand it wants to
> ignore them in its most fundamental operation other than preserving.
>
> (With my "it should not be possible" sentence, I was not saying that
> spaces should not be trimmed in the fictional scenario where a
> character string does by definition not contain trailing spaces, but
> that code specifying them should produce an error rather than
> succeed.  The matter is analagous to what would happen if you write
> code that tries to treat the character string literal 'foo' as a
> number.)
>
>> But - whether ANSI compliance is considered "bloat" is not really my
>> place to comment upon.  I guess it's really is up to the SQLite team.
>
> Yes it is up to the developers.  And they have already demonstrated
> willingness to do some things differently than ANSI SQL because they
> considered the differences to be improvements, or alternately
> reasonable feature cutting.
>
>> Purely from a business usability standpoint (not a programming one), I
>> would say there's no question that it's far more useful to do
>> comparisons the ANSI way.
>
> And why is it more useful to ignore trailing spaces than respect
> them.  And if ignoring them is more useful, why do most programming
> languages (AFAIK) respect them?
>
>>  If for some reason I truly want to compare &
>> respect trailing spaces, I can still do that using a function such as
>> HEX(A) = HEX(B) or something better.
>
> I would argue that it is rediculous to do such ugly things in order
> to do something that should be fundamental, and is simple and
> fundamental in any other language.  Better for basic '=' comparison
> to test that the values are the same, and have some other operator or
> function like 'equal_when_trimmed( v1, v2 )' when you want various
> exceptional comparisons.
>
> -- Darren Duncan
>
> -----------------------------------------------------------------------------
> To unsubscribe, send email to [hidden email]
> -----------------------------------------------------------------------------
>
>
>
>


-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: SQLite character comparisons

D. Richard Hipp
In reply to this post by Fowler, Jeff
"Fowler, Jeff" <[hidden email]> wrote:
> I'm curious to know how many of you agree with Darren's
> sentiments on this issue....

Changing the behavior of SQLite to ignore trailing
spaces is not an option for SQLite version 3, since
to do so would result in a incompatible file format
All indices created before the change would be invalid
since they would use a different collation.  There
are multiple thousands of SQLite applications and
hundreds of millions of existing SQLite database files
that depend on this backwards compatibility.  To make
this change would therefore require bumping the version
number up to SQLite 4.0.

>  
> Our app creates SQLite tables dynamically based on the output
> from user-defined queries that run against data warehouses
> (of practically any "flavor") we have no control over, and
> we insert the results into SQLite. Sure - we can handle this
> situation by writing more code looking for spaces everywhere
> they might occur. But to me (and maybe only to me?), it makes
> sense for SQLite -- where reasonably possible -- to attempt
> to follow clear ANSI guidelines....
>  

Check-in [4732] implements a built-in RTRIM collating sequence
that provides the ignore-spaces comparison semantics that you
desire.  All you have to do is add "COLLATE RTRIM" to the declarations
of text columns in your SQLite schema and SQLite will thereafter
ignore trailing spaces on comparisons involving those columns.
If you do SQL comparisons that do not involve columns, you can
put "COLLATE RTRIM" after the comparison operator itself to get
this behavior.  Example (an actual screen capture):

    drh@elly:~/sqlite/bld> ./sqlite3
    SQLite version 3.5.4
    Enter ".help" for instructions
    sqlite> SELECT 'abc'='abc   ';
    0
    sqlite> SELECT 'abc'='abc   ' COLLATE RTRIM;
    1

--
D. Richard Hipp <[hidden email]>



-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Darren Duncan
In reply to this post by Fowler, Jeff
At 11:19 AM -0500 1/20/08, Fowler, Jeff wrote:
>To restate briefly, ANSI SQL-92 specifies that when comparing two
>character fields, trailing spaces should be ignored. Correct me if
>I'm wrong Darren, but you feel this is a bad decision, and in fact
>SQLite's implementation of character comparison (respecting trailing
>spaces) is superior to ANSI's specs.

Yes, that is indeed what I am saying.

More broadly speaking, and this may already be familiar to some of
you who remember several of my writings over the last few years, I
believe that while SQL has a lot of good things going for it, it also
has numerous flaws, some of which are quite severe in their
consequences.  I am specifically addressing the ANSI/ISO SQL standard
itself with this blame, not any implementation in particular.

I make this assessment of SQL both in respect to how much SQL is able
to represent the relational model of data that Codd proposed to be
used for computer databases, and in respect to how much SQL is
constructed according to well-established principles of good language
design.

As far as I am concerned, any quasi-implementation of SQL that
addresses these flaws is something to applaud.

And at times that it seems SQLite is already doing things a better
way, I am inclined to argue in support of its current status quo.

I won't address/re-address the other perceived SQL flaws in this
thread, to stay on topic, but I'll further clarify my position on the
space-pad thing in light of the previous paragraphs.  It may even
appear that I changed or reversed my position, but I don't feel that
it changed.

1.  The most important thing to have in regards to data types and
values is to have a fully deterministic (and preferrably simple)
concept of value identity, that is, when 2 containers are considered
to hold identical values or not, or should I say, when 2 appearances
of values are in fact the same one value.

2.  The same conceptual value can have multiple physical
representations, but this distinction is meant to be abstracted away
from the user, so for example if the definition of a data type says
that the representations 2.0 and 2.00 are the same value, then an
equality test on them should return true; that said, users should not
even see the difference then; any display of either physical
representation to the user should be normalized to the same thing,
such as 2, so when 2 values are considered equal by the system, they
look the same to the user, but it is still okay to store them
differently behind the scenes.

3.  It is okay in the general case for a system's conception of value
identity to be different than another system's as long as the rules
are clearly documented.  In this respect, it is okay for either
trailing spaces to be significant, or for them to be non-significant,
for determining identity (and by extension, equality), as long as
these rules are consistently applied everywhere that value appears.
Eg, 2 given character strings Foo and Bar can't be considered
identical in some contexts and non-identical in other contexts.  If
you want to have it both ways, you need to have 2 distinct data types
which happen to look similar, eg a CharStrSpSignif data type and a
CharStrSpInsig data type, and then you use values of one type in one
context and separate values of the other type in other contexts.

4.  In this respect, if I don't misunderstand, SQLite's text data
type is the CharStrSpSignif data type, and the SQL standard has the
CharStrSpInsig type instead; if you consider the 2 systems as having
different data types, then this difference of behaviour is
explainable.  Moreover, you can have your choice of the behaviour in
different systems by having both types implemented there to choose
from when you want, like you can choose between text and number types
now.

5.  A more practical example of #2, ignoring the whole spaces thing,
is in regard to Unicode codepoints vs graphemes.  Even if you are
using a consistent byte encoding throughout, such as just UTF-8 or
UTF-16-LE, you still have to be concerned with the fact that Unicode
has multiple normal forms.  Depending on your normal form, such as
normal form C vs normal form D, you may have different sequences of
code points representing the same grapheme.  An example of a single
grapheme being the combination of a plain roman letter plus a
diacritical mark or accent; in NFC, that may be a single codepoint,
in NFD, it might be a sequence of 2 code points.  So, it is important
for a character string data type to explicitly be considered either
as a string of code points or of graphemes, for example.  At the
higher level abstraction, the 2 forms of letter+accent would be
considered identical, but in the lower level abstraction, they would
be non-identical.  Note that afaik most high-level Unicode systems
normally work in the highest abstraction level possible (whether they
synchronize the normal form on storage or on compare is beside the
point), as that is what users would expect; in which case, the actual
codepoints in use would be considered non-significant, and be
abstracted away from the user.

6.  So as long as identity considerations are handled properly, it
doesn't matter for satisfying the relational model of data as to
whether trailing spaces are significant, just as it doesn't for
graphemes vs codepoints abstraction.  So then in this regard I
consider SQLite's current approach and the SQL standard's
proscription to be equally valid.

7.  So my argument about that trailing spaces should be considered
significant comes more down to what is considered well established
principles of good language design.  I would argue that if you want a
simpler situation, that all the characters are significant, and that
is what most programming languages do for character string literals.

8.  If one wants to argue for the merits of ignoring trailing spaces,
then I would ask for what reason and why stop there?  I would imagine
that a valid reason to consider said spaces insignificant is if, say,
the text is meant to represent some human speech, and it is more just
that there are spaces between or around words at all that is
significant, not how many spaces.  And so, in such a situation where
trailing spaces are insignificant, I would think that having varying
amounts of space characters between words is also insignificant, and
comparisons should treat as if each word is separated by exactly one
space.

9.  And so an argument for all characters being significant is
largely an argument in keeping things simple, which I think in the
general case is what people expect.  For situations where people
expect different, they probably expect multiple other differences in
conjunction with the trailing spaces thing, such as middle or leading
spaces.

10.  In the interests of useability, the base behaviour should be
simpler, such as SQLite is, and special-casing strings should be
built on top of that base, rather than the other way around.  It's
probably a lot easier or more elegant to add special cases than to
remove them.  Also such as drh provided with his new collation commit.

-- Darren Duncan

P.S.  As another piece of full-disclosure, I'm in the midst of
writing the spec for an industrial-quality programming language,
named Muldis D, which is intended to replace SQL as the defacto
language of choice for relational databases.  I'm also significantly
involved in the design of the Perl 6 language.  So I have been
looking at the relevant issues quite closely and I believe I can
rationalize any arguments I make in regards to how a DBMS or a
programming language should behave, and moreover that such
differences from the SQL standard are viable in the real world for
real work.

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

RE: SQLite character comparisons

Fowler, Jeff
In reply to this post by Jay Sprenkle
I don't recall asking for a vote on the subject or asking anyone to agree with me, but thank you for your input anyway Jay.
 
And I do appreciate Dr H. for adding the "COLLATE RTRIM" declaration!
 
Regards,
 
- Jeff

________________________________

From: Jay Sprenkle [mailto:[hidden email]]
Sent: Sun 1/20/2008 11:56 AM
To: [hidden email]
Subject: Re: [sqlite] SQLite character comparisons



On Jan 20, 2008 10:19 AM, Fowler, Jeff <[hidden email]> wrote:
>
> Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue.

Sqlite is DRH's project. Voting doesn't come into the picture
anywhere. If you convince him
to fix it then he might, if not you're wasting your time asking if
people agree with you.

>Sure - we can handle this situation by writing more code looking for
spaces everywhere they might occur.

The source for the database engine is provided. It would be more
efficient to change it there.
You could also provide the code back to the community as an option for
others to use.

I'm not trying to be unpleasant, I just believe where you're headed
isn't going to be as valuable as other options.

--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com <http://www.cthulhubucks.com/>

-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------





-----------------------------------------------------------------------------
To unsubscribe, send email to [hidden email]
-----------------------------------------------------------------------------
12