Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Bradford Larsen
SQLite has built-in support for EBCDIC-based systems, but I discovered that it’s been broken since 3.11.0.  If you have an EBCDIC-based system, you can see the brokenness by firing up `sqlite` and trying the `.schema` metacommand – you’ll get an obscure error.

In detail, in February 2016, several changes were made to the SQL tokenizer for performance to use a character lookup table instead of a switch statement based on character literals in the C source; see http://www.sqlite.org/src/info/9115baa1919584dc <http://www.sqlite.org/src/info/9115baa1919584dc> and http://www.sqlite.org/src/info/04f7da77c13925c1 <http://www.sqlite.org/src/info/04f7da77c13925c1>.  However, the character lookup table for EBCDIC appears to have several typos in it, causing several ubiquitous characters in SQL input (such as ‘.’) to be classified as invalid characters.  This results in internal low-level queries like

select sql from “main”.sqlite_master

to fail to parse, with a non-obvious error message (I guess such low-level queries are always expected to succeed!).  This broken character lookup table on EBCDIC systems causes pretty much any non-trivial SQL query to fail to parse, and causes for example the ‘.schema’ meta command to fail — making SQLite totally broken out-of-the-box on EBCDIC systems.

The problem is in the `aiClass` character properties table when `SQLITE_EBCDIC` is defined.  This table is defined as follows:

static const unsigned char aiClass[] = {
#ifdef SQLITE_ASCII

#endif
#ifdef SQLITE_EBCDIC
/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
/* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
/* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 12, 17, 20, 10,
/* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
/* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  7,
/* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
/* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   25,  1,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27,  9, 27, 27, 27, 27, 27,
/* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
#endif
};

While it’s conceivable that this table was written for a different codepage than used by the mainframe I was using, it looks more likely that there are typos in this table.  For example:

There are two “9x” rows in the table above; there is no “Ax” row
There are no entries in this table for the CC_DOT or CC_VARNUM #defines (26 and 6 respectively)
Assuming codepage 1047 (the most commonly used code page?), the entry for the CC_TILDA #define (25) is in the wrong place.

To fix this problem, I patched the SQLite sources to change the `aiClass` character properties table to this:

static const unsigned char aiClass[] = {
#ifdef SQLITE_ASCII

#endif
#ifdef SQLITE_EBCDIC
/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
/* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
/* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
/* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
/* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
/* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
/* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
/* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
#endif
};

These changes fixed the SQL tokenizer problems I was seeing my EBCDIC-based system, and resulted in a functioning SQLite there.

Please let me know if more is needed to fix this bug.

Hope this helps,
Brad Larsen

https://en.wikipedia.org/wiki/EBCDIC_1047 <https://en.wikipedia.org/wiki/EBCDIC_1047>

P.S.  It would be helpful if the SQLite documentation indicated which EBCDIC codepage(s) were supported.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Richard Hipp-3
On 12/11/16, Bradford Larsen <[hidden email]> wrote:

> #endif
> #ifdef SQLITE_EBCDIC
> /*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
> /* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
> /* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
> /* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
> /* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
> /* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
> /* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
> /* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
> #endif
> };
>
> These changes fixed the SQL tokenizer problems I was seeing my EBCDIC-based
> system, and resulted in a functioning SQLite there.
>
> Please let me know if more is needed to fix this bug.

Thanks for the suggested fix.  As you probably infer, we do not have
access to an EBCDIC system for testing.

I agree with most of your changes.  But I wonder about moving the
QUOTE2 (the '[' character) value from code 0xba over to 0xad.
According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
'[' character should be at 0xba.  Is wikipedia wrong?  Is there
something else about the '[' character in EBCDIC that we should know
about?

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Bradford Larsen
> On Dec 11, 2016, at 7:57 PM, Richard Hipp <[hidden email]> wrote:
>
> On 12/11/16, Bradford Larsen <[hidden email]> wrote:
>
>> #endif
>> #ifdef SQLITE_EBCDIC
>> /*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
>> /* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
>> /* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
>> /* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
>> /* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
>> /* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
>> /* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
>> /* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
>> #endif
>> };
>>
>> These changes fixed the SQL tokenizer problems I was seeing my EBCDIC-based
>> system, and resulted in a functioning SQLite there.
>>
>> Please let me know if more is needed to fix this bug.
>
> Thanks for the suggested fix.  As you probably infer, we do not have
> access to an EBCDIC system for testing.
>
> I agree with most of your changes.  But I wonder about moving the
> QUOTE2 (the '[' character) value from code 0xba over to 0xad.
> According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
> '[' character should be at 0xba.  Is wikipedia wrong?  Is there
> something else about the '[' character in EBCDIC that we should know
> about?


There are many flavors of EBCDIC; the details depend on which code page is used.  The mainframes I’ve used have all been configured for code page 1047 (the “Latin 1/Open System” code page).

The table in the Wikipedia page for EBCDIC (https://en.wikipedia.org/wiki/EBCDIC <https://en.wikipedia.org/wiki/EBCDIC>) is for code page 37.  You’re right that the ‘[‘ character has value 0xba in code page 37, but in code page 1047, the ‘[‘ character has value 0xad (https://en.wikipedia.org/wiki/EBCDIC_1047 <https://en.wikipedia.org/wiki/EBCDIC_1047>).  See also https://en.wikipedia.org/wiki/Code_page#EBCDIC-based_code_pages <https://en.wikipedia.org/wiki/Code_page#EBCDIC-based_code_pages>.

I’m afraid the entire EBCDIC situation is complicated, and there isn’t a single definition of EBCDIC like there is ASCII — in different EBCDIC code pages, characters will have different values.  Code page 1047 may be the most common native code page these days — it’s the code page used by all the mainframes I’ve touched.  (It’s the default code page for C source code as understood by the IBM XL C/C++ compiler, though this does not preclude *running* with a different code page than was used for building.)

Either way, it looks like (a) the code page used by SQLite on EBCDIC systems is ambiguous, and (b) whatever code page you pick, there seem to be typos in the relevant tables in the SQLite source.  The result is a broken SQL tokenizer on EBCDIC-based systems.

I think a particular code page should be chosen, and the SQLite sources & docs should be clear about which code page is used.  Maybe there's someone else out there who has expertise with EBCDIC who will comment.

Best,
Brad Larsen
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Guy Harris
In reply to this post by Richard Hipp-3
On Dec 11, 2016, at 4:57 PM, Richard Hipp <[hidden email]> wrote:

> I agree with most of your changes.  But I wonder about moving the
> QUOTE2 (the '[' character) value from code 0xba over to 0xad.
> According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
> '[' character should be at 0xba.  Is wikipedia wrong?

If so, then some obscure company called the International Business Machines Corporation is also wrong:

        http://publibfp.dhe.ibm.com/epubs/pdf/dz9zr010.pdf

The EBCDIC/ISO-8 chart in appendix I of version 10 of the z/Architecture Principles of Operation has [ at 0xba and ] at 0xbb for EBCDIC; it has an accented capital Y at 0xad.

However, it's apparently a bit more complicated; different EBCDIC code pages apparently give square brackets different code points:

        http://www-01.ibm.com/support/docview.wss?uid=swg21515307

Apparently EBCDIC code page IBM 1047 has [ as 0xad and EBCDIC code page 037 has [ as 0xba.  Here's 037:

        https://en.wikipedia.org/wiki/EBCDIC_037

        ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00037.pdf

and here's 1047:

        https://en.wikipedia.org/wiki/EBCDIC_1047

        ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP01047.pdf

The Wikipedia page on EBCDIC code pages:

        https://en.wikipedia.org/wiki/EBCDIC_code_pages

says of 1047 "Open Systems (MVS C compiler)", so the issue may be that Brad Larsen is using that compiler and it's insisting that '[' is 0xad.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Richard Hipp-3
SQLite only uses the "[" character as a compatibility quoting
mechanism for SQL Server.  Maybe the solution is for [...] quoting to
simply not work on EBCDIC systems?

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Bradford Larsen
>
> On Sun, Dec 11, 2016 at 20:42 Richard Hipp <[hidden email]> wrote:
>
> SQLite only uses the "[" character as a compatibility quoting
>
> mechanism for SQL Server.  Maybe the solution is for [...] quoting to
>
> simply not work on EBCDIC systems?
>
>
>
> --
>
> D. Richard Hipp
>
> [hidden email]
>
> This would be one possibility—you then might be able to simultaneously
support several EBCDIC code pages with a single 'aiClass' definition (cp37
and cp1047 for example).

(For what it's worth, I'm sure my application doesn't use this
compatibility quoting mechanism on the EBCDIC system.)

However, maybe it would be better to pick a specific EBCDIC code page to
provide an 'aiClass' definition for in the SQLite sources, and make a note
about this in the docs—perhaps something like this to start:

"SQLite is designed to compile and run on EBCDIC-based systems that use
code page $CHOSEN_CODEPAGE.  If you wish to build SQLite for a system that
uses a different EBCDIC codepage, you will need to modify the definition of
the 'aiClass' table used by the SQL tokenizer."

An alternative possibility would be to revert to the pre-3.11 tokenizer on
EBCDIC systems.  If I recall, the old tokenizer used a big switch statement
with character literals instead of the 'aiClass' table.  I believe this
would avoid the EBCDIC tokenizing troubles, at the expense of lower
performance on those systems, and with the maintenance cost of keeping 2
code paths around.

Brad Larsen
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

Scott Hess
On Mon, Dec 12, 2016 at 9:30 PM, Bradford Larsen <[hidden email]> wrote:
> An alternative possibility would be to revert to the pre-3.11 tokenizer on
> EBCDIC systems.  If I recall, the old tokenizer used a big switch statement
> with character literals instead of the 'aiClass' table.  I believe this
> would avoid the EBCDIC tokenizing troubles, at the expense of lower
> performance on those systems, and with the maintenance cost of keeping 2
> code paths around.

Or the build pass could compile a switch-based generator.  It would
make cross-compiling more complicated.

Perhaps easier would be to have a set of unrolled table initializers
which used the raw characters as indices, called as part of
sqlite3_init().

-scott
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users