Re: [EXTERNAL] Re: Patch: VTable Column Affinity Question and Change Request (WIthdrawn)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Patch: VTable Column Affinity Question and Change Request (WIthdrawn)

Keith Medcalf

I don't think that patch to apply affinities is a good idea since it will usually be mostly useless and will negatively impact performance since one can and should assume that the actual author of the VTable knows what they are doing, currently SQLite3 does not enforce declared VTable column affinities and it should stay that way.  That said, although the definitions of the column affinities are stored in the internal schema, it is not trivial to access them from a non-builtin extension.

Instead I have once again changed the vsv.c VTable plugin so that it does some more useful things.

The below parameters work exactly as they did in the csv.c extension with regards to setting the functionality with the single exception that now if you specify all of header, schema and columns, the header row will still be skipped.  In csv.c if you specified both schema and columns then the header row was not skipped even if you specified it existed.

filename=FILENAME is the csv filename in quotes.
data=DATA         provides the data inline.
header=BOOL       will "parse" the first row to get the column names unless a SCHEMA is specified.
schema=STRING     allows the specification of the schema to use
columns=N         number of columns in VTable


New parameters:


skip=N

will skip the specified number of rows at the start of the CSV file, exclusive of the header row (ie, skip data rows). If the header, via the header option, is specified to exist then it will always be skipped.


fsep=SEP and rsep=SEP

allow to specify the particular separators to be used.  The defaults are fsep=',' and rsep='\n' so the default is the same as the original.  The separator can be any single character, or it can be a supported escape code (\t \v \n \f for horizontal-tab, vertical-tab, newline and form-feed), or it can be a hexadecimal coded byte as in '\x1f'.  Multibyte separators cannot be specified.


affinity=AFF

where the affinity can be NONE, BLOB, TEXT, INTEGER, REAL, NUMERIC with the default being NONE.  NONE does nothing different from the existing csv.c VTable (unless you also specify validatetext).  However, if you specify an affinity it applies to *every* returned value and makes the following changes:
 - BLOB returns all data as BLOB
 - TEXT returns all data as TEXT
 - INTEGER returns anything that looks like an integer as an INTEGER otherwise as TEXT
 - REAL returns anything that looks like a number as a REAL (double) otherwise as TEXT
 - NUMERIC returns anything that looks like a number as either an INTEGER or REAL otherwise as TEXT
   - if the field "looks like an integer" then an INTEGER is returned
   - if the field "looks like a number" then a REAL is returned unless it can be converted to an integer
"looks like an integer" effectively means passing the following regex "^ *[+-]?\d+ *$"
"looks like a number" effectively means passing the following regex "^ *[+-]?(\d+\.?\d*|\d*\.?\d+)([eE][+-]?\d+)? *$"

Processing is limited by the runtime C library so the detection is not quite so good as the affinities built into the SQLite3 core.  This means that something that looks like an integer may fail to load as an integer because it is too big (918273745985736365575984857636253857564363 looks like an integer and the SQLite3 core will recognize this and load it as a double.  The C library is not so smart and will probably simply return MAX_INT).

Processing is also limited by the compiler.  Some compilers (wave at Microsoft) do not treat long double as being more precise than a regular double (ie, 10 or 16 bytes rather than 8) and treat that simply as a plain old double (some compilers, like Intel, need a compilation option to enable long doubles being actually longer than a regular double).  It is detected if this is the case (sizeof(long dounble)==sizeof(double)) so that only "integers" that can be entirely contained in the mantissa are converted to integer, and others will remain as double.


validatetext=BOOL

will cause TEXT fields to be validated as containing a valid UTF-8 coding sequence (no content check is performed, only the validity of the encoding is checked).  If this is turned on together with any type of affinity (ie, other than none) then improperly encoded UTF8 text will be returned as a BLOB.  If no affinity is in effect (ie, affinity=none) then an error will be thrown rather then letting a text field contain invalid data.  This means that fields containing embedded nulls (rather than encoded 0 bytes) will not be able to be stored as TEXT and will either be BLOB type or throw an invalid encoding error.


nulls=BOOL

when enabled empty fields will be returned as NULL rather than empty strings.  An empty field means a field with no content (separators are side-by-each).  Specifically empty strings are not changed.  That is, the middle column in this a,,b is a null column but this is not a,"",b

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Hick Gunter
>Sent: Thursday, 6 February, 2020 23:47
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] [EXTERNAL] Re: Patch: VTable Column Affinity
>Question and Change Request
>
>>Von: sqlite-users [mailto:[hidden email]]
>Im Auftrag von Dominique Devienne
>>
>>On Thu, Feb 6, 2020 at 9:32 AM Hick Gunter <[hidden email]> wrote:
>>> >Of course, it may be that the writer of the VTable should know what
>they are doing and generate a VTable definition that is consistent with
>how their cursor methods return data, however ... this will omit the
>OP_Affinity if no >>column type was specified when the VTable was defined
>and most of the VTable declarations in the existing code that I looked at
>do not specify column affinities in the declarations.
>>>
>>> Very nice. but detrimental for our use case. Please refrain from
>adding this tot he distribution by default.
>>> We are almost exclusively using virtual tables to allow queries
>against our internal data sources, which are C language structs and thus
>strictly typed. The column affinities provided by the VTab
>implementations are used for documentation purposes and the xColumn
>methods always return the same type (calling the "wrong" sqlite3_result
>function is considered a programming error). Coercing the returned value
>to the same type would be just a waste of memory and CPU cycles.
>>
>>+1. I fear what it would do to our app, also making extensive use of
>vtables.
>>Like Gunter mentions, the type is there more for documentation, I'm
>unsure "what havoc this could wreak".
>>
>>Note that our vtables are all read-only, if that matters here. It's
>unclear to me if the above applies to writes only, or also applies to
>reads. If to writes only, then I don't care much at the moment, although
>I might in the future, and would likely prefer seeing the raw value in my
>code, than the result of affinity-coercion. --DD
>
>The patch affects the value returned from the VTable implementation if
>affinities are provided by the create table statement from the xCreate
>method
>
>
>___________________________________________
> Gunter Hick | Software Engineer | Scientific Games International GmbH |
>Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 |
>(O) +43 1 80100 - 0
>
>May be privileged. May be confidential. Please delete if not the
>addressee.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Patch: VTable Column Affinity Question and Change Request (WIthdrawn)

Simon Slavin-3
On 8 Feb 2020, at 4:05am, Keith Medcalf <[hidden email]> wrote:

> filename=FILENAME is the csv filename in quotes.

Please mention in documentation whether full or partial paths can be included.

> header=BOOL       will "parse" the first row to get the column names unless a SCHEMA is specified.

Looks good until I see

> skip=N

Perhaps it would be better to use "header=N" to nominate a header row.  The first row is row 1.  "header=0" means "there is no header row".

Having made this change, "skip=N" says how many lines after the header line should be skipped.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users