vsv module documentation

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

vsv module documentation

Keith Medcalf

Simon,

I hope you don't mind me sending this to you directly, but what do you think of the following as the VSV documentation:

/*
** 2020-02-08 modified by Keith Medcalf who also disclaims all copyright
** on the modifications and hereby places this code in the public domain
**
** This file contains the implementation of an SQLite virtual table for
** reading VSV (Variably Separated Values), which are like CSV files,
** but subtly different.  VSV supports a number of extensions to the
** CSV format as well as more processing options.
**
**
** Usage:
**
**  create virtual table temp.vsv using vsv(...);
**  select * from vsv;
**
** The parameters to the vsv module (the vsv(...) part) are as follows:
**
**  filename=STRING     the filename, passed to the Operating System
**  data=STRING         alternative data
**  columns=N           columns parsed from the VSV file
**  header=BOOL         whether or not a header row is present
**  skip=N              number of leading data rows to skip
**  rsep=STRING         record separator
**  fsep=STRING         field separator
**  validatetext=BOOL   validate UTF-8 encoding of text fields
**  affinity=AFFINITY   affinity to apply to each returned value
**
**
** Defaults:
**
**  filename / data     nothing.  You must provide one or the other
**                      it is an error to provide both or neither
**  schema              nothing.  If not provided then one will be
**                      generated for you from the header, or if no
**                      header is available then autogenerated using
**                      field names manufactured as cX where X is the
**                      column number
**  columns             nothing.  If not specified then the number of
**                      columns is determined by counting the fields
**                      in the first record of the VSV file (which
**                      will be the header row if header is specified),
**                      the number of columns is not parsed from the
**                      schema even if one is provided
**  header=no           no header row in the VSV file
**  skip=0              do not skip any data rows in the VSV file
**  fsep=','            default field separator is a comma
**  rsep='\n'           default record separator is a newline
**  validatetext=no     do not validate text field encoding
**  affinity=none       do not apply affinity to each returned value
**
**
** Parameter types:
**
**  STRING              means a quoted string
**  N                   means a whole number not containing a sign
**  BOOL                means something that evaluates as true or false
**                          it is case insensitive
**                          yes, no, true, false, 1, 0
**  AFFINITY            means an SQLite3 type specification
**                          it is case insensitive
**                          none, blob, text, integer, real, numeric
**
** STRING means a quoted string.  The quote character may be either
** a single quote or a double quote.  Two quote characters in a row
** will be replaced with a single quote character.  STRINGS do not
** need to be quoted if it is obvious where they begin and end
** (that is, they do not contain a comma).  Leading and trailing
** spaces will be trimmed from unquoted strings.
**
**    filename =./this/filename.here, ...
**    filename =./this/filename.here , ...
**    filename = ./this/filename.here, ...
**    filename = ./this/filename.here , ...
**    filename = './this/filename.here', ...
**    filename = "./this/filename.here", ...
**
**  are all equivalent.
**
** BOOL defaults to true so the following specifications are all the
** same:
**
**  header = true
**  header = yes
**  header = 1
**  header
**
**
** Specific Parameters:
**
** The platform/compiler/OS fopen call is responsible for interpreting
** the filename.  It may contain anything recognized by the OS.
**
** The separator string containing exactly one character, or a valid
** escape sequence.  Recognized escape sequences are:
**
**  \t                  horizontal tab, ascii character 9 (0x09)
**  \n                  linefeed, ascii character 10 (0x0a)
**  \v                  vertical tab, ascii character 11 (0x0b)
**  \f                  form feed, ascii character 12 (0x0c)
**  \xhh                specific byte where hh is hexadecimal
**
** For the affinity setting, the following processing is applied to
** each value returned by the VSV virtual table:
**
**  none                no affinity is applied, all fields will be
**                      returned as text just like in the original
**                      csv module, embedded nulls will terminate
**                      the text.  if validatetext is in effect then
**                      an error will be thrown if the field does
**                      not contain validly encoded text or contains
**                      embedded nulls
**
**  blob                all fields will be returned as blobs
**                      validatetext has no effect
**
**  text                all fields will be returned as text just
**                      like in the original csv module, embedded
**                      nulls will terminate the text.
**                      if validatetext is in effect then a blob
**                      will be returned if the field does not
**                      contain validly encoded text or the field
**                      contains embedded nulls
**
**  integer             if the field data looks like an integer,
**                      (regex "^ *[+-]?\d+ *$"),
**                      then an integer will be returned as
**                      provided by the compiler and platform
**                      runtime strtoll function
**                      otherwise the field will be processed as
**                      text as defined above
**
**  real                if the field data looks like a number,
**                      (regex "^ *[+-]?(\d+\.?\d*|\d*\.?\d+)([eE][+-]?\d+)? *$")
**                      then a double will be returned as
**                      provided by the compiler and platform
**                      runtime strtold function otherwise the
**                      field will be processed as text as
**                      defined above
**
**  numeric             if the field looks like an integer
**                      (see integer above) that integer will be
**                      returned
**                      if the field looks like a number
**                      (see real above) then the number will
**                      returned as an integer if it can be
**                      contained entirely within the mantissa
**                      of your platform/compiler long double
**                      type and has no fractional part otherwise
**                      a double will be returned
**                      if the field does not look like a number
**                      then the field will be processed as text
**                      as described above
**
** The validatetext setting will cause the validity of the field
** encoding (not its contents) to be verified.  It effects how
** fields that are supposed to contain text will be returned to
** the SQLite3 library in order to prevent invalid utf8 data from
** being stored or processed as if it were valid utf8 text.
**
*/


--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Simon Slavin
>Sent: Friday, 7 February, 2020 21:27
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] [EXTERNAL] Re: Patch: VTable Column Affinity
>Question and Change Request (WIthdrawn)
>
>On 8 Feb 2020, at 4:05am, Keith Medcalf <[hidden email]> wrote:
>
>> filename=FILENAME is the csv filename in quotes.
>
>Please mention in documentation whether full or partial paths can be
>included.
>
>> header=BOOL       will "parse" the first row to get the column names
>unless a SCHEMA is specified.
>
>Looks good until I see
>
>> skip=N
>
>Perhaps it would be better to use "header=N" to nominate a header row.
>The first row is row 1.  "header=0" means "there is no header row".
>
>Having made this change, "skip=N" says how many lines after the header
>line should be skipped.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users