csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Jannick
Hi everyone,

would it be possible to add to the csv extension the following parameter
options (with syntax along the lines of):

- sep=';': field separator character (different from default ',')
- skip=N: skip the first N lines

Both options would be very helpful for information of fixed format
downloaded from information providers.

I would be happy to suggest tiny patches for this if the sqlite devels
wanted me to do so and advised me on how to submit patches, a pull request
or alike.

Many thanks,
J.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Robert M. Münch
On 3 Dec 2019, at 16:10, Jannick wrote:

> would it be possible to add to the csv extension the following parameter
> options (with syntax along the lines of):
>
> - sep=';': field separator character (different from default ',')
> - skip=N: skip the first N lines
>
> Both options would be very helpful for information of fixed format
> downloaded from information providers.

Hi, those would be very useful enhancements.

--

Robert M. Münch

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

signature.asc (564 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Kees Nuyt
On Tue, 04 Feb 2020 13:18:30 +0100, you wrote:

>On 3 Dec 2019, at 16:10, Jannick wrote:
>
>> would it be possible to add to the csv extension the following parameter
>> options (with syntax along the lines of):
>>
>> - sep=';': field separator character (different from default ',')
>> - skip=N: skip the first N lines
>>
>> Both options would be very helpful for information of fixed format
>> downloaded from information providers.
>
> Hi, those would be very useful enhancements.

I don't see the need, the feature is available in another form

$ cat test.csv

"a";"b"
"c";"d"

$ sqlite3 test.db \
"DROP TABLE IF EXISTS tbl1" \
".mode csv" \
".separator ;" \
".import test.csv tbl1" \
". mode column" \
".headers on" \
"SELECT * FROM tbl1"

a           b
----------  ----------
c           d


--
Regards,
Kees Nuyt

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Simon Slavin-3
In reply to this post by Robert M. Münch
On 4 Feb 2020, at 12:18pm, Robert M. Münch <[hidden email]> wrote:

> - sep=';': field separator character (different from default ',')

If you provide this facility, please don't add it to anything called 'csv' since the 'c' stands for 'comma'.

For those playing along at home, csv files using semi-colon are a result of a bug in Excel.  Windows has a setting for a 'list separator'.  The two most usual values are ',' and ';'.  The CSV export filter in Excel takes its separator from this field rather than always using a comma, because it was written by someone who wasn't aware of, didn't understand, or was intentionally trying to disrupt the standard.  Decades after being told about the bug, Microsoft hasn't fixed it.

There are a couple of other errors in Excel's CSV filters including how strings are quoted and how a blank value differs from a zero-length string.  The best way I've seen to handle this was to add a new filter to your software, similar to 'csv', called something like 'exceltext' which did things the Excel way.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Robert Hairgrove
On 04.02.20 15:42, Simon Slavin wrote:
> On 4 Feb 2020, at 12:18pm, Robert M. Münch <[hidden email]> wrote:
>
>> - sep=';': field separator character (different from default ',')
> If you provide this facility, please don't add it to anything called 'csv' since the 'c' stands for 'comma'.
>
> For those playing along at home, csv files using semi-colon are a result of a bug in Excel.  Windows has a setting for a 'list separator'.  The two most usual values are ',' and ';'.  The CSV export filter in Excel takes its separator from this field rather than always using a comma, because it was written by someone who wasn't aware of, didn't understand, or was intentionally trying to disrupt the standard.  Decades after being told about the bug, Microsoft hasn't fixed it.
>
> There are a couple of other errors in Excel's CSV filters including how strings are quoted and how a blank value differs from a zero-length string.  The best way I've seen to handle this was to add a new filter to your software, similar to 'csv', called something like 'exceltext' which did things the Excel way.

Believe it or not, there is no binding standard for the CSV format. The
closest anyone has come was RFC 4180.
However:

According to RFC 4180, section 2:
   "While there are various specifications and implementations for the
    CSV format (for ex. [4], [5], [6] and [7]), there is no formal
    specification in existence, which allows for a wide variety of
    interpretations of CSV files."

https://tools.ietf.org/html/rfc4180#section-2

In section 3, under "Interoperability considerations":
   "Due to lack of a single specification, there are considerable
    differences among implementations.  Implementors should "be
    conservative in what you do, be liberal in what you accept from
    others" (RFC 793 [8]) when processing CSV files."

https://tools.ietf.org/html/rfc4180#section-3

That being said, the problem with trying to enforce the comma as the
sole delimiter character is due to the fact that over half of the
non-English speaking world (or perhaps even more) uses the comma as the
decimal separator. The "work-around" for that, of course, would be to
enclose all fields in double quote characters. But, as we know, the
800-pound gorilla in the room doesn't necessarily do that...

I agree that this would be a very good option to have. In the meantime,
check out libcsv on GitHub:
https://github.com/rgamble/libcsv

It adheres as closely to what standards there are, and you can choose
your own delimiter and quote character if you like. Of course, you have
to do some programming to use it, but it's really easy to use. And it is
very fast since it does just one thing, but does it very well.

HTH,
Bob Hairgrove

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Keith Medcalf
In reply to this post by Robert M. Münch

On Tuesday, 4 February, 2020 05:19, Robert M. Münch <[hidden email]> wrote:

>On 3 Dec 2019, at 16:10, Jannick wrote:

>> would it be possible to add to the csv extension the following
>> parameter options (with syntax along the lines of):

>> - sep=';': field separator character (different from default ',')
>> - skip=N: skip the first N lines

>> Both options would be very helpful for information of fixed format
>> downloaded from information providers.

>Hi, those would be very useful enhancements.

I did some of the changes and you can get the resulting extension called VSV (Variably Separated Values) from:

http://www.dessus.com/files/vsv.c

I added the facility to specify the field and record separator characters.  I did not add skip, though I may look at adding that too, but one can simply use the OFFSET in SQL to ignore some rows at the beginning of the file.  I made some other changes also which makes this non-compliant with the RFC.

Data between the "field separator" markers can consist of any arbitrary string of bytes that DOES NOT include the field or record separator bytes.
Data between the double-quotes can consist of any arbitrary string of bytes except that double-quotes must be escaped by doubling them.

The added parameters are fsep=SEPERATOR and rsep=SEPERATOR for the field and record seperators respectively.

SEPERATOR is a single quoted string that may be in the following formats:

'x'    where x is any arbitrary byte and will be used as the separator character.
'\x'   for standard escape codes (tab = \t, vtab = \v, Formfeed = \f, Newline = \n).
'\xhh' where hh is the hexidecimal code for the byte to use.

defaults if not specified are fsep=',' and rsep='\n'

so to read the following file:

a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7

you can use the following commands:

SQLite version 3.32.0 2020-02-05 02:43:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create virtual table tbartilde using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~');
sqlite> .mode col
sqlite> .head on
sqlite> select * from tbartilde;
a           b           c           d
----------  ----------  ----------  ----------
1           2           3           4
2           3|5         4           5
3           4           5           6
4           5           6           7

Might still have some line counting errors and haven't figured out how to implement skip yet ...

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

Keith Medcalf
On Tuesday, 4 February, 2020 22:31, Keith Medcalf <[hidden email]> wrote:

The vsv.c (source) file line counting is now verified and I have added a skip= parameter.  Adding skip was far easier than variable separators ...

Same location, file updated:  http://www.dessus.com/files/vsv.c
The complete collection:      http://www.dessus.com/files/sqlite3extensions.zip

Using tbartilde.csv containing:
a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7

SQLite version 3.32.0 2020-02-05 05:21:57
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~');
sqlite> .mode col
sqlite> .head on
sqlite> select * from t;
a           b           c           d
----------  ----------  ----------  ----------
1           2           3           4
2           3|5         4           5
3           4           5           6
4           5           6           7

sqlite> drop table t;
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~', skip=3);
sqlite> select * from t;
a           b           c           d
----------  ----------  ----------  ----------
4           5           6           7

sqlite> drop table t;
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~', skip=4);
sqlite> select * from t;
sqlite> drop table t;
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~', skip=5);
Error: premature end of file during skip
sqlite>



>On Tuesday, 4 February, 2020 05:19, Robert M. Münch <[hidden email]> wrote:
>
>>On 3 Dec 2019, at 16:10, Jannick wrote:
>
>>> would it be possible to add to the csv extension the following
>>> parameter options (with syntax along the lines of):
>
>>> - sep=';': field separator character (different from default ',')
>>> - skip=N: skip the first N lines
>
>>> Both options would be very helpful for information of fixed format
>>> downloaded from information providers.
>
>>Hi, those would be very useful enhancements.
>
>I did some of the changes and you can get the resulting extension called
>VSV (Variably Separated Values) from:
>
>http://www.dessus.com/files/vsv.c
>
>I added the facility to specify the field and record separator
>characters.  I did not add skip, though I may look at adding that too,
>but one can simply use the OFFSET in SQL to ignore some rows at the
>beginning of the file.  I made some other changes also which makes this
>non-compliant with the RFC.
>
>Data between the "field separator" markers can consist of any arbitrary
>string of bytes that DOES NOT include the field or record separator
>bytes.
>Data between the double-quotes can consist of any arbitrary string of
>bytes except that double-quotes must be escaped by doubling them.
>
>The added parameters are fsep=SEPERATOR and rsep=SEPERATOR for the field
>and record seperators respectively.
>
>SEPERATOR is a single quoted string that may be in the following formats:
>
>'x'    where x is any arbitrary byte and will be used as the separator
>character.
>'\x'   for standard escape codes (tab = \t, vtab = \v, Formfeed = \f,
>Newline = \n).
>'\xhh' where hh is the hexidecimal code for the byte to use.
>
>defaults if not specified are fsep=',' and rsep='\n'
>
>so to read the following file:
>
>a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7
>
>you can use the following commands:
>
>SQLite version 3.32.0 2020-02-05 02:43:27
>Enter ".help" for usage hints.
>Connected to a transient in-memory database.
>Use ".open FILENAME" to reopen on a persistent database.
>sqlite> create virtual table tbartilde using
>vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~');
>sqlite> .mode col
>sqlite> .head on
>sqlite> select * from tbartilde;
>a           b           c           d
>----------  ----------  ----------  ----------
>1           2           3           4
>2           3|5         4           5
>3           4           5           6
>4           5           6           7
>
>Might still have some line counting errors and haven't figured out how to
>implement skip yet ...

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users