Configuring csv extension to use '; ' as column delimiter instead of ',' ?

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Olivier Mascia
Considering:

CREATE VIRTUAL TABLE temp.t1 USING csv(filename='thefile.csv');

Is there any way to teach the csv extension to use ';' instead of ',' as the column delimiter, getting away from the strict RFC4180 definition?

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Simon Slavin-3
On 1 May 2018, at 11:11am, Olivier Mascia <[hidden email]> wrote:

> CREATE VIRTUAL TABLE temp.t1 USING csv(filename='thefile.csv');
>
> Is there any way to teach the csv extension to use ';' instead of ',' as the column delimiter, getting away from the strict RFC4180 definition?

The source code for the csv extension is in the archive.  Make your own version called 'ssv'.

Or, if you can rely on strings in the file not having commas in them, use a simple shell tool to replace all commas with semicolons.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Olivier Mascia
> Le 1 mai 2018 à 14:00, Simon Slavin <[hidden email]> a écrit :
>
>> CREATE VIRTUAL TABLE temp.t1 USING csv(filename='thefile.csv');
>>
>> Is there any way to teach the csv extension to use ';' instead of ',' as the column delimiter, getting away from the strict RFC4180 definition?
>
> The source code for the csv extension is in the archive.  Make your own version called 'ssv'.
>
> Or, if you can rely on strings in the file not having commas in them, use a simple shell tool to replace all commas with semicolons.

Thanks a lot for the answer Simon, even though I knew of these solutions. :)
My question was more generic, even though it didn't look that way: the well-known and (maybe too) much-used software tool named Excel tend to encourage people to export "CSV" files which are actually "SCSV" files (semi-colon separated values). So the need to script some data happens regularly. Very easy for programmers like you and me. A bit harder for people just trained enough for some simple sqlite command-line interactions, they might not have the right tool / script-languages knowledge to do it easily or more importantly to do it right.
Nothing really important.  Just wanted to trigger some thinking.

Again, thank you.
--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Simon Slavin-3
On 1 May 2018, at 3:01pm, Olivier Mascia <[hidden email]> wrote:

> My question was more generic, even though it didn't look that way: the well-known and (maybe too) much-used software tool named Excel tend to encourage people to export "CSV" files which are actually "SCSV" files (semi-colon separated values). So the need to script some data happens regularly.

What kind of insanity would lead someone to invent semi-colon delimited format when CSV already existed ?  I bet it was a badly-thought-out solution for European numeric formats which use a comma as a decimal point.  Tell whoever uses that format to stop.

Seems like the best way to solve this would be to write a converter for Windows which converts SCSV to CSV.  Then it could be used by all Excel users instead of just SQLite users.  Take a look at

<https://blogs.technet.microsoft.com/heyscriptingguy/2011/10/18/convert-a-semicolon-delimited-file-to-a-csv-file/>

<https://social.technet.microsoft.com/Forums/windows/en-US/370ee470-f2cd-4f30-a167-b106dd51d47a/powershell-convert-csv-to-xlsx>

and hack up a solution.  I'd do it myself but I don't use Windows.

Simon.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

dave
In reply to this post by Olivier Mascia
> From: sqlite-users
> [mailto:[hidden email]] On
> Behalf Of Olivier Mascia
>
> Considering:
>
> CREATE VIRTUAL TABLE temp.t1 USING csv(filename='thefile.csv');
>
> Is there any way to teach the csv extension to use ';'
> instead of ',' as the column delimiter, getting away from the
> strict RFC4180 definition?
 
While on the topic of Excel generated CSV, I had occaision to do this a
year-and-a-half ago, so let me forewarn you of a couple other things:
*  quoting; excel will sometimes enclose a field in quotes, and sometimes
not, depending on the content therein
*  quoting; excel uses 'double quoting' for 'escaping'; i.e. two quotation
marks in a row are interpreted as a quote char, rather than the arguably
more common backslash style escaping
*  multi-line fields;  your excel document can have cells which contain
multiple lines (i.e. embedded carriage returns)

Anyway, all this led me in my case to write a state machine to read in a
logical 'line' of text, and crack it into fields.

That being said, just now taking a peek at the current implementaion in the
sqlite codebase:

ext\misc\csv.c:197

Is a commment

/* Read a single field of CSV text.  Compatible with rfc4180 and extended
** with the option of having a separator other than ",".

So maybe that implementation is already enhanced to accommodate those cases,
but the capability is simply not exposed through parameters to the
CREATE VIRTUAL TABLE CSV (...)

So maybe one would just need to modify the csvtabCreate to process some
additional parameters and propagate those settings to the implementation.

-dave


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Peter da Silva
In reply to this post by Simon Slavin-3
Having tried to write a generic clean HANDLES ALL CSV reader for speedtables, I kind of want to burn Excel with nuclear fire, but that's a side issue. :)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Luuk
In reply to this post by Simon Slavin-3


On 1-5-2018 16:20, Simon Slavin wrote:

> On 1 May 2018, at 3:01pm, Olivier Mascia <[hidden email]> wrote:
>
>> My question was more generic, even though it didn't look that way: the well-known and (maybe too) much-used software tool named Excel tend to encourage people to export "CSV" files which are actually "SCSV" files (semi-colon separated values). So the need to script some data happens regularly.
> What kind of insanity would lead someone to invent semi-colon delimited format when CSV already existed ?  I bet it was a badly-thought-out solution for European numeric formats which use a comma as a decimal point.  Tell whoever uses that format to stop.
>
> Seems like the best way to solve this would be to write a converter for Windows which converts SCSV to CSV.  Then it could be used by all Excel users instead of just SQLite users.  Take a look at
>
> <https://blogs.technet.microsoft.com/heyscriptingguy/2011/10/18/convert-a-semicolon-delimited-file-to-a-csv-file/>
>
> <https://social.technet.microsoft.com/Forums/windows/en-US/370ee470-f2cd-4f30-a167-b106dd51d47a/powershell-convert-csv-to-xlsx>
>
> and hack up a solution.  I'd do it myself but I don't use Windows.
>
> Simon.
>

You 'forgot' this link:
https://answers.microsoft.com/en-us/msoffice/forum/msoffice_excel-mso_windows8-mso_2010/sep-support-for-older-excel/293076a3-6593-4ceb-8167-d29aa3418773

;)


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

R Smith-2
In reply to this post by Simon Slavin-3


On 2018/05/01 4:20 PM, Simon Slavin wrote:
> On 1 May 2018, at 3:01pm, Olivier Mascia <[hidden email]> wrote:
>
>> My question was more generic, even though it didn't look that way: the well-known and (maybe too) much-used software tool named Excel tend to encourage people to export "CSV" files which are actually "SCSV" files (semi-colon separated values). So the need to script some data happens regularly.
> What kind of insanity would lead someone to invent semi-colon delimited format when CSV already existed ?  I bet it was a badly-thought-out solution for European numeric formats which use a comma as a decimal point.  Tell whoever uses that format to stop.

*Sigh* If only...

There are only a couple of Countries using semi-colon list separators as
far as I am aware, but the problem is that one of those is Holland - not
exactly some third-World stow-away island that can be ignored. A lot of
the things we make had to be Holland-friendly and so allowing
semi-colon-happy CSV files became the norm.

I have successfully made a simple converter to read semi-colon CSV files
(in fact, to detect which separator is used), and convert if needed,
which does the trick, though it makes some important assumptions. (I'm
willing to share if interested, but it's a Windoze CLI, mail me direct
if needed).

To another post hating on Excel - Excel has many flaws, but this is not
one of them, it's a fault of the list-separator setting in the Windows
OS on which the Excel runs... It's also a flaw of software anyone may
make, where it will produce one sort of output in one Country and a
different kind in another, if you bother to use the locale settings.
Also, the doubled-double-Quote char escaping is the law of the CSV
standard, not of Excel, and backslash escaping is by no means more
common or more or less correct. All of CSV and all of SQL use doubled-up
quotes to escape strings - which must make it vastly more common than
anything else.

To make matters worse, some enlightened Dutchmen realise this semi-colon
insanity and so cleverly set their own computer's locale to use the
comma in stead of the semi-colon. So now you cannot trust that it will
use the semi-colon either - you HAVE to manually check and act accordingly.

Some other countries have other problems, like setting the decimal
separator to a comma - imagine that in a CSV file - completely breaking
it unless you cleverly Quote all the numeric values too, or ignore that
local setting too....  Just silly.


I agree - someone needs to be shot for this. I just can't really figure
out who.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Peter da Silva
    To another post hating on Excel - Excel has many flaws, but this is not
    one of them, it's a fault of the list-separator setting in the Windows
    OS on which the Excel runs.

CSV is an interchange format, it's for software to communicate with other software, so the syntax needs to be independent of the locale since you don't know if the sender and recipient are in the same locale. Field separator is syntax, so the locale settings should not have any effect on it.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Simon Slavin-3
On 1 May 2018, at 6:43pm, Peter Da Silva <[hidden email]> wrote:

> CSV is an interchange format, it's for software to communicate with other software, so the syntax needs to be independent of the locale since you don't know if the sender and recipient are in the same locale. Field separator is syntax, so the locale settings should not have any effect on it.

The problem is not a file format with semicolons in.  The problem is calling it "CSV" or allowing such files to be saved with ".csv" extensions.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

R Smith-2
In reply to this post by Peter da Silva
On 1 May 2018, at 6:43pm, Peter Da Silva<[hidden email]>  wrote:

> CSV is an interchange format, it's for software to communicate with other software, so the syntax needs to be independent of the locale since you don't know if the sender and recipient are in the same locale. Field separator is syntax, so the locale settings should not have any effect on it.

That is a very bold and idealistic claim. Not even the person who wrote
the currently used CSV standard (RFC4180 anyway) claims such things - in
fact he is very quick to point out that it isn't a standard but more of
a "commonly used format" which did not have official documentation and
so he decided to write up some.  But I am on your side of this war - I
feel it "should" be standard, it's just that since it isn't a standard,
it's hard to lord it over others doing differently.

There have been very many custom implementations (complete with
documentation) for many of the CSV formats around, but it never really
progressed to a standard - which is why I assume so much trouble exists.

Probably my favourite line from RFC4180 is to Implementors: "be
conservative in what you do, be liberal in what you accept from others".


While on the subject, when I was doing the converter mentioned earlier,
I had researched the CSV thing trying to figure out a rule-book (with
very little success), but I did make this little test CSV file handy to
push through all the CSV importers/readers to see who does it how.

I managed to find it now again and thought I would post it for fun. I
have a result set that is correct according to RFC4180 (and yes, SQLite
gets it right) but as a fun exercise to the reader, put this text in a
file named something.csv and open it in some CSV readers.

For more fun, try to predict the outcome before starting.

What I have found is that there are nearly as many differences in the
interpretation as there are CSV interpreters.


csv file--------------------------------------------------------------
Item,Value1, Value 2
1,This is a non-quoted string.,"This is a ""quoted"" one"
2,"This contains a comma ',' and some single quotes.",This one doesn't
3,"Magic disappearing value after quoted value" This text should be
hidden, This text must be visible.
4,"This is all on one line","This string goes
over
3 lines."
5, "The value to my right is empty, see how Excel gets this wrong!",
6, The,End
eof--------------------------------------------------------------

PS: According to the RFC, the last record of the last row may or may not
contain an ending <CRLF>. I always add one since it makes appending the
file easier later.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Peter da Silva
On 5/1/18, 1:15 PM, "sqlite-users on behalf of R Smith" <[hidden email] on behalf of [hidden email]> wrote:

    On 1 May 2018, at 6:43pm, Peter Da Silva<[hidden email]>  wrote:
   
    > CSV is an interchange format, it's for software to communicate with other software, so the syntax needs to be independent of the locale since you don't know if the sender and recipient are in the same locale. Field separator is syntax, so the locale settings should not have any effect on it.
   
    That is a very bold and idealistic claim.

I'm not claiming what you think I'm claiming. All I'm saying is that CSV is meant to communicate with software, not humans. Locales are there for humans. Using "the locale says list separates are semicolons" as a justification for sometimes using semicolons in an interchange format instead of commas doesn't make sense. Do you actually have a reference for that being the reason for Excel using semicolons?

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

R Smith-2

On 2018/05/01 8:21 PM, Peter Da Silva wrote:
> On 5/1/18, 1:15 PM, "sqlite-users on behalf of R Smith" <[hidden email] on behalf of [hidden email]> wrote:
>
>      On 1 May 2018, at 6:43pm, Peter Da Silva<[hidden email]>  wrote:
>      
>      > CSV is an interchange format, it's for software to communicate with other software, so the syntax needs to be independent of the locale since you don't know if the sender and recipient are in the same locale. Field separator is syntax, so the locale settings should not have any effect on it.
>      
>      That is a very bold and idealistic claim.
>
> I'm not claiming what you think I'm claiming. All I'm saying is that CSV is meant to communicate with software, not humans. Locales are there for humans. Using "the locale says list separates are semicolons" as a justification for sometimes using semicolons in an interchange format instead of commas doesn't make sense.

My point is that CSV was not necessarily "meant" to be what you say. Who
exactly "meant" for it to be that? Because the official stuff makes no
such claim or mention.

That said, I'm agreeing, it is nowadays very much used as a data
exchange, no modern users read CSV files for a hobby, and Excel (or any
other software) using the locale for "List separator" inside CSV files
should probably turn a new page and stop doing that - I feel it will be
better, I'm just not as convinced as you that my opinion on this is
ultimate.


> Do you actually have a reference for that being the reason for Excel using semicolons?

I have zero proof that it is WHY they do it, but I have proof that it IS
what they do - the verification is rather easy, you can change your
windows locale setting for List separator and see for yourself. (The
record separator in turn affects the records, but mostly that is CRLF in
all locales). You might need to restart Excel after changing locale
settings.

Your point is however more towards the WHY question, and no, I am not
sure. But, you know, it's Microsoft - after they made "locales" they
probably wanted someone to use it, so they bullied Excel division into
it. :)




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Peter da Silva


On 5/1/18, 1:42 PM, "sqlite-users on behalf of R Smith" <[hidden email] on behalf of [hidden email]> wrote:
    My point is that CSV was not necessarily "meant" to be what you say. Who
    exactly "meant" for it to be that? Because the official stuff makes no
    such claim or mention.

Bah. Existential shenanigans. There's probably some pseudo-Latin or German term for what I meant.

    Your point is however more towards the WHY question, and no, I am not
    sure. But, you know, it's Microsoft - after they made "locales" they
    probably wanted someone to use it, so they bullied Excel division into
    it. :)
   
From what I've heard the Excel division is the least bulliable part of the company. Didn't they maintain their own C compiler for a while to cut down on dependencies on the rest of Microsoft?

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

D Burgess
For those of you who use SQLite to prepare CSV for import/open into
Excel beware of this problem:

"Text","Next bit is a reference id","A001"
"text","same again","0009"

On Windows, In the second row, 3rd column Excel will remove the
leading zeroes, if the file has an extension of .csv
The same content in a file with the extension .txt is processed as
csv, but without the removal of the leading zeroes from text values.

This is value translation not formatting. Nightmare.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Configuring csv extension to use '; ' as column delimiter instead of ',' ?

Niall O'Reilly
In reply to this post by Simon Slavin-3
On 1 May 2018, at 15:20, Simon Slavin wrote:

> Seems like the best way to solve this would be to write a converter for Windows which converts SCSV to CSV.  Then it could be used by all Excel users instead of just SQLite users.

As the heavy lifting of implementing Python on Windows has been done already, I think
that some wrapping around this might do the job, since the separator is parameterized:
https://docs.python.org/3.6/library/csv.html

Like Simon,

> I'd do it myself but I don't use Windows.

8-)

Niall O'Reilly

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

signature.asc (921 bytes) Download Attachment