sqlite3 shell .import bug (edge case)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

sqlite3 shell .import bug (edge case)

Lindsay Lawrence
Hi,

I am currently using sqlite3 version:

3.8.1 2013-10-17 12:57:35 c78be6d786c19073b3a6730dfe3fb1be54f5657a

I found an edge case bug in the sqlite3 shell when importing csv data with
fields containing embedded quotes, CRs and LFs:

When a field contains an embedded quote, and that embedded quote is
immediately followed by an EOL then the current csv parser will consider
the end of field to be reached and fail to import the row correctly.

For example the following csv, a single data row with column headers will
fail to import correctly.

column1,column2,column3,column4
fielddata1,fielddata2,"field ""
data"" 3",fielddata4


The offending code is in shell.c in function

static char *csv_read_one_field(CSVReader *p);

An example fix could be something like the following -- adding a counter
flag 'cQuoteComplete' to track quotes properly.

static char *csv_read_one_field(CSVReader *p){
  int c, pc;
  int cSep = p->cSeparator;
  int cQuoteComplete = 0;
  p->n = 0;
  c = fgetc(p->in);
  if( c==EOF || seenInterrupt ){
    p->cTerm = EOF;
    return 0;
  }
  if( c=='"' ){
    int startLine = p->nLine;
    int cQuote = c;
    cQuoteComplete+=1;
    pc = 0;
    while( 1 ){
      c = fgetc(p->in);
      if( c=='\n' ) p->nLine++;
      if( c==cQuote ){
        cQuoteComplete+=1;
        if( pc==cQuote ){
          pc = 0;
          continue;
        }
      }
      if( (c==cSep && pc==cQuote)
       || (c=='\n' && pc==cQuote)
       || (c=='\n' && pc=='\r' && p->n>=2 && p->z[p->n-2]==cQuote)
       || (c==EOF && pc==cQuote)
      ){
        if (cQuoteComplete%2 == 0) {
          do{ p->n--; }while( p->z[p->n]!=cQuote );
          p->cTerm = c;
          break;
        }
      }
      if( c==EOF ){
        fprintf(stderr, "%s:%d: unterminated %c-quoted field\n",
                p->zFile, startLine, cQuote);
        p->cTerm = EOF;
        break;
      }
      csv_append_char(p, c);
      pc = c;
    }
    if( cQuoteComplete%2 != 0 ){
      fprintf(stderr, "%s:%d: unescaped %c character\n",
              p->zFile, p->nLine, cQuote);
    }
  }else{
    while( c!=EOF && c!=cSep && c!='\n' ){
      csv_append_char(p, c);
      c = fgetc(p->in);
    }
    if( c=='\n' ){
      p->nLine++;
      if( p->n>1 && p->z[p->n-1]=='\r' ) p->n--;
    }
    p->cTerm = c;
  }
  if( p->z ) p->z[p->n] = 0;
  return p->z;
}


I built the sqlite3 shell from the almagamation source and tested the above
change using my import data; about 1Gb of messy,  but rfc4180 compliant,
CSV. It all imported cleanly.

sqlite3 is a wonderful bit of software. I have been using it for some time
now to munge and query multi-gigabyte size data sets and am very impressed
with its performance and capabilities.

Best Regards,
Lindsay
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3 shell .import bug (edge case)

Richard Hipp-3
Thanks for the test case.  Fixed at
http://www.sqlite.org/src/info/5e239ecda0


On Wed, Dec 11, 2013 at 6:37 AM, Lindsay Lawrence <[hidden email]>wrote:

> Hi,
>
> I am currently using sqlite3 version:
>
> 3.8.1 2013-10-17 12:57:35 c78be6d786c19073b3a6730dfe3fb1be54f5657a
>
> I found an edge case bug in the sqlite3 shell when importing csv data with
> fields containing embedded quotes, CRs and LFs:
>
> When a field contains an embedded quote, and that embedded quote is
> immediately followed by an EOL then the current csv parser will consider
> the end of field to be reached and fail to import the row correctly.
>
> For example the following csv, a single data row with column headers will
> fail to import correctly.
>
> column1,column2,column3,column4
> fielddata1,fielddata2,"field ""
> data"" 3",fielddata4
>
>
> The offending code is in shell.c in function
>
> static char *csv_read_one_field(CSVReader *p);
>
> An example fix could be something like the following -- adding a counter
> flag 'cQuoteComplete' to track quotes properly.
>
> static char *csv_read_one_field(CSVReader *p){
>   int c, pc;
>   int cSep = p->cSeparator;
>   int cQuoteComplete = 0;
>   p->n = 0;
>   c = fgetc(p->in);
>   if( c==EOF || seenInterrupt ){
>     p->cTerm = EOF;
>     return 0;
>   }
>   if( c=='"' ){
>     int startLine = p->nLine;
>     int cQuote = c;
>     cQuoteComplete+=1;
>     pc = 0;
>     while( 1 ){
>       c = fgetc(p->in);
>       if( c=='\n' ) p->nLine++;
>       if( c==cQuote ){
>         cQuoteComplete+=1;
>         if( pc==cQuote ){
>           pc = 0;
>           continue;
>         }
>       }
>       if( (c==cSep && pc==cQuote)
>        || (c=='\n' && pc==cQuote)
>        || (c=='\n' && pc=='\r' && p->n>=2 && p->z[p->n-2]==cQuote)
>        || (c==EOF && pc==cQuote)
>       ){
>         if (cQuoteComplete%2 == 0) {
>           do{ p->n--; }while( p->z[p->n]!=cQuote );
>           p->cTerm = c;
>           break;
>         }
>       }
>       if( c==EOF ){
>         fprintf(stderr, "%s:%d: unterminated %c-quoted field\n",
>                 p->zFile, startLine, cQuote);
>         p->cTerm = EOF;
>         break;
>       }
>       csv_append_char(p, c);
>       pc = c;
>     }
>     if( cQuoteComplete%2 != 0 ){
>       fprintf(stderr, "%s:%d: unescaped %c character\n",
>               p->zFile, p->nLine, cQuote);
>     }
>   }else{
>     while( c!=EOF && c!=cSep && c!='\n' ){
>       csv_append_char(p, c);
>       c = fgetc(p->in);
>     }
>     if( c=='\n' ){
>       p->nLine++;
>       if( p->n>1 && p->z[p->n-1]=='\r' ) p->n--;
>     }
>     p->cTerm = c;
>   }
>   if( p->z ) p->z[p->n] = 0;
>   return p->z;
> }
>
>
> I built the sqlite3 shell from the almagamation source and tested the above
> change using my import data; about 1Gb of messy,  but rfc4180 compliant,
> CSV. It all imported cleanly.
>
> sqlite3 is a wonderful bit of software. I have been using it for some time
> now to munge and query multi-gigabyte size data sets and am very impressed
> with its performance and capabilities.
>
> Best Regards,
> Lindsay
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users