Limit on number of columns in SQLite table


Limit on number of columns in SQLite table

Mitar
Hi!

We are considering using SQLite as an ML dataset archival format for
datasets in OpenML (https://www.openml.org/). When investigating it,
we noticed that it has a very low limit on the number of columns. Quite
a few of the datasets we are dealing with have 100k or so columns. Are
there any fundamental reasons why this limit is so low (even if we
extend it at compile time, it can be at most 32k columns), while
other limits are comfortably large? Any plans to extend this limit in
the future?


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [EXTERNAL] Limit on number of columns in SQLite table

Hick Gunter
100k distinct column names? Or is that 10000 repeats of 10 attributes?



___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.

Re: Limit on number of columns in SQLite table

Richard Hipp-3
In reply to this post by Mitar

Are you trying to store a big matrix with approx 100k columns?  A
better way to do that in a relational database (*any* relational
database, not just SQLite) is to store one entry per matrix element:

    CREATE TABLE matrix(row_number INT, column_number INT, value ANY);

Only three columns are required in such a relation, regardless of the
number of columns in the matrix.

If performance and storage efficiency are a high priority and if the
number of rows and columns are limited to 2 billion each, then you can
combine the row and column number into a single integer key:

    cell_number := row_number*2147483648 + column_number;

Then make your table just this:

    CREATE TABLE matrix(cell_number INTEGER PRIMARY KEY, value ANY);
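For illustration, here is a short Python sqlite3 sketch of this keying scheme; the inserted values and matrix positions are invented:

```python
import sqlite3

BASE = 2147483648  # 2^31, the multiplier from the formula above

def cell_number(row, col):
    # Pack (row, column) into a single integer key, assuming both
    # stay below 2 billion.
    return row * BASE + col

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE matrix(cell_number INTEGER PRIMARY KEY, value ANY)")

# Store a few cells of a notional 100k-column matrix (made-up data).
db.executemany(
    "INSERT INTO matrix VALUES (?, ?)",
    [(cell_number(0, 99999), 3.14), (cell_number(7, 42), "expressed")],
)

# Fetching a single cell is an indexed point lookup on the packed key.
(value,) = db.execute(
    "SELECT value FROM matrix WHERE cell_number = ?", (cell_number(7, 42),)
).fetchone()
print(value)  # expressed

# The key decodes back into (row, column) with divmod.
print(divmod(cell_number(7, 42), BASE))  # (7, 42)
```

Since cell_number is the INTEGER PRIMARY KEY, it is the rowid itself, so the lookup costs one b-tree descent and no separate index storage.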

--
D. Richard Hipp
[hidden email]

Re: [EXTERNAL] Limit on number of columns in SQLite table

Mitar
In reply to this post by Hick Gunter
Hi!

On Wed, Oct 16, 2019 at 3:16 PM Hick Gunter <[hidden email]> wrote:
> 100k distinct column names? Or is that 10000 repeats of 10 attributes?

100k distinct names. Like each column a different gene expression.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: Limit on number of columns in SQLite table

Mitar
In reply to this post by Richard Hipp-3
Hi!

On Wed, Oct 16, 2019 at 3:29 PM Richard Hipp <[hidden email]> wrote:
> Are you trying to store a big matrix with approx 100k columns?  A
> better way to do that in a relational database (*any* relational
> database, not just SQLite) is to store one entry per matrix elements:

Sure, this is useful for a sparse matrix. But in our case we would like
to preserve as much of the original dataset structure as possible, and
transformations like that also require users to restructure the data
afterwards. For example, a Pandas DataFrame can easily be read out of
SQLite, but if the data is converted in the way you described, the user
has to take one more step to move things around.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html

I would hope that, if this is something which is generally a good idea,
databases would do such a mapping internally, while exposing the
regular structure back to the user.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: [EXTERNAL] Limit on number of columns in SQLite table

Simon Slavin-3
In reply to this post by Mitar
On 16 Oct 2019, at 7:03pm, Mitar <[hidden email]> wrote:

> On Wed, Oct 16, 2019 at 3:16 PM Hick Gunter <[hidden email]> wrote:
>> 100k distinct column names? Or is that 10000 repeats of 10 attributes?
>
> 100k distinct names. Like each column a different gene expression.

Don't do that.  It's an abuse of how relational databases are meant to work.

One reason you shouldn't do that in SQLite is that all the data for a row is stored together, in a sequence.  If you ask for the data in the 3756th column of a row, SQLite has to read the data for the first 3755 columns of that row to find it.  As you can imagine, this is slow and involves a lot of I/O.

Dr Hipp posted how to maintain this kind of data in a SQL database: instead of maintaining a table with 100k columns, you store the gene name as a value in a column and make that column part of the PRIMARY KEY of the table.

Re: Limit on number of columns in SQLite table

Richard Hipp-3
In reply to this post by Mitar
SQLite could, in theory, be enhanced (with just a few minor tweaks) to
support up to 2 billion columns.  But having a relation with a large
number of columns seems like a very bad idea stylistically.  That's
not how relational databases are intended to be used.  Normally when a
table acquires more than a couple dozen columns, that is a good
indication that you need to normalize and/or refactor your schema. Schema
designers almost unanimously follow that design principle.  And so
SQLite is optimized for the overwhelmingly common case of a small
number of columns per table.

Hence, adding the ability to have a table with a huge number of
columns is not something that I am interested in supporting in SQLite
at this time.

--
D. Richard Hipp
[hidden email]

Re: Limit on number of columns in SQLite table

Jens Alfke-2
In reply to this post by Mitar

> On Oct 16, 2019, at 6:08 AM, Mitar <[hidden email]> wrote:
>
> Quite
> some of datasets we are dealing with have 100k or so columns.

There was a thread about this a few months ago. You should not store every number of a huge vector in a separate column. You don’t need to query each individual item separately, do you? Just store the vector as one column — JSON works well, or a blob containing a C array.

—Jens
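Jens's suggestion can be sketched like this, assuming the vector is stored as JSON text in a single column; the table and column names are hypothetical:

```python
import json
import sqlite3

# One row per record: an identifying column plus the whole vector in a
# single TEXT column as JSON. Names are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE expression (sample TEXT PRIMARY KEY, vec TEXT)")

vector = [0.0] * 100_000
vector[3755] = 1.5
db.execute("INSERT INTO expression VALUES (?, ?)",
           ("sample-1", json.dumps(vector)))

# Reading the row back is one fetch plus one parse, instead of SQLite
# walking 100k column slots inside the record.
(text,) = db.execute(
    "SELECT vec FROM expression WHERE sample = 'sample-1'").fetchone()
restored = json.loads(text)
print(len(restored), restored[3755])  # 100000 1.5
```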

Re: [EXTERNAL] Limit on number of columns in SQLite table

J Decker
In reply to this post by Mitar
On Wed, Oct 16, 2019 at 11:03 AM Mitar <[hidden email]> wrote:

> Hi!
>
> On Wed, Oct 16, 2019 at 3:16 PM Hick Gunter <[hidden email]> wrote:
> > 100k distinct column names? Or is that 10000 repeats of 10 attributes?
>
> 100k distinct names. Like each column a different gene expression.
>
To me that sounds more like a table distinction; such that you'd have 100k
tables with 2 columns each (their row, and their value of that gene)...
it's too bad table names can't be bound parameters.



Re: [EXTERNAL] Limit on number of columns in SQLite table

Hick Gunter
In reply to this post by Mitar
Then I suggest normalizing the schema.

Table genes (gene_id integer primary key, name char)

Table variant (variant_id integer primary key, gene_id integer, name char)

Table sample (sample_id integer primary key, name char, ...);

Table sample_gene_variant (sample_id integer, gene_id integer, variant_id integer, unique (sample_id, gene_id));

Appropriate foreign keys are recommended.

Then you can store your 100k bits of information per sample in the sample_gene_variant relation. Also, should the need arise to store new genes or variants, all it takes is new entries in the gene or variant tables, instead of adding more columns to an already bloated table.

Getting the value of a specific column from the initial schema is then a simple join:

SELECT s.name, g.name, v.name FROM sample s JOIN sample_gene_variant sgv ON (sgv.sample_id = s.sample_id) JOIN gene g ON (g.gene_id = sgv.gene_id) JOIN variant v ON (v.variant_id = sgv.variant_id) WHERE s.name = 'John Doe' AND g.name = 'BCL11B';
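A runnable sketch of this normalized layout, using Python's sqlite3; the inserted rows and the variant name are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE variant (variant_id INTEGER PRIMARY KEY, gene_id INTEGER, name TEXT);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sample_gene_variant (
    sample_id INTEGER, gene_id INTEGER, variant_id INTEGER,
    UNIQUE (sample_id, gene_id));

-- Invented rows: one sample carrying one variant of one gene.
INSERT INTO gene VALUES (1, 'BCL11B');
INSERT INTO variant VALUES (10, 1, 'variant-A');
INSERT INTO sample VALUES (100, 'John Doe');
INSERT INTO sample_gene_variant VALUES (100, 1, 10);
""")

# The three-way join resolves names back out of the id tables.
row = db.execute("""
    SELECT s.name, g.name, v.name
    FROM sample s
    JOIN sample_gene_variant sgv ON sgv.sample_id = s.sample_id
    JOIN gene g ON g.gene_id = sgv.gene_id
    JOIN variant v ON v.variant_id = sgv.variant_id
    WHERE s.name = 'John Doe' AND g.name = 'BCL11B'
""").fetchone()
print(row)  # ('John Doe', 'BCL11B', 'variant-A')
```

Adding a new gene or variant is then a row insert, not a schema change.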





Re: [EXTERNAL] Limit on number of columns in SQLite table

Hick Gunter
In reply to this post by J Decker
That would be falling off the other side of the horse. Neither column nor table names are supposed to be derived from data items, but rather from abstract categories.




Re: Limit on number of columns in SQLite table

Mitar
In reply to this post by Richard Hipp-3
Hi!

I can see how this is a reasonable limit when SQLite is used for the
querying power it provides. In our case, though, we are really focusing
on it as a standard long-term storage format. The "Appropriate Uses For
SQLite" document [1] has a section called "File archive and/or data
container", and this is why we started considering SQLite as a dataset
archive format. We would not store files directly, but the contents of
those files (like the contents of a CSV), while trying not to modify
them more than necessary. We were especially interested in the statement
that "SQLite is a good solution for any situation that requires bundling
diverse content into a self-contained and self-describing package for
shipment across a network." So I can understand how supporting a large
number of columns might be inappropriate when you want to run
complicated SQL queries on the data, but our use case is to just store
data and then extract all rows for processing; at most, the most
complicated query would extract a subsample of rows, with no JOINs or
anything like that. So, apart from the artificial limit in SQLite (there
because wide tables are not useful in the general case), it looks like
there is no other reason why this could not be supported.

So why not increase the hard limit to 2 billion, and keep it limited at
runtime to 2000 by default? PRAGMA can already decrease the limit, so
the existing default of 2000 could stay, but people could then increase
it all the way to 2 billion without having to recompile. Is there any
significant performance downside to this?

[1] https://www.sqlite.org/whentouse.html


Mitar




--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: [EXTERNAL] Re: Limit on number of columns in SQLite table

Hick Gunter
Since your data is at least mostly opaque, in the sense that SQLite is not expected to interpret the contents, why not split your data into "stuff you want to query in SQLite" and "stuff you want to just store"? The former means individual columns, whereas the latter could be stored in a single BLOB field, which only your application knows how to extract data from.

This allows SQLite to efficiently process the fields it needs to know about, and return BLOB data efficiently as one single field instead of having to pick it apart into 100k bits.
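A minimal sketch of this split, assuming the application packs the bulk vector as a flat array of C doubles; all table and column names here are hypothetical:

```python
import sqlite3
from array import array

# Queryable metadata goes in normal columns; the bulk vector goes in one
# opaque BLOB whose format only the application understands.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE dataset (sample TEXT PRIMARY KEY, species TEXT, vec BLOB)")

values = array("d", range(100_000))  # 100k doubles, ~800 KB per row
db.execute("INSERT INTO dataset VALUES (?, ?, ?)",
           ("sample-1", "H. sapiens", values.tobytes()))

# SQLite filters on the small columns and returns the blob as a single
# field; unpacking happens outside the database.
species, blob = db.execute(
    "SELECT species, vec FROM dataset WHERE sample = 'sample-1'").fetchone()
restored = array("d")
restored.frombytes(blob)
print(species, len(restored), restored[42])  # H. sapiens 100000 42.0
```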




Re: [EXTERNAL] Re: Limit on number of columns in SQLite table

Mitar
Hi!

In that case we would have to define a standard BLOB storage format,
which slightly defeats the idea of using SQLite to define such a
standard future-proof format. :-)


Mitar




--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: Limit on number of columns in SQLite table

Simon Slavin-3
In reply to this post by Mitar
On 17 Oct 2019, at 9:56am, Mitar <[hidden email]> wrote:

> I can understand how supporting a large number of columns might be
> inappropriate when you want to run complicated SQL queries on data,
> but to just store data and then extract all rows to do some data
> processing, Or as the most complicated query it would be to extract
> just a subsample of rows. But not really do to any JOIN queries or
> something like that. it looks like except for artificial limit in
> SQLite, because it is not useful for general case, there is no other
> reason why it could not be supported.
>
> So why not increase the limit to 2 billion

1) Almost no piece of software can handle a grid 2 billion cells wide.  Excel maxes out at 16,384 columns.  Matlab can store and retrieve a cell of data directly from a file, but it has a max array size of 10000.  R maxes out at 2147483647, which is more than 2 billion.  But R has to hold all the data from a matrix in memory at once and it can't assign enough memory to one object to hold that many cells.

2) Object names are not data.  They're descriptions in your favourite human language.  They're not meant to have weird sequences of characters in.

3) Lots of CSV import filters ignore a column header row, or can only create fieldnames with certain limits (max length, no punctuation characters, etc.).  So you should expect to lose fieldnames if you try to import your data into some new piece of software.

And here's the dealbreaker:

(4) SQLite stores all the data for a row together, in a sequence.  If you ask for the data in the 3756th column of a row, SQLite has to read and parse the data for the first 3755 columns of that row, just to read a single value from storage.  As you can imagine, this is slow and involves a lot of I/O.  And while it happens, the row up to that point must all be held in memory.  Consequently, nobody who uses SQLite for its intended purpose actually does this.  I dread to think how slow random access over 2 billion columns would be in SQLite.

So, rereading the text from your message that I quoted at the top: no.  Nobody would ever use SQLite to store a table 2 billion columns wide, even if they didn't intend to do anything complicated with it, because it would take a ridiculous amount of time and memory just to insert data into a table that wide.

Your gene expressions are data.  They are not the names of table entities.  They should be stored in a table as other posts suggested.

Re: Limit on number of columns in SQLite table

Mitar
Hi!

This is getting a bit off topic.

On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin <[hidden email]> wrote:
> 1) Almost no piece of software can handle a grid 2 billion cells wide.  Excel maxes out at 16,384 columns.  Matlab can store and retrieve a cell of data directly from a file, but it has a max array size of 10000.  R maxes out at 2147483647, which is more than 2 billion.  But R has to hold all the data from a matrix in memory at once and it can't assign enough memory to one object to hold that many cells.

Of course, 2 billion is a lot. But 100k is something many ML libraries
support: Pandas, ndarray, R. Nothing too magical about that.

> 2) Object names are not data.  They're descriptions in your favourite human language.  They're not meant to have weird sequences of characters in.

Not sure what this relates to.

> 3) Lots of CSV import filters ignore a column header row, or can only create fieldnames with certain limits (max length, no punctuation characters, etc.).  So you should expect to lose fieldnames if you try to import your data into some new piece of software.

Does SQLite have limitations on what can be a column name? If not, then
I would not worry about what some CSV importers do. We would use a good
one to convert to SQLite.

> (4) SQLite stores all the data for a row is together, in a sequence.  If you ask for the data in the 3756th column of a row, SQLite has to read and parse the data for the first 3755 columns of that row, just to read a single value from storage.  As you can imagine, this is slow and involves a lot of I/O.  And while it happens the row up to that point must all be held in memory.  Consequently, nobody who uses SQLite for its intended purpose actually does this.  I dread to think how slow random access over 2 billion columns would be in SQLite.

I wrote earlier that, for us, the use case where we read whole rows
is the most common one.

> Your gene expressions are data.  They are not the names of table entities.  They should be stored in a table as other posts suggested.

Maybe. But often this data is represented as a row of expressions with
a column for each gene. Because this is what is being distributed, we
are looking for ways to store it in a stable format which will be
supported for the next 50 years, without modifying the original data
too much. I do hear the suggestions to do such a transformation, but
that is less ideal for our use case.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: Limit on number of columns in SQLite table

Eric Grange-3
> I wrote earlier that for us use case where we are reading whole rows is
> the most common one.
> [...]
> we are looking for ways to store this in a stable format which will be
> supported for next 50 years, without modifying to original data too much.

If you do not need access to individual columns too often, but still want
to be able to operate on them when needed, my suggestion would be to store
them as JSON in a blob, and use the JSON functions of SQLite to extract
the data when needed (https://www.sqlite.org/json1.html).

This way you get the stability of a text format for long-term storage
(JSON) while still having the ability to run queries, index and
manipulate data with SQL.

I am using such an approach successfully, and when some fields in the JSON
are flagged for indexing or direct access (e.g. for summaries /
descriptions), I just extract them to a relational table which can then be
indexed and used directly as usual (you can use triggers to automate that
extraction). You may also be able to use indexes on expressions if
indexing is all you need (https://www.sqlite.org/expridx.html).
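A minimal sketch of that approach (the table and gene names here are
made-up placeholders, and it assumes an SQLite build with the JSON1
functions available, which is the default in recent versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One logical dataset row becomes a single JSON blob
con.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, row_json TEXT)")
con.execute("INSERT INTO dataset (row_json) VALUES (?)",
            ('{"BRCA1": 0.42, "TP53": 1.7}',))

# Individual fields remain reachable from SQL via json_extract()
val = con.execute(
    "SELECT json_extract(row_json, '$.TP53') FROM dataset").fetchone()[0]

# An index on an expression makes lookups by one field cheap,
# without extracting that field into its own column
con.execute(
    "CREATE INDEX idx_brca1 ON dataset (json_extract(row_json, '$.BRCA1'))")
```

The trigger-based extraction of "hot" fields into a side table works the
same way, just with the json_extract() call inside the trigger body.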

Eric

Le jeu. 17 oct. 2019 à 14:50, Mitar <[hidden email]> a écrit :

> Hi!
>
> This is getting a bit off topic.
>
> On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin <[hidden email]>
> wrote:
> > 1) Almost no piece of software can handle a grid 2 billion cells wide.
> Excel maxes out at 16,384 columns.  Matlab can store and retrieve a cell of
> data directly from a file, but it has a max array size of 10000.  R maxes
> out at 2147483647, which is more than 2 billion.  But R has to hold all the
> data from a matrix in memory at once and it can't assign enough memory to
> one object to hold that many cells.
>
> Of course, 2 billion is a lot. But 100k is something many ML libraries
> support: Pandas, ndarray, R. Nothing too magical about that.
>
> > 2) Object names are not data.  They're descriptions in your favourite
> human language.  They're not meant to have weird sequences of characters in.
>
> Not sure what this relates to.
>
> > 3) Lots of CSV import filters ignore a column header row, or can only
> create fieldnames with certain limits (max length, no punctuation
> characters, etc.).  So you should expect to lose fieldnames if you try to
> import your data into some new piece of software.
>
> Does SQLite have limitations on what can be a column name? If not,
> then I would not worry about what some CSV importers do. We would use a
> good one to convert to SQLite.
>
> (4) SQLite stores all the data for a row together, in a sequence.  If
> you ask for the data in the 3756th column of a row, SQLite has to read and
> parse the data for the first 3755 columns of that row, just to read a
> single value from storage.  As you can imagine, this is slow and involves a
> lot of I/O.  And while this happens, the row up to that point must all be
> held in memory.  Consequently, nobody who uses SQLite for its intended
> purpose actually does this.  I dread to think how slow random access over
> 2 billion columns would be in SQLite.
>
> I wrote earlier that for us, the use case where we are reading whole rows
> is the most common one.
>
> > Your gene expressions are data.  They are not the names of table
> entities.  They should be stored in a table as other posts suggested.
>
> Maybe. But often this data is represented as a row of expressions with
> columns for each gene. Because this is what is being distributed, we
> are looking for ways to store it in a stable format which will be
> supported for the next 50 years, without modifying the original data too
> much. I do hear suggestions to do such a transformation, but that is
> less ideal for our use case.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>

Re: Limit on number of columns in SQLite table

Mitar
Hi!

On Thu, Oct 17, 2019 at 3:04 PM Eric Grange <[hidden email]> wrote:
> my suggestion would be to store them as JSON in a blob, and use the JSON
> functions of SQLite to extract the data

JSON has some unfortunate limitations: per the standard it does not
support the full floating-point spec, so NaN and infinity cannot be
represented there. So JSON is really not a great format when you want to
preserve as much of the input as possible (integers, floats, text, and
binary). SQLite seems to be spot on in this regard.
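For example, in Python (just to illustrate the point about the standard):

```python
import json

# A strict, standard-conforming encoder must reject NaN outright
try:
    json.dumps(float("nan"), allow_nan=False)
    raised = False
except ValueError:
    raised = True

# Python's default instead emits the bare token NaN, which is not
# valid JSON per RFC 8259 and trips up stricter parsers elsewhere
text = json.dumps(float("nan"))
```

So either the value is lost, or the output is no longer interchangeable.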

But yes, if there were some other standard, SQLite-supported format to
embed, that approach would be useful. Like composite value types.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: [EXTERNAL] Re: Limit on number of columns in SQLite table

Hick Gunter
I have the impression that you still do not grasp the folly of a 100k column schema.

See the example below, which only has 6 fields. As you can see, each field requires a Column opcode and arguments (about 10 bytes) and a "register" to hold the value (48 bytes), which for 100k columns comes to about 5.5 megabytes just to retrieve a row from the database. It will also involve SQLite decoding 100k field values and your application calling the sqlite3_column interface 100k times for each and every row, which yields an expected performance of about 2 rows per second. Can you afford to use that much memory and time?

asql> create temp table genes (id integer primary key, name char, f1 char, f2 char, f3 char, f4 char);
asql> .explain
asql> explain select * from genes;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     13    0                    00  Start at 13
1     OpenRead       0     2     1     6              00  root=2 iDb=1; genes
2     Explain        2     0     0     SCAN TABLE genes  00
3     Rewind         0     12    0                    00
4       Rowid          0     1     0                    00  r[1]=rowid
5       Column         0     1     2                    00  r[2]=genes.name
6       Column         0     2     3                    00  r[3]=genes.f1
7       Column         0     3     4                    00  r[4]=genes.f2
8       Column         0     4     5                    00  r[5]=genes.f3
9       Column         0     5     6                    00  r[6]=genes.f4
10      ResultRow      1     6     0                    00  output=r[1..6]
11    Next           0     4     0                    01
12    Halt           0     0     0                    00
13    Transaction    1     0     1     0              01  usesStmtJournal=0
14    Goto           0     1     0                    00
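That linear growth is easy to see from Python as well (a sketch using 100
columns instead of 100k so it stays quick; table and column names are
arbitrary):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cols = ", ".join(f"f{i} INTEGER" for i in range(100))
con.execute(f"CREATE TABLE wide (id INTEGER PRIMARY KEY, {cols})")

# EXPLAIN returns one result row per VDBE opcode; count the Column opcodes
ops = con.execute("EXPLAIN SELECT * FROM wide").fetchall()
n_column_ops = sum(1 for op in ops if op[1] == "Column")
# One Column opcode per non-rowid column selected (id comes via Rowid),
# so the program and its register file grow linearly with column count
```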

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Mitar
Gesendet: Donnerstag, 17. Oktober 2019 15:11
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table

Hi!

On Thu, Oct 17, 2019 at 3:04 PM Eric Grange <[hidden email]> wrote:
> my suggestion would be to store them as JSON in a blob, and use the
> JSON functions of SQLite to extract the data

JSON has some unfortunate limitations: per the standard it does not support the full floating-point spec, so NaN and infinity cannot be represented there. So JSON is really not a great format when you want to preserve as much of the input as possible (integers, floats, text, and binary). SQLite seems to be spot on in this regard.

But yes, if there were some other standard, SQLite-supported format to embed, that approach would be useful. Like composite value types.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [EXTERNAL] Re: Limit on number of columns in SQLite table

Mitar
 Hi!

Thanks for this input. So you are saying that calling sqlite3_column 100k
times per row is slow, but retrieving 100k rows to construct one
"original" row will be faster? I am not sure I understand why reading
and decoding cells spread over multiple columns is so much slower than
reading and decoding cells spread over multiple rows.
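(For reference, the long-format alternative suggested earlier in the
thread looks roughly like this; the schema and gene names are made up.
The point is that all cells of one logical row sit next to each other in
the (sample_id, gene) primary key, and each physical row has only three
small fields to decode, rather than one huge record that must be parsed
from the start to reach a given field:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Long ("tall") layout: one physical row per (sample, gene) cell
con.execute("""
    CREATE TABLE expr (
        sample_id INTEGER,
        gene      TEXT,
        value     REAL,
        PRIMARY KEY (sample_id, gene)
    ) WITHOUT ROWID""")
con.executemany("INSERT INTO expr VALUES (?, ?, ?)",
                [(1, "BRCA1", 0.42), (1, "TP53", 1.7), (2, "BRCA1", 0.0)])

# Reconstructing one "original" wide row is a contiguous range scan
row = dict(con.execute(
    "SELECT gene, value FROM expr WHERE sample_id = ?", (1,)))
```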

Mitar

On Thu, Oct 17, 2019 at 3:38 PM Hick Gunter <[hidden email]> wrote:

>
>
> I have the impression that you still do not grasp the folly of a 100k column schema.
>
> See the example below, which only has 6 fields. As you can see, each field requires a Column opcode and arguments (about 10 bytes) and a "register" to hold the value (48 bytes), which for 100k columns comes to about 5.5 megabytes just to retrieve a row from the database. It will also involve SQLite decoding 100k field values and your application calling the sqlite3_column interface 100k times for each and every row, which yields an expected performance of about 2 rows per second. Can you afford to use that much memory and time?
>
> asql> create temp table genes (id integer primary key, name char, f1 char, f2 char, f3 char, f4 char);
> asql> .explain
> asql> explain select * from genes;
> addr  opcode         p1    p2    p3    p4             p5  comment
> ----  -------------  ----  ----  ----  -------------  --  -------------
> 0     Init           0     13    0                    00  Start at 13
> 1     OpenRead       0     2     1     6              00  root=2 iDb=1; genes
> 2     Explain        2     0     0     SCAN TABLE genes  00
> 3     Rewind         0     12    0                    00
> 4       Rowid          0     1     0                    00  r[1]=rowid
> 5       Column         0     1     2                    00  r[2]=genes.name
> 6       Column         0     2     3                    00  r[3]=genes.f1
> 7       Column         0     3     4                    00  r[4]=genes.f2
> 8       Column         0     4     5                    00  r[5]=genes.f3
> 9       Column         0     5     6                    00  r[6]=genes.f4
> 10      ResultRow      1     6     0                    00  output=r[1..6]
> 11    Next           0     4     0                    01
> 12    Halt           0     0     0                    00
> 13    Transaction    1     0     1     0              01  usesStmtJournal=0
> 14    Goto           0     1     0                    00
>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Mitar
> Gesendet: Donnerstag, 17. Oktober 2019 15:11
> An: SQLite mailing list <[hidden email]>
> Betreff: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table
>
> Hi!
>
> On Thu, Oct 17, 2019 at 3:04 PM Eric Grange <[hidden email]> wrote:
> > my suggestion would be to store them as JSON in a blob, and use the
> > JSON functions of SQLite to extract the data
>
> JSON has some unfortunate limitations: per the standard it does not support the full floating-point spec, so NaN and infinity cannot be represented there. So JSON is really not a great format when you want to preserve as much of the input as possible (integers, floats, text, and binary). SQLite seems to be spot on in this regard.
>
> But yes, if there were some other standard, SQLite-supported format to embed, that approach would be useful. Like composite value types.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>
>
> ___________________________________________
>  Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.



--
http://mitar.tnode.com/
https://twitter.com/mitar_m