Performing a SELECT on an enormous database...

Performing a SELECT on an enormous database...

Ross Altman-2
Hi guys,

I'm currently working with a pretty gigantic database (116 Gb at the
moment, and growing). Performing a simple SELECT routine with one filter
takes between 7 and 15 minutes, which is starting to become a problem. The
command I'm using is the following:

SELECT NVERTS FROM ToricCY WHERE H11=2;

Pretty basic. I just want the NVERTS column of the table ToricCY where
another column labeled H11 is 2, nothing fancy. Because of the huge size of
the database, I would expect it to take a while, but I was wondering if you
guys have any recommendations on how to speed it up (if possible).

Also, because this problem will only get worse as the database grows, I'm
looking for alternative approaches to storing this data. Does anyone have
experience working with databases this big?

Thanks!

Best,
Ross
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Performing a SELECT on an enormous database...

Alessandro Marzocchi
Do you have an index on that column?

Re: Performing a SELECT on an enormous database...

Stephen Chrzanowski
In reply to this post by Ross Altman-2
Make sure H11 has an index on it.  Also ensure that ToricCY doesn't have
blobs attached to it.  Throw the blobs into a different table and use a 1:1
relationship to link them, then only pull the blobs when needed.

SQLite apparently will load an entire row of data out of the database, even
if the query doesn't need the additional data, so if you have a huge blob,
it'll read in that entire blob and then toss it to /dev/null.

You might also want to run an ANALYZE on the database as well.  That MIGHT
take a while (at 116 GB, yeah, I'd say so -- make sure you have PLENTY of
drive space where the DB lives), so if it is being used in a live
environment, you might need to take your package offline for maintenance.

At this point, throwing an index on the database WILL take some time as
well.  Moving the blobs from one table to another is also going to take a
long while: not only will you need to copy the data from one table to
another (so you have two copies of the database), but SQLite doesn't
support an ALTER TABLE DROP COLUMN type of statement, so you'll have to
reconstruct the table (or use an IDE that'll do that work for you), which
means copying your required data from the old table to a new table, then
deleting the old table.
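[Editor's note: the blob-separation idea above can be sketched with Python's built-in sqlite3 module. The real ToricCY schema was not posted, so the column types, the `id` key, and the `ToricCY_blob` side table are made-up stand-ins.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical stand-in for the real schema: small columns stay in
    -- the main table, the big blob moves to a 1:1 side table keyed on id.
    CREATE TABLE ToricCY (
        id     INTEGER PRIMARY KEY,
        H11    INTEGER,
        NVERTS TEXT
    );
    CREATE TABLE ToricCY_blob (
        id   INTEGER PRIMARY KEY REFERENCES ToricCY(id),
        data BLOB
    );
""")
con.execute("INSERT INTO ToricCY (id, H11, NVERTS) VALUES (1, 2, '4 5 1 9')")
con.execute("INSERT INTO ToricCY_blob (id, data) VALUES (1, ?)",
            (b"\x00" * 1024,))

# Scans that filter on H11 never touch the blob pages now.
rows = con.execute("SELECT NVERTS FROM ToricCY WHERE H11 = 2").fetchall()

# The blob is pulled only when actually wanted, via the 1:1 join.
blob = con.execute(
    "SELECT b.data FROM ToricCY AS t JOIN ToricCY_blob AS b ON b.id = t.id"
    " WHERE t.id = 1"
).fetchone()[0]
```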


Re: Performing a SELECT on an enormous database...

Teg-3
In reply to this post by Ross Altman-2
Hello Ross,

Is H11 indexed?  15 minutes suggests to me that it's doing a linear
scan and you need an index.  I have databases that big and performance
is pretty decent most of the time.

I'm sure the boys here will ask you for a schema and probably an
EXPLAIN on the query.

C



--
 Teg                            mailto:[hidden email]
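[Editor's note: the EXPLAIN QUERY PLAN output mentioned above makes the scan-vs-index difference visible directly. A minimal sketch with a stand-in table, since the real schema was not posted:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ToricCY (H11 INTEGER, NVERTS TEXT)")  # stand-in

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable
    # description of how SQLite intends to run the statement.
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

query = "SELECT NVERTS FROM ToricCY WHERE H11 = 2"
before = plan(query)   # a SCAN of the table, i.e. a full linear scan
con.execute("CREATE INDEX ToricCY_H11 ON ToricCY (H11)")
after = plan(query)    # a SEARCH using the index ToricCY_H11
```

A SCAN line means every row is visited; once the index exists the plan switches to a SEARCH via ToricCY_H11 (the exact wording varies between SQLite versions).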


Re: Performing a SELECT on an enormous database...

Robert Hairgrove
In reply to this post by Ross Altman-2
On Fri, 2014-10-24 at 19:09 -0400, Ross Altman wrote:
> I'm currently working with a pretty gigantic database (116 Gb at the
> moment, and growing). Performing a simple SELECT routine with one filter
> takes between 7 and 15 minutes, which is starting to become a problem. The
> command I'm using is the following:
>
> SELECT NVERTS FROM ToricCY WHERE H11=2;

Questions:

1) Are you querying a local database file, or pulling the data in over a
network connection?

2) Is there an index defined for the column H11?

> Pretty basic. I just want the NVERTS column of the table ToricCY where
> another column labeled H11 is 2, nothing fancy. Because of the huge size of
> the database, I would expect it to take a while, but I was wondering if you
> guys have any recommendations on how to speed it up (if possible).
>
> Also, because this problem will only get worse as the database grows, I'm
> looking for alternative approaches to storing this data. Does anyone have
> experience working with databases this big?


116 GB seems to be at odds with a database system named "SQLite". :) But
most popular client/server RDBMSs should be able to handle it with no
problem. MySQL, for example, supports table partitions, which can have an
enormous impact on performance. With SQLite, you can approximate
partitioning by splitting the database into several smaller ones and using
the ATTACH command to run queries over several databases at once.

Hope this helps!
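[Editor's note: the ATTACH-based partitioning described above could look roughly like this; the file names and the split by H11 value are invented for the sketch.]

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()

# Two hypothetical partition files, each holding a slice of the data.
for name, h11 in (("part1.db", 2), ("part2.db", 3)):
    c = sqlite3.connect(os.path.join(tmp, name))
    c.execute("CREATE TABLE ToricCY (H11 INTEGER, NVERTS TEXT)")
    c.execute("INSERT INTO ToricCY VALUES (?, ?)", (h11, "data-%d" % h11))
    c.commit()
    c.close()

# Open one partition and ATTACH the other under the alias p2.
con = sqlite3.connect(os.path.join(tmp, "part1.db"))
con.execute("ATTACH DATABASE ? AS p2", (os.path.join(tmp, "part2.db"),))

# A single query can then span both partitions.
rows = con.execute("""
    SELECT NVERTS FROM main.ToricCY
    UNION ALL
    SELECT NVERTS FROM p2.ToricCY
""").fetchall()
```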


Re: Performing a SELECT on an enormous database...

Richard Hipp-3
In reply to this post by Stephen Chrzanowski
On Sat, Oct 25, 2014 at 8:15 AM, Stephen Chrzanowski <[hidden email]>
wrote:

>
> SQLite apparently will load an entire row of data out of the database, even
> if the query doesn't need the additional data, so if you have a huge blob,
> it'll read in that entire blob then toss the blob to /dev/null
>

SQLite only loads a prefix of each row from the file - the minimum prefix
required to resolve the query.  So if the BLOB is out on the end of the
row, it does not get loaded.

However, when updating a row, SQLite rewrites the entire row.  (It has to,
because of the use of variable-width encodings, since a change to any field
affects the location of all subsequent fields.)  So if you have a row with
both a BOOLEAN and a 1MB BLOB, you have to write 1MB in order to change the
value of the BOOLEAN.

--
D. Richard Hipp
[hidden email]

Re: Performing a SELECT on an enormous database...

Stephen Chrzanowski
Ahh.. Thanks Richard.  So if you were to have blobs live at the front of
the row, it'll have to read through that blob to get that byte in the next
field, correct?


Re: Performing a SELECT on an enormous database...

Richard Hipp-3
On Sat, Oct 25, 2014 at 9:14 AM, Stephen Chrzanowski <[hidden email]>
wrote:

> Ahh.. Thanks Richard.  So if you were to have blobs live at the front of
> the row, it'll have to read through that blob to get that byte in the next
> field, correct?
>
>
Correct.
--
D. Richard Hipp
[hidden email]

Re: Performing a SELECT on an enormous database...

Ross Altman
Thanks for all the responses. The small integer column H11 comes before the
large string column NVERTS, so doesn't that mean SQLite is only loading the
minimum required while filtering? If that's the case then I don't
understand why it's taking up to 15 minutes to load.

Also, yes this database is local, and not being read over a network.

Best,
Ross


Re: Performing a SELECT on an enormous database...

Simon Slavin-3

On 25 Oct 2014, at 9:07pm, Ross Altman <[hidden email]> wrote:

> Thanks for all the responses. The small integer column H11 comes before the
> large string column NVERTS, so doesn't that mean SQLite is only loading the
> minimum required while filtering? If that's the case then I don't
> understand why it's taking up to 15 minutes to load.

The searching is what is taking the time: as a number of other people have pointed out, there is no index on the H11 column, so SQLite has to look at every row in the table to see whether it qualifies for your SELECT.  Create an index by doing something like

CREATE INDEX ToricCY_H11 ON ToricCY (H11)

then do as many SELECTs as you want.

Simon.
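[Editor's note: a toy illustration of the suggestion above, using Python's built-in sqlite3 module and an invented miniature version of the table, since the real one is 116 GB.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ToricCY (H11 INTEGER, NVERTS TEXT)")
con.executemany("INSERT INTO ToricCY VALUES (?, ?)",
                [(2, "a"), (3, "b"), (2, "c")])

# Paid once; every later SELECT that filters on H11 can use the index
# instead of scanning the whole table.
con.execute("CREATE INDEX IF NOT EXISTS ToricCY_H11 ON ToricCY (H11)")

rows = con.execute("SELECT NVERTS FROM ToricCY WHERE H11 = 2").fetchall()
```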

Re: Performing a SELECT on an enormous database...

Ross Altman
Oh, I see! Thanks for the clarification, Simon! I wasn't aware that there
was a command for indexing. That should definitely help a lot.


Re: Performing a SELECT on an enormous database...

Charles Samuels-2
In reply to this post by Richard Hipp-3

Richard,

On Saturday, October 25, 2014 05:31:35 AM Richard Hipp wrote:
> However, when updating a row, SQLite rewrites the entire row.  (It has to,
> because of the use of variable-width encodings, since a change to any field
> affects the location of all subsequent fields.)  So if you have a row with
> both a BOOLEAN and a 1MB BLOB, you have to write 1MB in order to change the
> value of the BOOLEAN.

Does this still apply if the column was added due to "alter table X add
column"? I ask because it was my understanding that alter table added the
extra column "elsewhere". It seems as a workaround, you could create a table
with some metadata, then add each huge blob at the end of it with alter table.

Charles

Re: Performing a SELECT on an enormous database...

Clemens Ladisch
Charles Samuels wrote:
> On Saturday, October 25, 2014 05:31:35 AM Richard Hipp wrote:
>> However, when updating a row, SQLite rewrites the entire row.
>
> Does this still apply if the column was added due to "alter table X add
> column"? I ask because it was my understanding that alter table added the
> extra column "elsewhere".

It adds the extra column "nowhere".  When SQLite reads a table row that has
fewer columns than in the CREATE TABLE statement, the remaining columns'
values are assumed to have their default values.

When a row is (re)written, all columns are written (even those that
happen to have default values).


Regards,
Clemens
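[Editor's note: this is easy to check with Python's built-in sqlite3 module — the pre-existing row has no stored value for the new column, yet reads back the default.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t (x) VALUES (1)")

# ADD COLUMN stores the new column "nowhere": the old row on disk
# still has only one field.
con.execute("ALTER TABLE t ADD COLUMN y DEFAULT 42")
con.execute("INSERT INTO t (x) VALUES (2)")

# Both the old short row and the new full row report the default.
rows = con.execute("SELECT x, y FROM t ORDER BY x").fetchall()
```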

Re: Performing a SELECT on an enormous database...

J Decker
On Sun, Oct 26, 2014 at 10:32 AM, Clemens Ladisch <[hidden email]>
wrote:

> Charles Samuels wrote:
> > On Saturday, October 25, 2014 05:31:35 AM Richard Hipp wrote:
> >> However, when updating a row, SQLite rewrites the entire row.
> >
> > Does this still apply if the column was added due to "alter table X add
> > column"? I ask because it was my understanding that alter table added the
> > extra column "elsewhere".
>
> It adds the extra column "nowhere".  When SQLite reads a table row that has
> fewer columns than in the CREATE TABLE statement, the remaining columns'
> values are assumed to have their default values.
>
>
That is interesting; so if I alter the default value, do all rows that
existed before the column was added (and before the default changed) get
the changed value?



Re: Performing a SELECT on an enormous database...

Clemens Ladisch
J Decker wrote:

> On Sun, Oct 26, 2014 at 10:32 AM, Clemens Ladisch wrote:
>> Charles Samuels wrote:
>>> it was my understanding that alter table added the extra column "elsewhere".
>>
>> It adds the extra column "nowhere".  When SQLite reads a table row that has
>> fewer columns than in the CREATE TABLE statement, the remaining columns'
>> values are assumed to have their default values.
>
> That is interesting; so if I alter the default values, all rows that
> existed before the column added and the default change get the changed
> value?

The ALTER TABLE command cannot change the default values, so this cannot
happen.


... well, "cannot" is relative:

  $ sqlite3 test.db
  sqlite> CREATE TABLE t(x);
  sqlite> INSERT INTO t(x) VALUES (1);
  sqlite> ALTER TABLE t ADD y DEFAULT 42;
  sqlite> INSERT INTO t(x) VALUES (2);
  sqlite> SELECT * FROM t;
  1|42
  2|42
  sqlite> PRAGMA writable_schema = on;
  sqlite> UPDATE sqlite_master SET sql = 'CREATE TABLE t(x, y DEFAULT 666)' WHERE name = 't' AND type = 'table';
  sqlite> ^D
  $ sqlite3 test.db
  sqlite> SELECT * FROM t;
  1|666
  2|42

(The documentation of PRAGMA writable_schema says:  "Warning: misuse of
this pragma can easily result in a corrupt database file."  Well, there
you have your corruption.)


Regards,
Clemens

Re: Performing a SELECT on an enormous database...

Stephen Chrzanowski
In reply to this post by J Decker
If you're asking "if a field in a row contains a value that was previously
filled in from the DEFAULT value in the schema, would this value be changed
if the default in the schema later changed", then no, it doesn't change.
There is no reference back to the schema's default value once the data has
been inserted.  So if you have a DEFAULT 'ABC', insert 100 rows, then
change the schema to DEFAULT 'DEF', those 100 rows would still contain
'ABC'.  If you add a new field with a DEFAULT 'XYZ', then that new field in
ALL existing rows would contain 'XYZ', but the 'ABC' or 'DEF' values would
still exist.

The action SQLite takes when inserting physical data into the field is
based on whatever the current DEFAULT value in the schema is.  If you have
two fields, one that has to be given on an insert and one with a default
value, SQLite will basically change this:

insert into MyTable (Letters) values ('ABC')

into

insert into MyTable (Letters, DefaultsToABC) values ('ABC','ABC')

Think of it this way: if you had a table with a default value, and you
manually entered a value that just happens to match the schema's default,
it wouldn't make sense to change that value in the existing row when you
change the default in the schema.  Based on the output of a SQL command,
you can't tell whether 'ABC' is the default value or whether 'ABC' was
entered manually.  So if you changed the default and SQLite DID go and
rewrite the old default values, you'd be confused about why some rows have
'ABC' and some have 'DEF'.
