Best way to "split" a large DB into two DBs

Best way to "split" a large DB into two DBs

Dominique Devienne
Imagine there's a 25GB SQLite DB file in version v1 of an application.

Because of the way this DB is structured, with large to very large blobs, it
leads to problems for the app. That DB has 35 tables, but 5 of them
represent 95% of the DB size (because of those blobs).

So for v2 of the app, there's a new design where the 30 "lightweight"
tables move into a new DB file (with additional new tables), and the 5
"blob heavy" tables remain in the existing DB file (but the 30 "moved"
tables should be dropped from it).

The initial design was to copy the DB file (the app is shut down, so there's
no connection to that DB file), drop the 5 big tables, VACUUM it (the new DB),
and add the new tables. But the 25GB copy was deemed too long in that case
(several minutes). (After all, copying 95% of 25GB only to discard all those
GBs is inefficient.)
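
Sketched out, that initial design is roughly the following (table names are
made up for illustration; the file copy itself happens at the OS level
beforehand):

   -- sqlite3 app-v2.db, where app-v2.db is the OS-level copy of the 25GB file
   DROP TABLE blob_table_1;   -- hypothetical names for the 5 blob-heavy tables
   DROP TABLE blob_table_2;
   DROP TABLE blob_table_3;
   DROP TABLE blob_table_4;
   DROP TABLE blob_table_5;
   VACUUM;                    -- rebuild the file, reclaiming the freed ~95% of pages
   CREATE TABLE new_feature(id INTEGER PRIMARY KEY, info TEXT);  -- plus the other new v2 tables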

So the alternate design was to create the schema empty in the new DB file,
attach the old one, and insert the data from the 30 tables into the new,
empty tables (with identical schema/structure). But apparently that's also
very slow.

Even though it's not my app, I know there are indexes and triggers on those
tables (but the triggers are mostly on deletes, so they don't really apply
here), and I suggested adding those only after the INSERT ... SELECT copies,
but I doubt it's going to make a dramatic difference time-wise.

Conceptually, the ideal situation would be to just copy the DB header, the
sqlite_master page(s) (dropping the 5 big tables' definitions and related
indexes/triggers from sqlite_master), and only the 5% of pages related to
the other tables (and related objects). (The chaining of pages and the root
pages in sqlite_master would have to be updated, of course.) It's almost like
a form of VACUUM, except it's not in-place and it works on a subset of the
tables. IO-wise, that has the potential to be 20x faster, I imagine.

But of course there's no such "out-of-place", "partial" VACUUM...

Then I thought maybe .backup, which also works at the page level (I believe),
could perhaps be restricted to a subset of tables. But no luck there either:
backup works on the whole DB, not a subset.

Am I missing other ways to achieve this "split" efficiently?
Any chance the backup API could grow an alternate form restricted to specific
tables (and their related objects)?

Any advice would be appreciated. Thanks, --DD

Re: Best way to "split" a large DB into two DBs

Stephen Chrzanowski
What are the business-logic reasons you want to split the database? Is
your application going to be able to handle data that exists in multiple
places?

When I know I'm going to be putting large blobs in a database, those blob
tables have two fields: one for the ID, and one for the actual content.
Perhaps even a field to indicate what kind of blob it is, but I'm also the
type who'd just make a new table for each new type of blob information.

The ID is a 1:1 PK:FK relationship between a field in the "master" table
and this blob data, so the master table has a direct reference to the blob
record. This way, whenever I need to run queries, I pull from the master
table, and IF I need the blob's content (say an image, MP3, or whatever the
case may be) I'll open a new query against the blob table, get my data, then
shut it down. I'd also put a unique constraint on the blob ID field, which
automatically creates an index against it for fast lookups.
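
In rough SQL, that layout might look something like this (a sketch with
made-up names, not an actual schema from the app in question):

   CREATE TABLE blob_store (
     id      INTEGER PRIMARY KEY,  -- blob record id
     kind    TEXT,                 -- optional: what sort of blob this is (image, MP3, ...)
     content BLOB                  -- the actual payload
   );

   CREATE TABLE master (
     id      INTEGER PRIMARY KEY,
     name    TEXT,
     blob_id INTEGER UNIQUE        -- 1:1 link; UNIQUE also creates the index for fast lookups
             REFERENCES blob_store(id)
   );

   -- typical access: query master first, and fetch the blob only when needed:
   -- SELECT content FROM blob_store WHERE id = ?;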



Re: Best way to "split" a large DB into two DBs

Dan Kennedy-4
In reply to this post by Dominique Devienne
On 09/07/2016 11:08 PM, Dominique Devienne wrote:

> [...]
>
> So alternate design was to create the schema empty in the new DB file,
> attach the old one, and insert the data from the 30 tables into the new
> empty (with identical schema/structure) tables. But that's also very slow
> apparently.

SQLite contains a special optimization for statements of the form:

   INSERT INTO tbl SELECT * FROM tbl2;

If both tables have the same set of columns, the same PK, the same
indexes and identical UNIQUE constraints and no triggers, and if foreign
keys are disabled, SQLite can copy records directly from one b-tree
structure to another without unpacking the records. This can be much
faster. And it writes the b-tree structures in order too, which leads to
a more efficient use of the cache.

So if you create your new db with the tables and indexes but no
triggers, then attach it and your main db to the same handle so that you
can use an "INSERT INTO ... SELECT ..." statement of the form above,
things might run a bit faster.
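
Put together, a minimal sketch of that (with hypothetical file and table
names) might look like:

   -- run in the new, lightweight DB: sqlite3 light.db
   CREATE TABLE some_table(id INTEGER PRIMARY KEY, name TEXT, value TEXT);
   CREATE INDEX some_table_name_idx ON some_table(name);  -- indexes must match the source table's

   PRAGMA foreign_keys = OFF;   -- off by default, but the optimization needs FKs disabled
   ATTACH 'old.db' AS old;      -- the existing 25GB file

   BEGIN;
   INSERT INTO some_table SELECT * FROM old.some_table;   -- eligible for the direct b-tree copy
   -- ... same for the other 29 lightweight tables ...
   COMMIT;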

Dan.





Re: Best way to "split" a large DB into two DBs

Clay Gerrard
On Wed, Sep 7, 2016 at 10:08 AM, Dan Kennedy <[hidden email]> wrote:

>
> [...] then attach it and your main db to the same handle so that you can
> use an "INSERT INTO ... SELECT ..." statement


Can you elaborate or provide any reference materials for this "attach it
and your main db to the same handle" procedure?  Very interesting!?

-Clay

Re: Best way to "split" a large DB into two DBs

Simon Slavin-3

On 7 Sep 2016, at 6:57pm, Clay Gerrard <[hidden email]> wrote:

> Can you elaborate or provide any reference materials for this "attach it
> and your main db to the same handle" procedure?

<https://www.sqlite.org/lang_attach.html>

Simon.

Re: Best way to "split" a large DB into two DBs

Dan Kennedy-4
In reply to this post by Clay Gerrard
On 09/08/2016 12:57 AM, Clay Gerrard wrote:
> On Wed, Sep 7, 2016 at 10:08 AM, Dan Kennedy <[hidden email]> wrote:
>
>> [...] then attach it and your main db to the same handle so that you can
>> use an "INSERT INTO ... SELECT ..." statement
>
> Can you elaborate or provide any reference materials for this "attach it
> and your main db to the same handle" procedure?  Very interesting!?

Use the ATTACH SQL command:

   http://sqlite.org/draft/lang_attach.html

If your second db is "x.db", you can do something like:

   ATTACH 'x.db' AS aux;

and then:

   SELECT * FROM aux.sqlite_master;

and so on.
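
For the split discussed in this thread, the same mechanism lets a single
statement read from one file and write into the other, e.g. (names
hypothetical, and assuming main.some_table was already created with the same
structure):

   -- opened on the new, lightweight DB: sqlite3 light.db
   ATTACH 'old.db' AS old;
   INSERT INTO main.some_table SELECT * FROM old.some_table;
   DETACH old;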

Dan.




Re: Best way to "split" a large DB into two DBs

Eduardo Morras-2
In reply to this post by Dominique Devienne
On Wed, 7 Sep 2016 18:08:50 +0200
Dominique Devienne <[hidden email]> wrote:

> [...]
>
> Am I missing other ways to achieve this "split" efficiently?

Not answering your question, but here are some comments/tricks on how to
boost SQLite performance in your scenario.

First, compile the latest SQLite version with these options enabled:

SQLITE_DIRECT_OVERFLOW_READ
SQLITE_DEFAULT_AUTOVACUUM=2

They are described at https://www.sqlite.org/compile.html#direct_overflow_read
and https://www.sqlite.org/compile.html#omit_autovacuum. Don't omit
autovacuum, but set it to 2, i.e. incremental.

For an explanation of why setting autovacuum to incremental works, see
http://marc.info/?l=sqlite-users&m=136265346522617&w=4

If your blobs are big, larger than 500KB, set the page size to 64KB.

Make the blob column the last column in the table.
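
As a sketch, the page-size, autovacuum, and column-order suggestions could be
applied like this when creating a fresh blob DB (SQLITE_DEFAULT_AUTOVACUUM=2
just bakes in the same autovacuum default at compile time; names are made up):

   -- on a brand-new, empty DB, before any table is created
   PRAGMA page_size = 65536;            -- 64KB pages for large blobs
   PRAGMA auto_vacuum = INCREMENTAL;    -- i.e. autovacuum mode 2

   CREATE TABLE blob_store (
     id      INTEGER PRIMARY KEY,
     kind    TEXT,
     content BLOB                       -- blob column last, as suggested above
   );

   -- later, reclaim freed pages a little at a time instead of a full VACUUM:
   PRAGMA incremental_vacuum(1000);     -- free up to 1000 pages from the freelist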

Compile with the dbstat (SQLITE_ENABLE_DBSTAT_VTAB) option if you want
statistics about your blob DB. You can get information about fragmentation,
etc.; see https://www.sqlite.org/dbstat.html#section_3

If your use case is write-heavy, don't use WAL: modifications (create,
update, delete) are not atomic across attached DBs in WAL mode. If your use
case is read-heavy, use WAL.

I use (and develop) a DIY filesystem over SQLite and have a similar scenario,
with hundreds of GB of blobs across multiple databases. Those compile options
and tricks, plus splitting metadata from data, made a huge improvement.

> Any advice would be appreciated. Thanks, --DD

HTH

---   ---
Eduardo Morras <[hidden email]>

Re: Best way to "split" a large DB into two DBs

Dominique Devienne
In reply to this post by Dan Kennedy-4
On Wed, Sep 7, 2016 at 7:08 PM, Dan Kennedy <[hidden email]> wrote:

> On 09/07/2016 11:08 PM, Dominique Devienne wrote:
>
>> Initial design was to copy the DB file (app is "shutdown", so no
>> connection
>> to that DB file).
>> Drop the 5 big tables. Vacuum it (the new DB). Add the new tables.
>> But the 25GB copy was deemed too long in that case (several minutes).
>> (after all, copying 95% of 25GB to discard all those GBs is inefficient)
>>
>
I've actually tried it now, with a 46 GB DB, and that took ~40 min on my old
Linux box: 25 min for the copy, 11.5 min for the drops (in a transaction), and
3.5 min for the vacuum, to go down to a 1.5 GB DB.


> So alternate design was to create the schema empty in the new DB file,
>> attach the old one, and insert the data from the 30 tables into the new
>> empty (with identical schema/structure) tables. But that's also very slow
>> apparently.
>>
>
> SQLite contains a special optimization for statements of the form:
>
>   INSERT INTO tbl SELECT * FROM tbl2;
>
> If both tables have the same set of columns, the same PK, the same indexes
> and identical UNIQUE constraints and no triggers, and if foreign keys are
> disabled, SQLite can copy records directly from one b-tree structure to
> another without unpacking the records. This can be much faster. And it
> writes the b-tree structures in order too, which leads to a more efficient
> use of the cache.
>
> So if you create your new db with the tables and indexes but no triggers,
> then attach it and your main db to the same handle so that you can use an
> "INSERT INTO ... SELECT ..." statement of the form above, things might run
> a bit faster.


And this approach took 1min11s, much faster!
The DB file also happened to be a little smaller: 1.45 GB.

Dan, how does one know whether this optimization kicks in or not?
Is there any way to tell, using EXPLAIN or EXPLAIN QUERY PLAN for example?

Note that I later learned they were reordering columns too, so it wasn't a
    insert into SomeTable select * from old.SomeTable
like in my test (with decent enough performance).
Thus I suspect that's why it was too slow for them (but it's just a guess).

In any case, thanks for your answer Dan. --DD

PS: Here's my script

.timer ON
attach 'DB.db' as old;
BEGIN;
insert into SomeTable select * from old.SomeTable;
...
COMMIT;

and here's the output:

[ddevienne]$ time sqlite3 DB-new.db < DB-new-insert-from.sql
Run Time: real 0.038 user 0.001999 sys 0.001000 (BEGIN)
Run Time: real 0.000 user 0.000000 sys 0.000000 (INSERT)
Run Time: real 0.001 user 0.000000 sys 0.000000 ...
Run Time: real 0.000 user 0.000000 sys 0.001000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.018 user 0.000000 sys 0.000999
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.299 user 0.126981 sys 0.017998
Run Time: real 0.001 user 0.000000 sys 0.000000
Run Time: real 0.012 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.001000
Run Time: real 0.018 user 0.000000 sys 0.000000
Run Time: real 0.010 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.098 user 0.042993 sys 0.018997
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.001000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.015 user 0.006999 sys 0.001000
Run Time: real 0.009 user 0.000000 sys 0.000000
Run Time: real 0.009 user 0.001000 sys 0.000999
Run Time: real 69.194 user 25.081187 sys 4.877259
Run Time: real 0.021 user 0.003000 sys 0.001000
Run Time: real 0.001 user 0.000000 sys 0.000000 (INSERT)
Run Time: real 1.457 user 0.000999 sys 0.017997 (COMMIT)
25.267u 4.944s 1:11.21 42.4%    0+0k 3126056+2868824io 0pf+0w

One table takes up 99% of the new DB, with 27M rows, so that explains the
69s I guess.

*** Table FOO and all its indices ****************************

Percentage of total database......................  99.0%
Number of entries................................. 27437811
Bytes of storage consumed......................... 1449852928
Bytes of payload.................................. 1315073619  90.7%
Average payload per entry......................... 47.93
Average unused bytes per entry.................... 0.43
Average fanout.................................... 150.00
Maximum payload per entry......................... 596
Entries that use overflow......................... 0            0.0%
Index pages used.................................. 2352
Primary pages used................................ 351616
Overflow pages used............................... 0
Total pages used.................................. 353968
Unused bytes on index pages....................... 334021       3.5%
Unused bytes on primary pages..................... 11385163     0.79%
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 11719184     0.81%