Copy-on-write VFS

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Copy-on-write VFS

Fredrik Larsen
Hi

A copy-on-write IO-path where data is split into static and dynamic parts
(think snapshots for storage) would be very helpful for our project, . This
would simplify backups, testing, moving data around in a multinode
environment, etc.

Does something like this exist for sqlite? In my head this sounds like an
relative easy feature to add as IO-stuff is already centralized in the VFS
layer. Maybe a new COW-VFS?

Fredrik
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

test user
Hello Fredrik,

Why does it need to be part of a VFS instead of using a file system with
COW like ZFS?

On Fri, 4 Oct 2019 at 12:18, Fredrik Larsen <[hidden email]> wrote:

> Hi
>
> A copy-on-write IO-path where data is split into static and dynamic parts
> (think snapshots for storage) would be very helpful for our project, . This
> would simplify backups, testing, moving data around in a multinode
> environment, etc.
>
> Does something like this exist for sqlite? In my head this sounds like an
> relative easy feature to add as IO-stuff is already centralized in the VFS
> layer. Maybe a new COW-VFS?
>
> Fredrik
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Simon Slavin-3
In reply to this post by Fredrik Larsen
On 4 Oct 2019, at 12:17pm, Fredrik Larsen <[hidden email]> wrote:

> A copy-on-write IO-path where data is split into static and dynamic parts (think snapshots for storage) would be very helpful for our project.

SQLite abstracts changes-only tracking at the transaction level, and provides it as the session extension:

<https://www.sqlite.org/sessionintro.html>

You can save your changeset BLOB however you want and do anything you want to it including apply it to different databases on different computers.  A BLOB is just a sequence of octets.  Handle it however you want.

Providing the same feature at VFS level would be far more difficult because different databases with the same data can be laid out differently in different files on different computers.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Keith Medcalf
In reply to this post by Fredrik Larsen
On Friday, 4 October, 2019 05:18, Fredrik Larsen <[hidden email]> wrote:

>A copy-on-write IO-path where data is split into static and dynamic parts
>(think snapshots for storage) would be very helpful for our project, .

What do you mean?  Useful how?

>This would simplify backups, testing, moving data around in a multinode
>environment, etc.

Since we still do not know what you are on about, then this is debatable.

>Does something like this exist for sqlite? In my head this sounds like an
>relative easy feature to add as IO-stuff is already centralized in the
>VFS layer. Maybe a new COW-VFS?

Still do not know exactly what you are on about.  Can you explain what you are on about?

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Roman Fleysher
Maybe this for this scenario:

You have in-memory database, used mostly for reading and you want to save its copy to disk when update on in-memory is performed?
Otherwise, what is copied and what is to write in  "copy on write"?



________________________________________
From: sqlite-users [[hidden email]] on behalf of Keith Medcalf [[hidden email]]
Sent: Friday, October 04, 2019 12:19 PM
To: SQLite mailing list
Subject: Re: [sqlite] Copy-on-write VFS

On Friday, 4 October, 2019 05:18, Fredrik Larsen <[hidden email]> wrote:

>A copy-on-write IO-path where data is split into static and dynamic parts
>(think snapshots for storage) would be very helpful for our project, .

What do you mean?  Useful how?

>This would simplify backups, testing, moving data around in a multinode
>environment, etc.

Since we still do not know what you are on about, then this is debatable.

>Does something like this exist for sqlite? In my head this sounds like an
>relative easy feature to add as IO-stuff is already centralized in the
>VFS layer. Maybe a new COW-VFS?

Still do not know exactly what you are on about.  Can you explain what you are on about?

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&amp;data=02%7C01%7Croman.fleysher%40einstein.yu.edu%7C9ef1d96d7e3840552f4208d748e6ad65%7C04c70eb48f2648079934e02e89266ad0%7C1%7C1%7C637058027904564324&amp;sdata=8hS93ORYmBXeRHKt4bF4di3AOQswyKWSzvjIGPInDBY%3D&amp;reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Fredrik Larsen
In reply to this post by test user
A file-system with COW support would work, but that is a big
dependency/constraint to bring into a project, and not always
possible/practical. A file based version (snapshot + changes) will be more
practical and easier to manage, and also very doable IMHO.

Anyway, I was just wondering if anyone else had explored this path. From
the feedback so far it seems not.

Fredrik

On Fri, Oct 4, 2019 at 3:23 PM test user <[hidden email]>
wrote:

> Hello Fredrik,
>
> Why does it need to be part of a VFS instead of using a file system with
> COW like ZFS?
>
> On Fri, 4 Oct 2019 at 12:18, Fredrik Larsen <[hidden email]> wrote:
>
> > Hi
> >
> > A copy-on-write IO-path where data is split into static and dynamic parts
> > (think snapshots for storage) would be very helpful for our project, .
> This
> > would simplify backups, testing, moving data around in a multinode
> > environment, etc.
> >
> > Does something like this exist for sqlite? In my head this sounds like an
> > relative easy feature to add as IO-stuff is already centralized in the
> VFS
> > layer. Maybe a new COW-VFS?
> >
> > Fredrik
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Fredrik Larsen
In reply to this post by Simon Slavin-3
Applying the dump from session extension will modify the db, so you need to
copy all data to the node that will run sqlite, then apply changes, then
start db. A COW system would allow you to to store the bulk of the data in
a shared and network-mounted folder, and only copy over the changes since
the snapshot was made. For us this could easily reduce copy-work several
orders of magnitude. It is also easy to roll back any changes since
snapshot was made by simply deleting files. This is relevant when running
tests or working on new product features. Also, the snapshot-data would be
read only, simplifying backups and the effect of data-corrupting errors.

Fredrik

On Fri, Oct 4, 2019 at 3:44 PM Simon Slavin <[hidden email]> wrote:

> On 4 Oct 2019, at 12:17pm, Fredrik Larsen <[hidden email]> wrote:
>
> > A copy-on-write IO-path where data is split into static and dynamic
> parts (think snapshots for storage) would be very helpful for our project.
>
> SQLite abstracts changes-only tracking at the transaction level, and
> provides it as the session extension:
>
> <https://www.sqlite.org/sessionintro.html>
>
> You can save your changeset BLOB however you want and do anything you want
> to it including apply it to different databases on different computers.  A
> BLOB is just a sequence of octets.  Handle it however you want.
>
> Providing the same feature at VFS level would be far more difficult
> because different databases with the same data can be laid out differently
> in different files on different computers.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Keith Medcalf
In reply to this post by Fredrik Larsen

You still have not explained what you think COW is and of what possible use it could be.

If you want a "snapshot + changes" then why not just enable WAL mode and disable checkpointing?

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Fredrik Larsen
>Sent: Friday, 4 October, 2019 14:14
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] Copy-on-write VFS
>
>A file-system with COW support would work, but that is a big
>dependency/constraint to bring into a project, and not always
>possible/practical. A file based version (snapshot + changes) will be
>more
>practical and easier to manage, and also very doable IMHO.
>
>Anyway, I was just wondering if anyone else had explored this path. From
>the feedback so far it seems not.
>
>Fredrik
>
>On Fri, Oct 4, 2019 at 3:23 PM test user <[hidden email]>
>wrote:
>
>> Hello Fredrik,
>>
>> Why does it need to be part of a VFS instead of using a file system
>with
>> COW like ZFS?
>>
>> On Fri, 4 Oct 2019 at 12:18, Fredrik Larsen <[hidden email]> wrote:
>>
>> > Hi
>> >
>> > A copy-on-write IO-path where data is split into static and dynamic
>parts
>> > (think snapshots for storage) would be very helpful for our project,
>.
>> This
>> > would simplify backups, testing, moving data around in a multinode
>> > environment, etc.
>> >
>> > Does something like this exist for sqlite? In my head this sounds
>like an
>> > relative easy feature to add as IO-stuff is already centralized in
>the
>> VFS
>> > layer. Maybe a new COW-VFS?
>> >
>> > Fredrik
>> > _______________________________________________
>> > sqlite-users mailing list
>> > [hidden email]
>> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Fredrik Larsen
Interesting, I previously just assumed that WAL file stored a log of
sql-like-update-statements since last checkpoint, making a large WAL-file
effectivly unusable if you have any requirements on query-performance.

But re-reading the WAL-documentation, there are references to "pages" being
stored at the end of the WAL-log, and a index containing some mapping. This
smells alot like a COW-implementation, where modified pages/blocks of the
database are copied then modfied and written to a change-file, and using an
index to map old-pages to new-pages. Also, re-reading documentation about
query-performance, nowhere does it warn about severe regression with
unbounded WAL-size, as I assumed, further supporting that WAL is in fact
implemented using COW.

Given that we disable ceckpointing, can we assume that the main
database-file will never be modified, and therefor could potentially be
mounted read-only?

Fredrik

On Fri, Oct 4, 2019 at 11:18 PM Keith Medcalf <[hidden email]> wrote:

>
> You still have not explained what you think COW is and of what possible
> use it could be.
>
> If you want a "snapshot + changes" then why not just enable WAL mode and
> disable checkpointing?
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users <[hidden email]> On
> >Behalf Of Fredrik Larsen
> >Sent: Friday, 4 October, 2019 14:14
> >To: SQLite mailing list <[hidden email]>
> >Subject: Re: [sqlite] Copy-on-write VFS
> >
> >A file-system with COW support would work, but that is a big
> >dependency/constraint to bring into a project, and not always
> >possible/practical. A file based version (snapshot + changes) will be
> >more
> >practical and easier to manage, and also very doable IMHO.
> >
> >Anyway, I was just wondering if anyone else had explored this path. From
> >the feedback so far it seems not.
> >
> >Fredrik
> >
> >On Fri, Oct 4, 2019 at 3:23 PM test user <[hidden email]>
> >wrote:
> >
> >> Hello Fredrik,
> >>
> >> Why does it need to be part of a VFS instead of using a file system
> >with
> >> COW like ZFS?
> >>
> >> On Fri, 4 Oct 2019 at 12:18, Fredrik Larsen <[hidden email]> wrote:
> >>
> >> > Hi
> >> >
> >> > A copy-on-write IO-path where data is split into static and dynamic
> >parts
> >> > (think snapshots for storage) would be very helpful for our project,
> >.
> >> This
> >> > would simplify backups, testing, moving data around in a multinode
> >> > environment, etc.
> >> >
> >> > Does something like this exist for sqlite? In my head this sounds
> >like an
> >> > relative easy feature to add as IO-stuff is already centralized in
> >the
> >> VFS
> >> > layer. Maybe a new COW-VFS?
> >> >
> >> > Fredrik
> >> > _______________________________________________
> >> > sqlite-users mailing list
> >> > [hidden email]
> >> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >> >
> >> _______________________________________________
> >> sqlite-users mailing list
> >> [hidden email]
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >[hidden email]
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Simon Slavin-3
On 7 Oct 2019, at 9:56am, Fredrik Larsen <[hidden email]> wrote:

> nowhere does it warn about severe regression with
> unbounded WAL-size [snip]

There are tons of bad stuff the documentation doesn't warn you about.  You might want to read

<https://sqlite.org/wal.html#avoiding_excessively_large_wal_files>

again and get back to us if you have further questions.  There's a reason that there's a whole section on avoiding excessively large WAL files.

At your level of sophistication you can probably guess the fallout from the things you're doing.  The sort of stuff you're doing is rare and obscure.  Almost all users of SQLite pick a journalling mode and let SQLite handle everything itself.

> Given that we disable ceckpointing, can we assume that the main
> database-file will never be modified, and therefor could potentially be mounted read-only?

You are correct that the WAL file contains database pages, rather than SQL statements.  A recent change to the way WAL files work means that if an already-modified page is modified again, the page appears in the WAL file only once.  So the size of the WAL file depends more on how much new data gets written to the database than the total number of changes.

Presumably you mean you disable /automatic/ checkpointing.  If you disable all checkpointing your writes never get completed.  Sooner or later you are going to have to let SQLite modify your database file, so no, you can't mount it read-only if you want to make changes.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Fredrik Larsen
Hi Simon

In my head, checkpointing implies copying back all dirty-pages from the
WAL/COW-log to the main db-file. If we never checkpoint, the writes are
still completed, but lives in the WAL-file. We will offcourse merge back
pages to the main db-file, but this would be an offline process we start
when we want to create a new "base-image" of the db. For this to work, it
is very important that the main db-file is untouched until we actively want
to update this file.

Yes, I'm aware that we are trying to "missuse" sqlite a bit here, but it
may just work :)

Fredrik


On Mon, Oct 7, 2019 at 11:16 AM Simon Slavin <[hidden email]> wrote:

> On 7 Oct 2019, at 9:56am, Fredrik Larsen <[hidden email]> wrote:
>
> > nowhere does it warn about severe regression with
> > unbounded WAL-size [snip]
>
> There are tons of bad stuff the documentation doesn't warn you about.  You
> might want to read
>
> <https://sqlite.org/wal.html#avoiding_excessively_large_wal_files>
>
> again and get back to us if you have further questions.  There's a reason
> that there's a whole section on avoiding excessively large WAL files.
>
> At your level of sophistication you can probably guess the fallout from
> the things you're doing.  The sort of stuff you're doing is rare and
> obscure.  Almost all users of SQLite pick a journalling mode and let SQLite
> handle everything itself.
>
> > Given that we disable ceckpointing, can we assume that the main
> > database-file will never be modified, and therefor could potentially be
> mounted read-only?
>
> You are correct that the WAL file contains database pages, rather than SQL
> statements.  A recent change to the way WAL files work means that if an
> already-modified page is modified again, the page appears in the WAL file
> only once.  So the size of the WAL file depends more on how much new data
> gets written to the database than the total number of changes.
>
> Presumably you mean you disable /automatic/ checkpointing.  If you disable
> all checkpointing your writes never get completed.  Sooner or later you are
> going to have to let SQLite modify your database file, so no, you can't
> mount it read-only if you want to make changes.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Simon Slavin-3
On 7 Oct 2019, at 10:34am, Fredrik Larsen <[hidden email]> wrote:

> In my head, checkpointing implies copying back all dirty-pages from the WAL/COW-log to the main db-file. If we never checkpoint, the writes are still completed, but lives in the WAL-file. We will offcourse merge back pages to the main db-file, but this would be an offline process we start when we want to create a new "base-image" of the db. For this to work, it is very important that the main db-file is untouched until we actively want to update this file.

Your use of SQLite depends on several obscure facts about how SQLite works.  If you ever want to change implementation details you're going to need someone who understands SQLite very thoroughly.

Had you considered just making a text file of all the SQL commands executed, and running that against your main database file ?  It would take less CPU time, less filespace, be easier to debug, and be simpler for another programmer to understand.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Fredrik Larsen
I'm aware that what I'm asking for is not be very portable between
sql-engines, and that is fine. It does not affect core-product/value, just
infrastructure. Also, this is about optimizing infrastructure work, so nice
if it works, no biggi if it does not.

But the thing we want to optimize is the size we need to copy to move/copy
db to different nodes. Today we must copy the full 20Gb+ to accomplish
this, and this will also be required in your text-file solution. One
possible solution is to store a read-only "base-image" of the db in a
network-mounted folder, and only copy over the changes since base-image was
created. This will reduce copy-time to almost zero in our case. Data is
accumulated over a long time so day-to-day changes are small (<1Mb?). Also,
I know that moving database around is not a very common requirement, but
this is something we often do.

Anyway, I will look more into the WAL-system when I get some time. If
someone know for sure what I'm planning to do will crash and burn, I will
appriciate a heads-up on why.

Fredrik

On Mon, Oct 7, 2019 at 11:42 AM Simon Slavin <[hidden email]> wrote:

> On 7 Oct 2019, at 10:34am, Fredrik Larsen <[hidden email]> wrote:
>
> > In my head, checkpointing implies copying back all dirty-pages from the
> WAL/COW-log to the main db-file. If we never checkpoint, the writes are
> still completed, but lives in the WAL-file. We will offcourse merge back
> pages to the main db-file, but this would be an offline process we start
> when we want to create a new "base-image" of the db. For this to work, it
> is very important that the main db-file is untouched until we actively want
> to update this file.
>
> Your use of SQLite depends on several obscure facts about how SQLite
> works.  If you ever want to change implementation details you're going to
> need someone who understands SQLite very thoroughly.
>
> Had you considered just making a text file of all the SQL commands
> executed, and running that against your main database file ?  It would take
> less CPU time, less filespace, be easier to debug, and be simpler for
> another programmer to understand.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Copy-on-write VFS

Keith Medcalf
In reply to this post by Simon Slavin-3

On Monday, 7 October, 2019 03:16, Simon Slavin <[hidden email]> wrote:

>> Given that we disable ceckpointing, can we assume that the main
>> database-file will never be modified, and therefor could potentially be
>> mounted read-only?

No, and No.

>You are correct that the WAL file contains database pages, rather than
>SQL statements.  A recent change to the way WAL files work means that if
>an already-modified page is modified again, the page appears in the WAL
>file only once.  So the size of the WAL file depends more on how much new
>data gets written to the database than the total number of changes.

You are misstating the change.  The correct statement is:

"A recent change to the way WAL files work means that if a page ALREADY MODIFIED IN A TRANSACTION IS MODIFIED AGAIN IN THE SAME TRANSACTION, then that page appears in the WAL file only once FOR THAT TRANSACTION."  That is to say that for each transaction appearing in the WAL file, each page modified by that transaction appears only once.  If a given page is modified by multiple transactions then that page appears in the WAL file once for each transaction.  It must be so or you could not have a "snapshot" position between transactions.  However, if you are coalescing (checkpointing) three transactions from the WAL file to the main database, then indeed only the latest change to a page made by any of those transactions needs to be copied to the main file, even though each of the three transactions may have modified the same page.

>Presumably you mean you disable /automatic/ checkpointing.  If you
>disable all checkpointing your writes never get completed.  Sooner or
>later you are going to have to let SQLite modify your database file, so
>no, you can't mount it read-only if you want to make changes.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users