dqlite - SQLite replication and failover library

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

dqlite - SQLite replication and failover library

Free Ekanayaka
Hi,

first of all let me thank Richard Hipp and the rest of the SQLite team
and community for such a great piece of software. I've been recently
working with SQLite's code base and found it an absolute pleasure to
read (and have learned from it too).

In this mail I'd like to:

1) Present dqlite, a library replicating your application's SQLite
   database across N nodes and safely surviving any minority of them
   dying or disconnecting (only for Go applications for now, see below).

2) Submit a patch to SQLite that introduces a minimal replication API,
   and get feedback about its possible inclusion upstream.

= dqlite =

It's a Go package that uses the Raft algorithm to replicate SQLite WAL
frames across a cluster of nodes. This roughly means that you can open a
SQLite connection using the "database/sql" standard lib API and have
anything you transactionally replicated. No external process needed.

Ideally this library should have been written in C or Rust, to support
binding to any language. However, due to the use case and timeline of
the first project that will use it (LXD [0]), and due to the lack of
mature Raft implementations in C/Rust, Go was chosen instead. It should
hopefully at least serve as reference to anyone needing a C/Rust
version.

The work has been funded by Canonical, the company behind Ubuntu. Please
see the dqlite's home page [1] for more details.

= SQLite replication API patch =

This is the SQLite patch that dqlite depends on. It essentially adds a
few key hooks in the pager and write-ahead log to let external libraries
like dqlite implement WAL-based database replication.

As you'll quickly see, it's by no means ready for upstream inclusion, in
particular it lacks unit tests and more comprehensive documentation
comments (note however that virtually every code path introduced by the
patch is already exercised indirectly by dqlite's own unit tests).

If the SQLite team thinks there is room for upstream inclusion, I'll be
more than glad to do the necessary work to make the patch adhere to
SQLite's standards and go through a review process.

The patch has currently 703 additions and 22 deletions, and is published
on GitHub [2].

Cheers,

Free

[0] https://linuxcontainers.org/
[1] https://github.com/CanonicalLtd/dqlite
[2] https://github.com/CanonicalLtd/sqlite/commit/2a9aa8b056f37ae05f38835182a2856ffc95aee4
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: dqlite - SQLite replication and failover library

wmertens
Very interesting!

So how does it behave during conflict situations? Raft selects a winning
WAL write and any others in flight are aborted?

And when not enough nodes are available, writes are hung until consensus?

I won't be able to use it due to Go but it's great to know that this is on
the horizon of possibilities… Very nice!

On Sat, Aug 19, 2017 at 10:47 AM Free Ekanayaka <[hidden email]> wrote:

> Hi,
>
> first of all let me thank Richard Hipp and the rest of the SQLite team
> and community for such a great piece of software. I've been recently
> working with SQLite's code base and found it an absolute pleasure to
> read (and have learned from it too).
>
> In this mail I'd like to:
>
> 1) Present dqlite, a library replicating your application's SQLite
>    database across N nodes and safely surviving any minority of them
>    dying or disconnecting (only for Go applications for now, see below).
>
> 2) Submit a patch to SQLite that introduces a minimal replication API,
>    and get feedback about its possible inclusion upstream.
>
> = dqlite =
>
> It's a Go package that uses the Raft algorithm to replicate SQLite WAL
> frames across a cluster of nodes. This roughly means that you can open a
> SQLite connection using the "database/sql" standard lib API and have
> anything you transactionally replicated. No external process needed.
>
> Ideally this library should have been written in C or Rust, to support
> binding to any language. However, due to the use case and timeline of
> the first project that will use it (LXD [0]), and due to the lack of
> mature Raft implementations in C/Rust, Go was chosen instead. It should
> hopefully at least serve as reference to anyone needing a C/Rust
> version.
>
> The work has been funded by Canonical, the company behind Ubuntu. Please
> see the dqlite's home page [1] for more details.
>
> = SQLite replication API patch =
>
> This is the SQLite patch that dqlite depends on. It essentially adds a
> few key hooks in the pager and write-ahead log to let external libraries
> like dqlite implement WAL-based database replication.
>
> As you'll quickly see, it's by no means ready for upstream inclusion, in
> particular it lacks unit tests and more comprehensive documentation
> comments (note however that virtually every code path introduced by the
> patch is already exercised indirectly by dqlite's own unit tests).
>
> If the SQLite team thinks there is room for upstream inclusion, I'll be
> more than glad to do the necessary work to make the patch adhere to
> SQLite's standards and go through a review process.
>
> The patch has currently 703 additions and 22 deletions, and is published
> on GitHub [2].
>
> Cheers,
>
> Free
>
> [0] https://linuxcontainers.org/
> [1] https://github.com/CanonicalLtd/dqlite
> [2]
> https://github.com/CanonicalLtd/sqlite/commit/2a9aa8b056f37ae05f38835182a2856ffc95aee4
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: dqlite - SQLite replication and failover library

Free Ekanayaka
Wout Mertens <[hidden email]> writes:

> Very interesting!
>
> So how does it behave during conflict situations? Raft selects a winning
> WAL write and any others in flight are aborted?

Ah yeah this is probably something that was not clear from the docs or
from my presentation.

There can't be a conflict situation. Raft's model is that only the
leader can append new log entries, which translated to dqlite means that
only the leader can write new WAL frames. So this means that any attempt
to perform a write transaction on a non-leader node will fail with a
SQLITE_NOT_LEADER error (and in this case clients are supposed to retry
against whoever is the new leader).

I'm going to add this to the FAQ.

> And when not enough nodes are available, writes are hung until
> consensus?

Yes, but there's a (configurable timeout). It's not possible to *not*
have timeout (although you can set it really really high of course :)

> I won't be able to use it due to Go but it's great to know that this is on
> the horizon of possibilities… Very nice!

Yeah I think Go is somehow limiting, but hopefully once Raft libraries
mature in C/Raft, dqlite can act as reference/prototype.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: dqlite - SQLite replication and failover library

Free Ekanayaka
Free Ekanayaka <[hidden email]> writes:

>> And when not enough nodes are available, writes are hung until
>> consensus?
>
> Yes, but there's a (configurable timeout).

BTW, this is a consequence of Raft sitting in the CP spectrum of the CAP
theorem: in case of a network partition it chooses consistency and
sacrifices availability.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: dqlite - SQLite replication and failover library

wmertens
In reply to this post by Free Ekanayaka
Oh I see, of course. So I assume the client library automatically sends
write commands to the current leader?

I wonder if there is value in setting a preferred leader, but probably
that's messing too much with the Raft protocol.

On Sun, Aug 20, 2017, 11:44 PM Free Ekanayaka <[hidden email]> wrote:

> Wout Mertens <[hidden email]> writes:
>
> > Very interesting!
> >
> > So how does it behave during conflict situations? Raft selects a winning
> > WAL write and any others in flight are aborted?
>
> Ah yeah this is probably something that was not clear from the docs or
> from my presentation.
>
> There can't be a conflict situation. Raft's model is that only the
> leader can append new log entries, which translated to dqlite means that
> only the leader can write new WAL frames. So this means that any attempt
> to perform a write transaction on a non-leader node will fail with a
> SQLITE_NOT_LEADER error (and in this case clients are supposed to retry
> against whoever is the new leader).
>
> I'm going to add this to the FAQ.
>
> > And when not enough nodes are available, writes are hung until
> > consensus?
>
> Yes, but there's a (configurable timeout). It's not possible to *not*
> have timeout (although you can set it really really high of course :)
>
> > I won't be able to use it due to Go but it's great to know that this is
> on
> > the horizon of possibilities… Very nice!
>
> Yeah I think Go is somehow limiting, but hopefully once Raft libraries
> mature in C/Raft, dqlite can act as reference/prototype.
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: dqlite - SQLite replication and failover library

Free Ekanayaka
Wout Mertens <[hidden email]> writes:

> Oh I see, of course. So I assume the client library automatically sends
> write commands to the current leader?

No, that's up the application for now, the library just returns you an
error if you attempt a write on a non-leader node.

> I wonder if there is value in setting a preferred leader, but probably
> that's messing too much with the Raft protocol.

I'm not entirely sure to understand, but if you mean "if possible, I
generally would like the leader to be this node, please", no that's
currently not supported. I don't see a reason why it couldn't be added,
but it seems a kind of exotic requirement in today's "cats vs pets" way
of thinking to nodes.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users