Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Donald Griggs
There's an interesting paper at

https://www.usenix.org/system/files/conference/atc15/atc15-paper-lee-wongun.pdf

I don't know enough to evaluate it, but if I'm understanding correctly:
 -- They have modified sqlite so as to work directly with the EXT4
filesystem to prevent redundant journaling (i.e., otherwise, both sqlite
and ext4 will journal all data changes)
 -- They claim sqlite writes account for a significant portion of mobile
device data writes.
 -- They claim huge reductions in data writes in some configurations --
e.g. down to as low as one sixth of unmodified systems.
 -- They call their new sqlite mode "WALDIO" for WAL Direct-I/O.
 -- They make several changes to obtain the claimed efficiency, such as
preallocation and initialization of db sectors, modifying and aligning
headers, commands to EXT4, etc.
 -- One mode does require that power not be removed abruptly from the eMMC
controller (but they still claim durability even in the face of a kernel
panic)
 -- The flash system should not ignore the DISCARD command nor return the
old data if a read is later attempted.
 -- Their prototype is on a Samsung Galaxy S5
 -- I didn't notice whether their code is available anywhere.  I guess it's
proprietary (?)


*Abstract*
This work is dedicated to resolve the Journaling of Journal Anomaly in
Android IO stack. We orchestrate SQLite and EXT4 filesystem so that
SQLite’s file-backed journaling activity can dispense with the expensive
filesystem intervention, the journaling, without compromising the file
integrity under unexpected filesystem failure. In storing the logs, we
exploit the direct IO to suppress the filesystem interference.


This work consists of three key ingredients:
   (i) Preallocation with Explicit Journaling,
   (ii) Header Embedding, and
   (iii) Group Synchronization.

Preallocation with Explicit Journaling eliminates the filesystem journaling
properly protecting the file metadata against the unexpected system crash.
We redesign the SQLite B-tree structure with Header Embedding to make it
direct IO compatible and block IO friendly. With Group Synch, we minimize
the synchronization overhead of direct IO and make the SQLite operation
NAND Flash friendly. Combining the three technical ingredients, we develop
a new journal mode in SQLite, the WALDIO. We implement it on the
commercially available smartphone. WALDIO mode achieves 5.1× performance
(insert/sec) against WAL mode which is the fastest journaling mode in
SQLite. It yields 2.7× performance (inserts/sec) against the LS-MVBT, the
fastest SQLite journaling mode known to public. WALDIO mode achieves 7.4×
performance (insert/sec) against WAL mode when it is relieved from the
overhead of explicitly synchronizing individual log-commit operations.
WALDIO mode reduces the IO volume to 1/6 compared against the WAL mode.
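For anyone who wants a baseline to compare such numbers against, here is a minimal sketch that times single-row transactions in stock SQLite under a chosen journal mode, using Python's standard sqlite3 module. It does not implement WALDIO, which is not part of stock SQLite; the function name, table layout and row count are made up for illustration, and the numbers it prints say nothing about the paper's eMMC setup.

```python
import os
import sqlite3
import tempfile
import time


def inserts_per_second(journal_mode: str, n_rows: int = 200) -> float:
    """Time n_rows single-row transactions under the given journal mode.

    Hypothetical helper for illustration only; not from the paper.
    """
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            # The PRAGMA returns the journal mode that is actually in effect.
            mode = conn.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()[0]
            if mode.lower() != journal_mode.lower():
                raise RuntimeError(f"journal mode {journal_mode} not available, got {mode}")
            conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, payload TEXT)")
            conn.commit()

            start = time.perf_counter()
            for i in range(n_rows):
                conn.execute("INSERT INTO t(payload) VALUES (?)", (f"row {i}",))
                conn.commit()  # one commit per insert
            elapsed = time.perf_counter() - start

            count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
            if count != n_rows:
                raise RuntimeError("unexpected row count")
            return n_rows / elapsed
        finally:
            conn.close()


if __name__ == "__main__":
    for mode in ("DELETE", "WAL"):
        print(mode, round(inserts_per_second(mode)), "inserts/sec")
```

Committing after every insert is deliberate: it is the per-commit cost that differs between journal modes, so batching the inserts into one transaction would hide most of the difference.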
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

David Woodhouse
On Mon, 2015-07-13 at 21:25 -0400, Donald Griggs wrote:
>
>  -- One mode does require that power not be removed abruptly from the eMMC
> controller (but they still claim durability even in the face of a kernel
> panic)

That's true of *all* modes of operation of most MMC and SSD class
devices. The internal file system (the "translation layer") which they use
to pretend to be a disk is usually horrifically unsafe. We have some
real horror stories of hooking up a logic analyser to the flash chips,
and watching what the µcontroller actually does during wear levelling
and garbage collection.

What you actually want is to drive the real flash directly, with none
of this silly "pretend to be spinning rust" nonsense.

--
dwmw2


Re: Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Mikael
The ZFS filesystem Fletcher-checksums all data - perhaps together with
tagging each "page" with its "page number", I dunno!

This is awesome, as it's quite a nice data integrity guarantee: it
guarantees that data is in the right place (so broken sector mapping tables
won't break anything) and is correct!

Does SQLite do the same? If not, what about a patch to make SQLite do the same?
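To make the idea concrete, here is a rough sketch of a Fletcher-style checksum with four 64-bit running sums over 32-bit words, in the spirit of ZFS's fletcher4. Folding the page number into the checksummed bytes is my own illustration of the "tagging" guess above, not a description of how ZFS or SQLite actually lays things out; all function names are hypothetical.

```python
import struct
from typing import Tuple

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> Tuple[int, int, int, int]:
    """Fletcher-style checksum: four 64-bit running sums over 32-bit LE words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def seal_page(page_no: int, payload: bytes) -> Tuple[int, int, int, int]:
    """Checksum the page number together with the payload (hypothetical scheme).

    A page that ends up at the wrong page number then fails verification
    even though its contents are intact.
    """
    return fletcher4(struct.pack("<I", page_no) + payload)


def verify_page(expected_page_no: int, payload: bytes,
                checksum: Tuple[int, int, int, int]) -> bool:
    """Return True if payload is intact and belongs at expected_page_no."""
    return seal_page(expected_page_no, payload) == checksum


if __name__ == "__main__":
    page = bytes(range(256)) * 16          # a 4096-byte dummy page
    tag = seal_page(7, page)
    print(verify_page(7, page, tag))       # right place, right contents
    print(verify_page(8, page, tag))       # same contents, wrong place
```

The checksum has to be stored somewhere other than the page it protects (or in a reserved area of it) for the scheme to be useful; where to put it is left open here.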


2015-07-14 12:35 GMT+02:00 David Woodhouse <[hidden email]>:

> On Mon, 2015-07-13 at 21:25 -0400, Donald Griggs wrote:
> >
> >  -- One mode does require that power not be removed abruptly from the
> eMMC
> > controller (but they still claim durability even in the face of a kernel
> > panic)
>
> That's true of *all* modes of operation of most MMC and SSD class
> devices. The internal file system (the "translation layer") which they use
> to pretend to be a disk is usually horrifically unsafe. We have some
> real horror stories of hooking up a logic analyser to the flash chips,
> and watching what the µcontroller actually does during wear levelling
> and garbage collection.
>
> What you actually want is to drive the real flash directly, with none
> of this silly "pretend to be spinning rust" nonsense.
>
> --
> dwmw2

Re: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Simon Slavin-3

On 14 Jul 2015, at 2:53pm, Mikael <[hidden email]> wrote:

> This is awesome as it's a quite nice data integrity guarantee: this
> guarantees that data is in the right place (so broken sector mapping tables
> won't break anything) and is correct!
>
> Does SQLite do the same? If not, what about a patch to make SQLite do the same?

For details on the SQLite file format see

<https://www.sqlite.org/fileformat.html>
<https://www.sqlite.org/fileformat2.html>

As you can see, journal pages carry checksums, database pages don't.

Simon.
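As a small illustration of the journal-side checksums, the sketch below recomputes the checksum stored in the 32-byte header of a write-ahead log file, following the algorithm described in fileformat2.html (two running 32-bit sums over pairs of 32-bit words, with the word byte order selected by the magic number). The helper names and the demo database are made up; treat it as a reading aid rather than a validator.

```python
import os
import sqlite3
import struct
import tempfile

WAL_MAGIC_LE = 0x377F0682  # checksum words are read as little-endian
WAL_MAGIC_BE = 0x377F0683  # checksum words are read as big-endian


def wal_checksum(data: bytes, big_endian: bool, s0: int = 0, s1: int = 0):
    """Checksum over an even number of 32-bit words, as described for WAL files."""
    fmt = ">I" if big_endian else "<I"
    words = [w for (w,) in struct.iter_unpack(fmt, data)]
    if len(words) % 2:
        raise ValueError("input must hold an even number of 32-bit words")
    for i in range(0, len(words), 2):
        s0 = (s0 + words[i] + s1) & 0xFFFFFFFF
        s1 = (s1 + words[i + 1] + s0) & 0xFFFFFFFF
    return s0, s1


def check_wal_header(wal_path: str) -> bool:
    """Return True if the stored header checksum matches the recomputed one."""
    with open(wal_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("WAL file is shorter than its 32-byte header")
    (magic,) = struct.unpack(">I", header[:4])
    if magic not in (WAL_MAGIC_LE, WAL_MAGIC_BE):
        raise ValueError("not a WAL file")
    stored = struct.unpack(">II", header[24:32])  # stored big-endian
    computed = wal_checksum(header[:24], big_endian=(magic == WAL_MAGIC_BE))
    return stored == computed


def demo() -> bool:
    """Create a throwaway WAL-mode database and check its WAL header."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("CREATE TABLE t(x)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()
            # Read the -wal file while the connection is still open; closing
            # the last connection normally checkpoints and removes it.
            return check_wal_header(db_path + "-wal")
        finally:
            conn.close()


if __name__ == "__main__":
    print("WAL header checksum matches:", demo())
```

Nothing comparable can be written for an ordinary database page, because the main database file stores no per-page checksum to compare against.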

Re: Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Roger Binns
In reply to this post by Mikael
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/14/2015 06:53 AM, Mikael wrote:
> Does SQLite do the same? If not, what about a patch to make SQLite do the same?

The SQLite authors rejected checksumming SQLite database pages.  The
existing integrity check will only catch issues that happen in
sufficiently important metadata, but in general won't catch corruption.

  http://www.sqlite.org/src/tktview?name=72b01a982a

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlWlKfgACgkQmOOfHg372QTW0gCgn5PVs7z9G6FEu5dG31hbRNy1
jAIAniXv0ebDjsCuroOrkwI7D4Wszwno
=sV74
-----END PGP SIGNATURE-----
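A quick way to see the limitation described above is to damage one byte of row data in a throwaway database and then run the integrity check. The sketch below does that with Python's standard sqlite3 module; the marker string, table and function name are invented for the demonstration, and it assumes a plain table with no index on the damaged column.

```python
import os
import sqlite3
import tempfile

MARKER = b"hello-integrity-marker"  # arbitrary text that is easy to find in the file


def corrupt_and_check():
    """Flip one payload byte on disk, then run PRAGMA integrity_check."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t(x TEXT)")
        conn.execute("INSERT INTO t VALUES (?)", (MARKER.decode(),))
        conn.commit()
        conn.close()

        # Overwrite the first byte of the stored text directly in the file,
        # keeping the length the same so the record structure stays valid.
        with open(path, "r+b") as f:
            offset = f.read().find(MARKER)
            if offset < 0:
                raise RuntimeError("marker not found in database file")
            f.seek(offset)
            f.write(b"j")

        conn = sqlite3.connect(path)
        try:
            verdict = conn.execute("PRAGMA integrity_check").fetchone()[0]
            value = conn.execute("SELECT x FROM t").fetchone()[0]
        finally:
            conn.close()
        return verdict, value


if __name__ == "__main__":
    verdict, value = corrupt_and_check()
    print("integrity_check:", verdict)
    print("stored value now reads:", value)
```

The b-tree structure is untouched by this kind of damage, so the check has nothing to object to while the query returns the altered text. An index on the column would change the picture, since the check does compare index entries against table rows.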

Re: Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Mikael
Wait, does https://www.sqlite.org/fileformat2.html say that database pages
have an index number stored (so that SQLite will notice if storage messes up
the sequence of sectors)?


Right, exactly: http://www.sqlite.org/src/tktview?name=72b01a982a

And that means the underlying FS needs to do the checksumming for there
to be any "more real" integrity guarantees.

What relevance do you guys see in introducing this?



(* An underlying FS can add further integrity by using checksums *and*
mirrors; however, a checksum-based integrity guarantee in SQLite itself
would be very valuable, so having both is only good :) )


2015-07-14 15:59 GMT+02:00 Simon Slavin <[hidden email]>:

>
> On 14 Jul 2015, at 2:53pm, Mikael <[hidden email]> wrote:
>
> > This is awesome as it's a quite nice data integrity guarantee: this
> > guarantees that data is in the right place (so broken sector mapping
> tables
> > won't break anything) and is correct!
> >
> > Does SQLite do the same? If not, what about a patch to make SQLite do the same?
>
> For details on the SQLite file format see
>
> <https://www.sqlite.org/fileformat.html>
> <https://www.sqlite.org/fileformat2.html>
>
> As you can see, journal pages carry checksums, database pages don't.
>
> Simon.

2015-07-14 17:25 GMT+02:00 Roger Binns <[hidden email]>:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 07/14/2015 06:53 AM, Mikael wrote:
> > Does SQLite do the same? If not, what about a patch to make SQLite do the same?
>
> The SQLite authors rejected checksumming SQLite database pages.  The
> existing integrity check will only catch issues that happen in
> sufficiently important metadata, but in general won't catch corruption.
>
>   http://www.sqlite.org/src/tktview?name=72b01a982a
>
> Roger
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iEYEARECAAYFAlWlKfgACgkQmOOfHg372QTW0gCgn5PVs7z9G6FEu5dG31hbRNy1
> jAIAniXv0ebDjsCuroOrkwI7D4Wszwno
> =sV74
> -----END PGP SIGNATURE-----

Re: Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

Richard Hipp-3
In reply to this post by Donald Griggs
On 7/13/15, Donald Griggs <[hidden email]> wrote:
> There's an interesting paper at
>
> https://www.usenix.org/system/files/conference/atc15/atc15-paper-lee-wongun.pdf

Yes, a very interesting paper.  Thanks for bringing it to my attention.

>
> I don't know enough to evaluate it, but if I'm understanding correctly:
>  -- They have modified sqlite so as to work directly with the EXT4
> filesystem to prevent redundant journaling (i.e., otherwise, both sqlite
> and ext4 will journal all data changes)

I'd rephrase this to say that they investigated and prototyped
potential changes to SQLite that might help it play better with EXT4.
I don't think they actually generated a version of SQLite that "works"
for more than their limited test set.

Nevertheless, there are some useful insights.  We will run some
experiments based on their ideas and perhaps we can improve the write
performance on EXT4 for some future release - 3.8.12 or later.

>  -- They claim sqlite writes account for a significant portion of mobile
> device data writes.
>  -- They claim huge reductions in data writes in some configurations --
> e.g. down to as low as one sixth of unmodified systems.
>  -- They call their new sqlite mode "WALDIO" for WAL Direct-I/O.
>  -- They make several changes to obtain the claimed efficiency, such as
> preallocation and initialization of db sectors, modifying and aligning
> headers, commands to EXT4, etc.
>  -- One mode does require that power not be removed abruptly from the eMMC
> controller (but they still claim durability even in the face of a kernel
> panic)
>  -- The flash system should not ignore the DISCARD command nor return the
> old data if a read is later attempted.
>  -- Their prototype is on a Samsung Galaxy S5
>  -- I didn't notice whether their code is available anywhere.  I guess it's
> proprietary (?)
>


--
D. Richard Hipp
[hidden email]

Re: Fwd: Usenix paper: Korean researchers invent sqlite WALDIO mode to circumvent redundant journaling by EXT4 on eMMC

wongun lee
This post has NOT been accepted by the mailing list yet.
Nice to meet you.
Thank you for your interest in my paper.

My name is Won-Gun Lee, and I am the main author of the WALDIO paper. Please let me know if you have any questions about the WALDIO concept.

I would also like to contribute my ideas to improve SQLite's performance. According to our tests, the WALDIO journaling mode also gives a big improvement on the XFS filesystem.

Thank you,
WonGun L.