SQLite 3.7.17 preview - 2x faster?

Richard Hipp-3
By making use of memory-mapped I/O, the current trunk of SQLite (which will
eventually become version 3.7.17 after much more refinement and testing)
can be as much as twice as fast, on some platforms and under some
workloads.  We would like to encourage people to try out the new code and
report both success and failure.  Snapshots of the amalgamation can be
found at

   http://www.sqlite.org/draft/download.html

Links to the relevant documentation can be seen at

   http://www.sqlite.org/draft/releaselog/3_7_17.html

The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
solaris.  We have found that it does not work on OpenBSD, for reasons we
have not yet been able to uncover; but as a precaution, memory mapped I/O is
disabled by default on all of the *BSDs until we understand the problem.
The biggest performance gains occur on windows, mac, and solaris.  The new
code is also faster on linux, but not by as big a factor.  The speed
improvement is also heavily dependent upon workload.  Some operations can
be almost twice as fast.  For others, there is no measurable speed
improvement.

Your feedback on whether or not the new code is faster for you, and whether
or not it even works for you, is very important to us.  Thanks for giving
the new code a try.
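To make the basic idea concrete, here is a minimal C sketch (POSIX assumed) of two ways to fetch one database page: copying it out with `pread()`, or, with memory-mapped I/O, computing a pointer into a mapping of the file. The function names, the `/tmp` path, and the 4096-byte page size are illustrative assumptions, not SQLite's internals.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096  /* assumed database page size for the demo */

/* Conventional path: one system call copies the page into a caller buffer. */
static ssize_t page_read_syscall(int fd, void *buf, unsigned pgno) {
    return pread(fd, buf, DEMO_PAGE_SIZE, (off_t)pgno * DEMO_PAGE_SIZE);
}

/* Memory-mapped path: the page is already addressable, so "reading" it is
   pointer arithmetic. NULL means the page lies outside the mapped region. */
static const unsigned char *page_ptr_mmap(const unsigned char *map,
                                          size_t map_len, unsigned pgno) {
    size_t off = (size_t)pgno * DEMO_PAGE_SIZE;
    if (off + DEMO_PAGE_SIZE > map_len) return NULL;
    return map + off;
}

/* Writes two pages ('A'-filled, then 'B'-filled) to a temporary file and
   returns 1 if both access paths see identical bytes for page 1,
   0 if they differ, -1 on setup failure. */
int demo_read_paths(void) {
    char path[] = "/tmp/mmapdemoXXXXXX";
    unsigned char page[DEMO_PAGE_SIZE], buf[DEMO_PAGE_SIZE];
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                      /* file vanishes once fd is closed */
    for (int pg = 0; pg < 2; pg++) {
        memset(page, 'A' + pg, sizeof page);
        if (write(fd, page, sizeof page) != (ssize_t)sizeof page) {
            close(fd);
            return -1;
        }
    }
    size_t len = 2 * DEMO_PAGE_SIZE;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    const unsigned char *mapped = page_ptr_mmap(p, len, 1);
    int same = page_read_syscall(fd, buf, 1) == DEMO_PAGE_SIZE
               && mapped != NULL
               && memcmp(buf, mapped, DEMO_PAGE_SIZE) == 0
               && mapped[0] == 'B';
    munmap(p, len);
    close(fd);
    return same;
}
```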

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: SQLite 3.7.17 preview - 2x faster?

Drake Wilson-4
Quoth Richard Hipp <[hidden email]>, on 2013-04-04 08:02:34 -0400:
> By making use of memory-mapped I/O, the current trunk of SQLite (which will
> eventually become version 3.7.17 after much more refinement and testing)
> can be as much as twice as fast, on some platforms and under some
> workloads.
[...]

I'm curious how you plan to handle reliability against I/O errors in
mmap mode.  My understanding is that achieving this in a library
without potentially interfering with the host program operation is
extremely difficult on Linux, and is reliable but requires significant
platform-specific juggling on Windows; I don't know as much about
other OSes.

Specifically, an I/O error faulting in an mmapped page can deliver a
SIGBUS to the thread.  If unhandled, this will crash the entire host
application, and setting local signal handlers for just that case is
hard-to-impossible to do reliably from libraries without a lot of
coöperation from both the host application and any other library that
needs the same thing.

A possible way to partially test this (which I haven't tried against
this SQLite yet) is to stop the reading process right before it reads
a page that it has not yet touched, truncate the file to a length less
than the page offset, then resume the original process.
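Drake's truncation experiment can be reproduced in miniature with a C sketch like the following (POSIX assumed; the function name and `/tmp` path are illustrative). It maps two pages, truncates the file below the second page's offset, and then reads that page. Catching the signal with `sigsetjmp()`/`siglongjmp()` is acceptable here only because this small test program owns its signal handling, which is exactly the assumption a library cannot make.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* Test-only handler: abandon the faulting load and resume at sigsetjmp(). */
static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Maps two pages of a temporary file, truncates the file to one page, then
   reads the first byte of the second page.
   Returns 1 if that read raised SIGBUS, 0 if not, -1 on setup failure. */
int truncated_read_raises_sigbus(void) {
    char path[] = "/tmp/sigbusdemoXXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, 2 * ps) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)(2 * ps), PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    volatile unsigned char *map = p;

    /* The "stop, truncate, resume" step: shrink the file below the
       offset of a page that has not been touched yet. */
    if (ftruncate(fd, ps) != 0) {
        munmap(p, (size_t)(2 * ps));
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int got_sigbus = 0;
    if (sigsetjmp(bus_env, 1) == 0)
        (void)map[ps];              /* plain load: no system call to return EIO */
    else
        got_sigbus = 1;

    sigaction(SIGBUS, &old, NULL);  /* restore whatever was installed before */
    munmap(p, (size_t)(2 * ps));
    close(fd);
    return got_sigbus;
}
```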

It now occurs to me (which it did not before) that WAL mode also has
this danger to some extent with the -shm files, but this is mitigated
mainly because (a) WAL mode must be turned on explicitly for a given
database file and secondarily because (b) AIUI, the -shm files are
only kept while any processes have the database open, and are small
enough that they are very likely to stay in memory the entire time.
(Even so, it may be worthwhile to mlock the regions before using them,
which a quick grep does not find currently, but that is an open
question, not a hard recommendation.)
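Drake leaves the `mlock` idea as an open question; for reference, pinning a small shared mapping in RAM looks roughly like this in C. This is a sketch assuming POSIX, with a temporary file standing in for a `-shm` file (it is not SQLite's code). `mlock()` can be refused (for example with `EPERM` or `ENOMEM`) when the process's locked-memory limit is too small, so any real use would need a fallback.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one page of a temporary file read/write and tries to pin it in RAM.
   Returns 0 on success, the errno from mlock() if locking was refused,
   or -1 if the file or mapping could not be set up. */
int map_and_lock_small_region(void) {
    char path[] = "/tmp/shmdemoXXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, ps) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    int rc = mlock(p, (size_t)ps) == 0 ? 0 : errno;
    if (rc == 0) {
        /* While locked, the page is resident, so touching it does not
           require a fault-in read from the underlying file. */
        ((volatile unsigned char *)p)[0] = 1;
        munlock(p, (size_t)ps);
    }
    munmap(p, (size_t)ps);
    close(fd);
    return rc;
}
```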

   ---> Drake Wilson

Re: SQLite 3.7.17 preview - 2x faster?

Ryan Johnson-10
In reply to this post by Richard Hipp-3
On 04/04/2013 8:02 AM, Richard Hipp wrote:
> By making use of memory-mapped I/O, the current trunk of SQLite (which will
> eventually become version 3.7.17 after much more refinement and testing)
> can be as much as twice as fast, on some platforms and under some
> workloads.
Nice!

Some quick thoughts:

1. Does this replace the page cache completely, or does it just turn
"read" and "write" into glorified memcpy calls? I would assume the
latter so that virtual tables continue to work?

2. Does sqlite3 attempt to map the entire database file, and what
happens with large files in 32-bit processes?

3. It seems like this would increase the "attack surface" for stray
pointers in the host program. Granted, writes to stray pointers are not
sqlite's fault, but they're an unfortunately common problem... and mmap
makes user bugs more likely to directly corrupt the database on disk.
Perceived reliability might drop as a result (I'm not arguing that the
risk is worth giving up 2x, just pointing it out as a potential
unintended consequence).

Thoughts?
Ryan

Re: SQLite 3.7.17 preview - 2x faster?

Howard Chu
Ryan Johnson wrote:
> 3. It seems like this would increase the "attack surface" for stray
> pointers in the host program. Granted, writes to stray pointers are not
> sqlite's fault, but they're an unfortunately common problem... and mmap
> makes user bugs more likely to directly corrupt the database on disk.
> Perceived reliability might drop as a result (I'm not arguing that the
> risk is worth giving up 2x, just pointing it out as a potential
> unintended consequence).

This is why OpenLDAP LMDB uses a read-only mmap by default. User bugs get an
immediate SEGV, and usually the bug becomes obvious and easy to fix.
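A minimal C sketch of the effect Howard describes (POSIX assumed; names illustrative, not LMDB's code): with a `PROT_READ` mapping, a stray store faults immediately instead of silently changing the file. Linux reports this as `SIGSEGV`; some systems (macOS, for example) may deliver `SIGBUS` instead, so the test harness catches both.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Test-only handler: jump back out of the faulting store. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns 1 if a write through a read-only shared mapping faulted,
   0 if the write went through, -1 on setup failure. */
int stray_write_faults(void) {
    char path[] = "/tmp/rodemoXXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, ps) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)ps, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    volatile unsigned char *db = p;

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    int faulted = 0;
    if (sigsetjmp(fault_env, 1) == 0)
        db[0] = 0x42;               /* the "stray pointer" write */
    else
        faulted = 1;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(p, (size_t)ps);
    close(fd);
    return faulted;
}
```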

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: SQLite 3.7.17 preview - 2x faster?

Teg-3
In reply to this post by Richard Hipp-3
Hello Richard,

How much do you map at a time? I've virtually abandoned memory mapped
files in Win32 because of address space limitations. There's a 2 GB
address space limit in Win32 (most of the time) so, if the
combination of allocated RAM and memory mapped file size bump into the
limit,  the memory map will fail. Win64 doesn't have this limit. It'll
fail if it can't get a contiguous block of address space too.

C

Thursday, April 4, 2013, 8:02:34 AM, you wrote:

RH> By making use of memory-mapped I/O, the current trunk of SQLite (which will
RH> eventually become version 3.7.17 after much more refinement and testing)
RH> can be as much as twice as fast, on some platforms and under some
RH> workloads.  We would like to encourage people to try out the new code and
RH> report both success and failure.  Snapshots of the amalgamation can be
RH> found at

RH>    http://www.sqlite.org/draft/download.html

RH> Links to the relevant documentation can be seen at

RH>    http://www.sqlite.org/draft/releaselog/3_7_17.html

RH> The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
RH> solaris.  We have found that it does not work on OpenBSD, for reasons we
RH> have not yet been able to uncover; but as a precaution, memory mapped I/O is
RH> disabled by default on all of the *BSDs until we understand the problem.
RH> The biggest performance gains occur on windows, mac, and solaris. The new
RH> code is also faster on linux, but not by as big a factor.  The speed
RH> improvement is also heavily dependent upon workload.  Some operations can
RH> be almost twice as fast.  For others, there is no measurable speed
RH> improvement.

RH> Your feedback on whether or not the new code is faster for you, and whether
RH> or not it even works for you, is very important to us.  Thanks for giving
RH> the new code a try.




--
Best regards,
 Teg                            mailto:[hidden email]

Re: SQLite 3.7.17 preview - 2x faster?

Howard Chu
In reply to this post by Richard Hipp-3
Richard Hipp wrote:
> The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
> solaris.  We have found that it does not work on OpenBSD, for reasons we
> have not yet been able to uncover; but as a precaution, memory mapped I/O is
> disabled by default on all of the *BSDs until we understand the problem.

As I understand it, OpenBSD lacks a unified buffer cache. They reported
problems with LMDB in its default mode, too. But FreeBSD should be OK. I don't
know about any of the other BSD variants.

> The biggest performance gains occur on windows, mac, and solaris.  The new
> code is also faster on linux, but not by as big a factor.  The speed
> improvement is also heavily dependent upon workload.  Some operations can
> be almost twice as fast.  For others, there is no measurable speed
> improvement.
>
> Your feedback on whether or not the new code is faster for you, and whether
> or not it even works for you, is very important to us.  Thanks for giving
> the new code a try.
>


--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: SQLite 3.7.17 preview - 2x faster?

Dan Kennedy-4
On 04/04/2013 08:44 PM, Howard Chu wrote:
> Richard Hipp wrote:
>> The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
>> solaris.  We have found that it does not work on OpenBSD, for reasons we
>> have not yet been able to uncover; but as a precaution, memory mapped I/O is
>> disabled by default on all of the *BSDs until we understand the problem.
>
> As I understand it, OpenBSD lacks a unified buffer cache. They reported
> problems with LMDB in its default mode, too.

But it works in some non-default mode? When both reads and writes are
done via memory mapping? Or some other trick?

Dan.

Re: SQLite 3.7.17 preview - 2x faster?

Howard Chu
Dan Kennedy wrote:

> On 04/04/2013 08:44 PM, Howard Chu wrote:
>> Richard Hipp wrote:
>>> The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
>>> solaris.  We have found that it does not work on OpenBSD, for reasons we
>>> have not yet been able to uncover; but as a precaution, memory mapped I/O is
>>> disabled by default on all of the *BSDs until we understand the problem.
>>
>> As I understand it, OpenBSD lacks a unified buffer cache. They reported
>> problems with LMDB in its default mode, too.
>
> But it works in some non-default mode? When both reads and writes are
> done via memory mapping? Or some other trick?

Right. It works if you use a writable mmap and do all reads and writes thru
the map. But any process that comes along and accesses the file using read
will see invalid/stale information, and start double-caching the file pages.
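A rough C sketch of the mode Howard describes, where both reads and writes go through a writable `MAP_SHARED` mapping and `msync(MS_SYNC)` flushes changes to the file (POSIX assumed; names illustrative). The final `pread()` comparison is the step that depends on a unified buffer cache: it succeeds on Linux, while on a system like the OpenBSD case described above a process using `read` could see stale data.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes a record through a writable shared mapping, flushes it, then checks
   whether a conventional pread() observes the same bytes.
   Returns 1 if it does, 0 if not, -1 on setup failure. */
int write_through_map_visible_to_read(void) {
    static const char rec[] = "page-1-data";
    char path[] = "/tmp/rwmapXXXXXX";
    char buf[sizeof rec];
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, ps) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    memcpy(p, rec, sizeof rec);                     /* write via the map */
    int ok = msync(p, (size_t)ps, MS_SYNC) == 0     /* write back and wait */
             && memcmp(p, rec, sizeof rec) == 0     /* read via the map */
             && pread(fd, buf, sizeof rec, 0) == (ssize_t)sizeof rec
             && memcmp(buf, rec, sizeof rec) == 0;  /* read via the syscall */

    munmap(p, (size_t)ps);
    close(fd);
    return ok;
}
```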

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: SQLite 3.7.17 preview - 2x faster?

Richard Hipp-3
In reply to this post by Drake Wilson-4
On Thu, Apr 4, 2013 at 8:43 AM, Drake Wilson <[hidden email]> wrote:

> Quoth Richard Hipp <[hidden email]>, on 2013-04-04 08:02:34 -0400:
> > By making use of memory-mapped I/O, the current trunk of SQLite (which will
> > eventually become version 3.7.17 after much more refinement and testing)
> > can be as much as twice as fast, on some platforms and under some
> > workloads.
> [...]
>
> I'm curious how you plan to handle reliability against I/O errors in
> mmap mode.
>
> Specifically, an I/O error faulting in an mmapped page can deliver a
> SIGBUS to the thread.
>

Is this really a problem?  Your executable and all of your shared libraries
are also mmapped into your address space.  If accessing mmapped memory were
causing bus errors, then we'd be seeing bus errors all over the place.



--
D. Richard Hipp
[hidden email]

Re: SQLite 3.7.17 preview - 2x faster?

Richard Hipp-3
In reply to this post by Ryan Johnson-10
On Thu, Apr 4, 2013 at 9:02 AM, Ryan Johnson <[hidden email]> wrote:

> On 04/04/2013 8:02 AM, Richard Hipp wrote:
>
>> By making use of memory-mapped I/O, the current trunk of SQLite (which will
>> eventually become version 3.7.17 after much more refinement and testing)
>> can be as much as twice as fast, on some platforms and under some
>> workloads.
>>
> Nice!
>
> Some quick thoughts:
>
> 1. Does this replace the page cache completely, or does it just turn
> "read" and "write" into glorified memcpy calls? I would assume the latter
> so that virtual tables continue to work?
>

No.  The page cache is still there.


>
> 2. Does sqlite3 attempt to map the entire database file, and what happens
> with large files in 32-bit processes?
>

It mmaps the first N bytes of the database file where N is configurable.
The default N at the moment is 256MiB.  You can change it to 0 or to as big
of a number as you want using a PRAGMA.
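One way to picture "map the first N bytes" is the C sketch below (POSIX assumed). The struct and function names are hypothetical, not SQLite's; `mmap_limit` plays the role of N. In released SQLite versions this limit is set with `PRAGMA mmap_size=N`; the spelling used by this preview snapshot may differ. Reads inside the mapped prefix become a `memcpy`, reads past it fall back to `pread()`, and a failed `mmap()` (for example, no contiguous address space left in a 32-bit process, which is Teg's concern) simply leaves the file unmapped.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    int fd;
    const unsigned char *map;  /* NULL when nothing is mapped */
    size_t map_len;            /* min(mmap_limit, file size) */
} db_file;

/* Maps at most mmap_limit bytes from the start of the file. A limit of 0,
   an empty file, or a failed mmap() leaves the file unmapped. */
static void db_file_map(db_file *f, int fd, size_t mmap_limit) {
    struct stat st;
    f->fd = fd;
    f->map = NULL;
    f->map_len = 0;
    if (mmap_limit == 0 || fstat(fd, &st) != 0 || st.st_size <= 0) return;
    size_t want = (size_t)st.st_size < mmap_limit ? (size_t)st.st_size : mmap_limit;
    void *p = mmap(NULL, want, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return;  /* e.g. no room in a 32-bit address space */
    f->map = p;
    f->map_len = want;
}

/* Copies from the mapping when [off, off+n) lies inside it, else uses pread(). */
static ssize_t db_file_read(const db_file *f, void *buf, size_t n, off_t off) {
    if (f->map != NULL && off >= 0 && (size_t)off <= f->map_len
        && n <= f->map_len - (size_t)off) {
        memcpy(buf, f->map + off, n);
        return (ssize_t)n;
    }
    return pread(f->fd, buf, n, off);
}

static void db_file_unmap(db_file *f) {
    if (f->map != NULL) munmap((void *)f->map, f->map_len);
    f->map = NULL;
    f->map_len = 0;
}

/* Demo: a 3-page file whose pages start with 'A', 'B', 'C', mapped with a
   one-page limit. Returns 1 if exactly one page is mapped and reads from
   inside (page 0) and beyond (page 2) the mapped prefix both return the
   expected byte; 0 otherwise; -1 on setup failure. */
int demo_partial_map(void) {
    char path[] = "/tmp/partmapXXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, 3 * ps) != 0) { close(fd); return -1; }
    for (int i = 0; i < 3; i++) {
        char c = (char)('A' + i);
        if (pwrite(fd, &c, 1, (off_t)i * ps) != 1) { close(fd); return -1; }
    }
    db_file f;
    db_file_map(&f, fd, (size_t)ps);   /* N = one page */
    char first = 0, third = 0;
    int ok = f.map_len == (size_t)ps
             && db_file_read(&f, &first, 1, 0) == 1 && first == 'A'
             && db_file_read(&f, &third, 1, 2 * (off_t)ps) == 1 && third == 'C';
    db_file_unmap(&f);
    close(fd);
    return ok;
}
```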


>
> 3. It seems like this would increase the "attack surface" for stray
> pointers in the host program. Granted, writes to stray pointers are not
> sqlite's fault, but they're an unfortunately common problem... and mmap
> makes user bugs more likely to directly corrupt the database on disk.
> Perceived reliability might drop as a result (I'm not arguing that the risk
> is worth giving up 2x, just pointing it out as a potential unintended
> consequence).
>
> Thoughts?
> Ryan



--
D. Richard Hipp
[hidden email]

Re: SQLite 3.7.17 preview - 2x faster?

Richard Hipp-3
In reply to this post by Teg-3
On Thu, Apr 4, 2013 at 9:22 AM, Teg <[hidden email]> wrote:

> Hello Richard,
>
> How much do you map at a time?


The default on windows is currently 256MiB.  You can adjust this number up
or down using a pragma.  Or you can change it at compile-time or start-time.



> I've virtually abandoned memory mapped
> files in Win32 because of address space limitations. There's a 2 GB
> address space limit in Win32 (most of the time) so, if the
> combination of allocated RAM and memory mapped file size bump into the
> limit,  the memory map will fail. Win64 doesn't have this limit. It'll
> fail if it can't get a contiguous block of address space too.
>



--
D. Richard Hipp
[hidden email]

Re: SQLite 3.7.17 preview - 2x faster?

Drake Wilson-4
In reply to this post by Richard Hipp-3
Quoth Richard Hipp <[hidden email]>, on 2013-04-04 10:51:22 -0400:
> Is this really a problem?  Your executable and all of your shared libraries
> are also mmapped into your address space.  If accessing mmapped memory were
> causing bus errors, then we'd be seeing bus errors all over the place.

As I interpret it, this is because it's commonly assumed that if part
of your executable code goes away, you cannot reliably continue (there
is no way to know what to do now), so crashing the whole process is
acceptable.  A system integrator or administrator must choose the
devices that will contain native code accordingly, since they can
bound the reliability of almost the entire system.  A similar argument
applies for choosing swap devices that may back any anonymous memory;
if a swap device fails, it is expected that a lot of things may crash.

So it is perfectly okay to use unprotected mmap accesses if an I/O
error on the file will already make the entire process uncontinuable.
The question is whether this applies to arbitrary SQLite databases
that an application may open, and I suspect that (a) it probably
doesn't, and (b) this reliability transitivity behavior would be a
significant departure from earlier SQLite versions.

As a hypothetical, more concrete example, consider a cluster of DNS
servers backed by mostly-read-only SQLite databases.  The system
integrator chooses highly reliable local ROM devices to store OS and
application code, but due to size and update flexibility requirements,
the database files are spread out and accessed via network filesystem.
With unprotected mmap, if any storage backend goes down or suffers a
media error, the entire DNS server process may crash upon trying to
read it, as opposed to receiving an error code and returning temporary
SERVFAIL responses for the affected data sets until the error can be
repaired.  (Arguably someone running such a service should plan for
this in other ways too, but I think SQLite should not exacerbate the
effects of such failures any more than necessary.)

This can be avoided by explicitly turning mmap off, but due to this I
would think that off should be the default, much like how WAL is not
the default journal mode (despite its considerable benefits in many
use cases) because it creates additional requirements that must be
taken into account.

Of course I may be missing something important here.

   ---> Drake Wilson

Re: SQLite 3.7.17 preview - 2x faster?

Nico Williams
In reply to this post by Howard Chu
On Thu, Apr 4, 2013 at 8:19 AM, Howard Chu <[hidden email]> wrote:
> This is why OpenLDAP LMDB uses a read-only mmap by default. User bugs get an
> immediate SEGV, and usually the bug becomes obvious and easy to fix.

There are many reasons to want to use read-only mmap()s (with
MAP_SHARED though) and write(2)/pwrite(2) for writing.  Accidental
write prevention is only one of them.  Another has to do with managing
write visibility and the performance of msync(MS_SYNC):

 - msync(MS_SYNC) is depressingly often implemented as a sequence of
synchronous writes of each page in the given memory range(!), which
completely destroys write performance.

   Whereas write(2)/pwrite(2) are completely asynchronous and fsync(2)
does a single synchronous operation.  (Well, it's not that simple, but
fsync(2) is generally much faster than msync(MS_SYNC).)

   Of course, one can still write via an mmap, call msync(MS_ASYNC),
then fsync(2) and get the same effect as writing via write(2) and then
fsync(2).

 - msync(MS_ASYNC) is a no-op on unified buffer cache OSes.

msync(MS_ASYNC) should be used prior to reading new transaction data,
even though in general it will be a no-op.
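Nico's suggested split (a read-only `MAP_SHARED` mapping for reads; `pwrite(2)` plus one `fsync(2)` for writes; `msync(MS_ASYNC)` before reading newly committed data) might look like this in C. This is a minimal sketch assuming POSIX and a unified buffer cache (as on Linux), where bytes written by `pwrite()` become visible through the existing mapping; the names and `/tmp` path are illustrative.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Commits a small "transaction" with pwrite()+fsync(), then reads it back
   through a read-only shared mapping created beforehand.
   Returns 1 if the mapping shows the new bytes, 0 if not, -1 on setup failure. */
int commit_via_pwrite_read_via_map(void) {
    static const char txn[] = "txn-0001";
    char path[] = "/tmp/rocommitXXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, ps) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)ps, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    /* Writer side: buffered write, then a single durability barrier. */
    int ok = pwrite(fd, txn, sizeof txn, 0) == (ssize_t)sizeof txn
             && fsync(fd) == 0;

    /* Reader side: msync(MS_ASYNC) before looking at new transaction data,
       as suggested above; on a unified-buffer-cache OS it is expected to be
       a no-op. Stray stores through p would fault (PROT_READ). */
    ok = ok && msync(p, (size_t)ps, MS_ASYNC) == 0
            && memcmp(p, txn, sizeof txn) == 0;

    munmap(p, (size_t)ps);
    close(fd);
    return ok;
}
```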

Nico

Re: SQLite 3.7.17 preview - 2x faster?

Drake Wilson-4
In reply to this post by Drake Wilson-4
Quoth Drake Wilson <[hidden email]>, on 2013-04-04 10:20:44 -0500:
> So it is perfectly okay to use unprotected mmap accesses if an I/O
> error on the file will already make the entire process uncontinuable.
> The question is whether this applies to arbitrary SQLite databases
> that an application may open, and I suspect that (a) it probably
> doesn't, and (b) this reliability transitivity behavior would be a
> significant departure from earlier SQLite versions.

Here is a much more direct and concrete example.  Referenced files may
be retrieved from:

  http://dasyatidae.net/files/2013/sqlite3-201304040051/

Here are the steps I used.  This is on a modern Debian GNU/Linux AMD64
system.

  - Compile kvserv.c along with an _earlier_ (probably system) version
    of SQLite than the snapshot amalgamation mentioned above---I used:

      gcc -std=c99 -o kvserv kvserv.c -lsqlite3 -lpthread -ldl

  - Mount a removable disk that you don't care about very much (I used
    a spare USB flash disk), and copy keyval1.db to it.  Unmount,
    unplug, replug, and remount the disk read-only.  The database is
    deliberately a few megabytes in size to reduce the chance that all
    of it will be read ahead into cache.  To help ensure this, I also
    dropped the page cache ad hoc with:

      echo 1 | sudo tee /proc/sys/vm/drop_caches

    though in theory this should not be necessary.

  - Symlink the copied file to keyval.db in the current directory (all
    the other files should be on a reliable local disk), and ensure
    UDP port 11105 is not in use.  Run kvserv.  In a separate
    terminal, run something akin to:

      socat - udp6-datagram:[::1]:11105

    (In retrospect I should have used a Unix-domain socket, but I do
    not have time to change it right now; I apologize for the
    inconvenience.)

  - Issue queries to the simple key-value server by entering keys, one
    per line, in the socat terminal.  In particular, the keys 'a',
    'b', and 'c' are defined in the given DB, along with all
    five-digit decimal numbers.  Responses should be returned
    beginning with "OK" followed by either result data or nothing.

  - Unplug the removable disk hard, simulating a media failure.  Issue
    additional queries.  Responses should be returned beginning with
    "NG", indicating that there was an error retrieving the requested
    data.

Repeating these steps, but compiling the application with the
sqlite3.c from the 201304040051 snapshot amalgamation that uses
unprotected mmap, causes the entire kvserv process to die with SIGBUS
as soon as a query tries to access the volume while it is unplugged.

Unless the design of kvserv.c is relevantly unreasonable, this should
help demonstrate the danger of switching SQLite to use unprotected
mmap by default.

   ---> Drake Wilson

Re: SQLite 3.7.17 preview - 2x faster?

jic
In reply to this post by Richard Hipp-3

"Richard Hipp" wrote...


> By making use of memory-mapped I/O, the current trunk of SQLite (which will
> eventually become version 3.7.17 after much more refinement and testing)
> can be as much as twice as fast, on some platforms and under some
> workloads.  We would like to encourage people to try out the new code and
> report both success and failure.  Snapshots of the amalgamation can be
> found at
>
>   http://www.sqlite.org/draft/download.html
>
> Links to the relevant documentation can be seen at
>
>   http://www.sqlite.org/draft/releaselog/3_7_17.html
>
> The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
> solaris.  We have found that it does not work on OpenBSD, for reasons we
> have not yet been able to uncover; but as a precaution, memory mapped I/O is
> disabled by default on all of the *BSDs until we understand the problem.
> The biggest performance gains occur on windows, mac, and solaris.  The new
> code is also faster on linux, but not by as big a factor.  The speed
> improvement is also heavily dependent upon workload.  Some operations can
> be almost twice as fast.  For others, there is no measurable speed
> improvement.
>
> Your feedback on whether or not the new code is faster for you, and whether
> or not it even works for you, is very important to us.  Thanks for giving
> the new code a try.


Are there any Windows binaries available for testing?  I would love to give
this a try.  I could use the 2x faster processing/response.

Re: SQLite 3.7.17 preview - 2x faster?

Nico Williams
In reply to this post by Drake Wilson-4
On Thu, Apr 4, 2013 at 11:44 AM, Drake Wilson <[hidden email]> wrote:
> Repeating these steps, but compiling the application with the
> sqlite3.c from the 201304040051 snapshot amalgamation that uses
> unprotected mmap, causes the entire kvserv process to die with SIGBUS
> as soon as a query tries to access the volume while it is unplugged.

This is very sad.  But really, the OS should cause kvserv to hang
waiting for I/O from the device to complete (and you should get some
indication, in dmesg, on the console, in a dialog -something- that
there's a missing device that's needed).  Sending SIGBUS because a
device is missing is a bit heavy-handed of the kernel!

In a situation where the filesystem is corrupted it's a bit more
natural to expect a panic/oops/BSOD, or even just a user-land equivalent
(like SIGBUS).

(Anyone who remembers what server rooms were like in the mid-90s will
remember SCSI cables falling off and so on.  That SunOS and Solaris
would hang in such events was rather useful.)

> Unless the design of kvserv.c is relevantly unreasonable, this should
> help demonstrate the danger of switching SQLite to use unprotected
> mmap by default.

I doubt kvserv.c is doing anything wrong.  I've not run your test
though.  And searches for linux removable media SIGBUS turn up very
little.

Nico

Re: SQLite 3.7.17 preview - 2x faster?

Drake Wilson-4
Quoth Nico Williams <[hidden email]>, on 2013-04-04 16:08:24 -0500:
> This is very sad.  But really, the OS should cause kvserv to hang
> waiting for I/O from the device to complete (and you should get some
> indication, in dmesg, on the console, in a dialog -something- that
> there's a missing device that's needed).  Sending SIGBUS because a
> device is missing is a bit heavy-handed of the kernel!

Well, the device is _gone_ from the perspective of the OS; the kernel
has no way of knowing whether I intend to plug that USB device back
in.  The "removable media" aspect is a bit of a red herring; I am just
using that as a convenient way of inducing a mostly-repeatable read
failure at the hardware level.  A more permanent case would be a bad
sector on a magnetic disk.  It would not make any sense for the kernel
to pause the application indefinitely in case the sector can be
magically restored in the future.

In the case of read() or similar, you are already in a system call and
the kernel can return an error code which the application must already
know how to handle.  In the case of mmap, what is interrupted is a
processor-level memory access, and there is no provision for returning
an error code; all that can be done is to reroute the entire control
flow, and on Unixy systems that is done using signals.

Now, user code that can assume it controls the entire process _does_
have the ability to establish a signal handler to fix up the access.
E.g., one can map a zero page over the broken page, set a flag
somewhere else saying "that data is corrupted", and then somewhere
outside the inner processing loop, check the flag and abort the
operation.  But the sigaction interface is not flexible enough to make
it safe to do this from library code in general, because signal
handlers are process-wide.  E.g., consider two libraries which both
want safe access to memory-mapped files and are being invoked in
different threads...
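Drake's fix-up approach (map a zero page over the broken page, set a flag, check the flag outside the inner loop) can be sketched in C as below, for a program that owns its signal handling. Assumptions, stated plainly: POSIX with `SA_SIGINFO` and `MAP_ANONYMOUS`; a single thread; and calling `mmap()` inside a signal handler, which POSIX does not list as async-signal-safe even though it is commonly relied upon on Linux. The process-wide `sigaction()` installation is exactly why this does not compose when two libraries both want it. Names and the `/tmp` path are illustrative.

```c
#define _DEFAULT_SOURCE 1  /* expose POSIX/BSD APIs on glibc */
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t data_corrupted = 0;
static long fixup_page_size;

/* SIGBUS handler: replace the unreadable page with zeros and record it.
   A real handler would first check that si_addr lies in a mapping it owns. */
static void zero_page_fixup(int sig, siginfo_t *si, void *uctx) {
    (void)uctx;
    uintptr_t page = (uintptr_t)si->si_addr & ~((uintptr_t)fixup_page_size - 1);
    /* ASSUMPTION: mmap() is not on POSIX's async-signal-safe list. */
    if (mmap((void *)page, (size_t)fixup_page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
        signal(sig, SIG_DFL);   /* cannot patch: the retried access will kill us */
        return;
    }
    data_corrupted = 1;         /* on return, the faulting load is retried and reads 0 */
}

/* Sums two mapped pages of a file whose second page vanishes before the scan.
   Stores the sum in *sum_out. Returns 1 if the fix-up flag was raised,
   0 if not, -1 on setup failure. */
int scan_with_fixup(unsigned long *sum_out) {
    char path[] = "/tmp/fixupXXXXXX";
    fixup_page_size = sysconf(_SC_PAGESIZE);
    long ps = fixup_page_size;
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, 2 * ps) != 0 || pwrite(fd, "x", 1, 0) != 1) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)(2 * ps), PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    const volatile unsigned char *map = p;

    /* Simulated media failure: the second page no longer exists. */
    if (ftruncate(fd, ps) != 0) {
        munmap(p, (size_t)(2 * ps));
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = zero_page_fixup;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    data_corrupted = 0;
    unsigned long sum = 0;
    for (long i = 0; i < 2 * ps; i++)   /* inner loop: no per-access checks */
        sum += map[i];
    int flagged = data_corrupted;       /* checked once, outside the loop */

    sigaction(SIGBUS, &old, NULL);
    munmap(p, (size_t)(2 * ps));        /* also removes the patched zero page */
    close(fd);
    *sum_out = sum;
    return flagged;
}
```

Here the caller would treat a return of 1 as "that data is corrupted" and abort the operation, rather than trusting the zero-filled bytes.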

AIUI, Windows's use of SEH is slightly better in this regard, since
the relevant exception handler can be established using only local
state.  This still requires a compiler capable of emitting SEH frame
establish/teardown code on Windows x86-32 (which had a patent fiasco a
while back which may still be ongoing), and I think maybe appropriate
unwind tables and framing on Windows x86-64, and it doesn't help the
case of Unixy systems at all.

   ---> Drake Wilson

Re: SQLite 3.7.17 preview - 2x faster?

Nico Williams
On Thu, Apr 4, 2013 at 4:45 PM, Drake Wilson <[hidden email]> wrote:

> Quoth Nico Williams <[hidden email]>, on 2013-04-04 16:08:24 -0500:
>> This is very sad.  But really, the OS should cause kvserv to hang
>> waiting for I/O from the device to complete (and you should get some
>> indication, in dmesg, on the console, in a dialog -something- that
>> there's a missing device that's needed).  Sending SIGBUS because a
>> device is missing is a bit heavy-handed of the kernel!
>
> Well, the device is _gone_ from the perspective of the OS; the kernel
> has no way of knowing whether I intend to plug that USB device back
> in.  The "removable media" aspect is a bit of a red herring; I am just
> using that as a convenient way of inducing a mostly-repeatable read
> failure at the hardware level.  A more permanent case would be a bad
> sector on a magnetic disk.  It would not make any sense for the kernel
> to pause the application indefinitely in case the sector can be
> magically restored in the future.

This is off-topic, I know, so maybe we should continue this off-list,
if at all, but...

The OS could block the victim.  If Linux prefers to SIGBUS the victim,
well, that's Linux's fault, no?

> In the case of read() or similar, you are already in a system call and
> the kernel can return an error code which the application must already
> know how to handle.  In the case of mmap, what is interrupted is a
> processor-level memory access, and there is no provision for returning
> an error code; all that can be done is to reroute the entire control
> flow, and on Unixy systems that is done using signals.

Sure, EIO.  But certainly for mmap() page faults it's best to hang.

> Now, user code that can assume it controls the entire process _does_
> have the ability to establish a signal handler to fix up the access.

Not an option here.

Nico
--

Re: SQLite 3.7.17 preview - 2x faster?

Drake Wilson-4
Quoth Nico Williams <[hidden email]>, on 2013-04-04 19:15:52 -0500:
> This is off-topic, I know, so maybe we should continue this off-list,
> if at all, but...

Switching to private mail.

   ---> Drake Wilson

Re: SQLite 3.7.17 preview - 2x faster?

Max Vlasov
In reply to this post by Richard Hipp-3
On Thu, Apr 4, 2013 at 4:02 PM, Richard Hipp <[hidden email]> wrote:

> By making use of memory-mapped I/O, the current trunk of SQLite (which will
> eventually become version 3.7.17 after much more refinement and testing)
> can be as much as twice as fast, on some platforms and under some
> workloads.  We would like to encourage people to try out the new code and
> report both success and failure.
>


Not particularly about this draft version, but about my experience with
memory-mapped files on Windows, if you don't mind.

When I worked with memory-mapped files on Windows two years ago, I
implemented a library for accessing files of virtually unlimited size using
a sliding-view approach. I noticed an interesting effect on the system as a
whole: when I wrote sequentially, at some point the entire system became
unresponsive. This is the important point: not just the application that
was writing to the file, but the whole system, so no Alt-Tab, no blinking
caret in other applications, and sometimes not even mouse movement. I tried
to report this on the MS forums
(http://social.msdn.microsoft.com/Forums/en-US/windowsgeneraldevelopmentissues/thread/81dd029f-2f55-49f2-bd02-1a8ceb0373eb),
but it seems it wasn't noticed. In that forum topic I posted a small Pascal
procedure that demonstrates the effect (it's still there); it can easily be
ported to any other language that can call the Windows API directly.
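The sliding-view approach mentioned above keeps only one window of the file mapped at a time and remaps whenever an access falls outside it, so file size is not limited by address space. On Windows such a library would typically be built on MapViewOfFile, whose offsets must be multiples of the allocation granularity reported by GetSystemInfo. The sketch below is a POSIX analog using mmap() with page-aligned windows, read-only for simplicity. The `sliding_view` type and the `sv_*` functions are hypothetical, not taken from the library described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Only one window of the file is mapped at any time. */
struct sliding_view {
    int fd;
    size_t window;        /* window size, a multiple of the page size */
    off_t base;           /* file offset of the mapped window */
    size_t len;           /* bytes actually mapped (short at EOF) */
    unsigned char *map;   /* NULL when nothing is mapped */
    off_t file_size;
};

static int sv_open(struct sliding_view *v, int fd, size_t pages)
{
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    v->fd = fd;
    v->window = pages * (size_t)sysconf(_SC_PAGESIZE);
    v->base = 0;
    v->len = 0;
    v->map = NULL;
    v->file_size = end;
    return 0;
}

/* Return a pointer to the byte at file offset off, sliding the window
   if needed.  Returns NULL if off is out of range or mmap() fails. */
static const unsigned char *sv_at(struct sliding_view *v, off_t off)
{
    if (off < 0 || off >= v->file_size)
        return NULL;
    if (v->map == NULL || off < v->base || off >= v->base + (off_t)v->len) {
        if (v->map != NULL)
            munmap(v->map, v->len);
        v->map = NULL;
        /* mmap() offsets must be page aligned; align to the window size. */
        v->base = off - off % (off_t)v->window;
        off_t remaining = v->file_size - v->base;
        v->len = remaining < (off_t)v->window ? (size_t)remaining : v->window;
        void *m = mmap(NULL, v->len, PROT_READ, MAP_SHARED, v->fd, v->base);
        if (m == MAP_FAILED)
            return NULL;
        v->map = m;
    }
    return v->map + (off - v->base);
}

static void sv_close(struct sliding_view *v)
{
    if (v->map != NULL)
        munmap(v->map, v->len);
    v->map = NULL;
}
```

A larger window means fewer remaps at the cost of more address space and more dirty pages in flight; the effect reported above suggests the OS's handling of those pages, not the remapping itself, is what can stall the system.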

Just now, while writing this message, I tried to reproduce it. The machine
runs 64-bit Windows with 4 GB of memory. I started the program writing the
file up to 10 GB. Unsurprisingly, at about 5-6 GB Notepad (another
application) stopped responding to my key presses, the caret stopped
blinking, and Alt-Tab and the taskbar didn't work for about a minute. So I
could not do anything (!) on my computer for a minute or so while another
application was doing something using an officially documented API.

I don't know whether such a scenario is possible with SQLite. All I know is
that on Windows memory-mapped files are still implemented as a very special
entity, sometimes enjoying more privileges than other entities regardless of
the permissions of the application that uses them. I should probably do some
SQLite-specific tests to find out whether this affects SQLite, but before
that I wanted to share this information.