two threads block eachother opening db (WAL)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

two threads block eachother opening db (WAL)

Charles Samuels-2
I have found that when my process has a lot of threads each of which opens a
DIFFERENT database, they each block on eachother while opening each database.

Here is a little rust program to demonstrate the problem: <https://
pastebin.com/QGXFp6w6> If you simply remove the line "PRAGMA journal_mode =
WAL;", then the program runs more than 5x faster.

This is at least on conflict with the documentation, as the documentation
suggests that a thread is a process for the purposes of sqlite's concurrency,
at least when OPEN_NO_MUTEX is specified. I also feel like it's a bug in
general, as opening two unrelated databases should not block eachother.

Each thread blocks at this point:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff74e4b95 in __GI___pthread_mutex_lock (mutex=0x7ffff7dd8148
<staticMutexes.10049+360>)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007ffff7b26419 in unixEnterMutex () at sqlite3.c:31952
#3  unixLock (id=0x7ffff4a25180, eFileLock=1) at sqlite3.c:32894
#4  0x00007ffff7b1f2fa in sqlite3OsLock (lockType=1, id=<optimized out>) at
sqlite3.c:21299
#5  pagerLockDb (pPager=pPager@entry=0x7ffff4a25008, eLock=eLock@entry=1) at
sqlite3.c:50293
#6  0x00007ffff7b1f34b in pagerLockDb (eLock=1, pPager=0x7ffff4a25008) at
sqlite3.c:53054
#7  pager_wait_on_lock (pPager=pPager@entry=0x7ffff4a25008,
locktype=locktype@entry=1) at sqlite3.c:53051
#8  0x00007ffff7b55d6c in sqlite3PagerSharedLock (pPager=0x7ffff4a25008) at
sqlite3.c:54293
#9  0x00007ffff7b56835 in lockBtree (pBt=0x7ffff4a151e8) at sqlite3.c:64591
#10 sqlite3BtreeBeginTrans (p=0x7ffff4a1b508, wrflag=wrflag@entry=0) at
sqlite3.c:64956
#11 0x00007ffff7b82997 in sqlite3InitOne (db=0x7ffff4a12008, iDb=iDb@entry=0,
    pzErrMsg=pzErrMsg@entry=0x7ffff51fe778) at sqlite3.c:119558
#12 0x00007ffff7b82aca in sqlite3Init (db=0x7ffff4a12008,
pzErrMsg=pzErrMsg@entry=0x7ffff51fe778)
    at sqlite3.c:119740
#13 0x00007ffff7b82b00 in sqlite3ReadSchema
(pParse=pParse@entry=0x7ffff51fe770) at sqlite3.c:119765
#14 0x00007ffff7b8d8e4 in sqlite3Pragma (pParse=0x7ffff51fe770,
pId1=pId1@entry=0x7ffff51fddd0,
    pId2=pId2@entry=0x7ffff51fdde8, pValue=pValue@entry=0x7ffff51fde18,
minusFlag=minusFlag@entry=0)
    at sqlite3.c:117300
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: two threads block eachother opening db (WAL)

Rowan Worth-2
On 2 July 2018 at 23:32, Charles Samuels <[hidden email]> wrote:

> I have found that when my process has a lot of threads each of which opens
> a
> DIFFERENT database, they each block on eachother while opening each
> database.
>


> This is at least on conflict with the documentation, as the documentation
> suggests that a thread is a process for the purposes of sqlite's
> concurrency,


Actually I believe the mutex acquisition here is to provide those
semantics, because unix file locks are necessarily bound to a process and
not a thread. So sqlite must manage some of the per-thread details itself
to correctly handle the case where multiple threads are using the same DB,
possibly with different locking levels, and the mutex protects those shared
structures. Intuitively it seems like a mutex held on the sqlite3_file
structure rather than a global mutex could be used for this purpose which
would prevent unrelated databases from contesting each other, however...


> I also feel like it's a bug in
> general, as opening two unrelated databases should not block eachother.
>

... unix is something of a complex beast and the question of whether two
databases are unrelated is not as simple as it may seem. You might have two
databases /tmp/db and /var/tmp/db, but what if /var/tmp is a symlink to
/tmp? Or what if the two db files are hardlinks of the same inode?

Identifying multiple opens of the same inode is the other problem the mutex
is helping to solve here. And it's a detail sqlite _must_ be aware of
because of the bizarre POSIX locking semantics where closing a file
descriptor relinquishes all locks associated with its inode, across the
entire process. So if sqlite isn't careful about this, corruption can
easily result.

I'm not saying there isn't room for improvement here, but it's not a simple
problem!

Each thread blocks at this point:

>
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86
> _64/lowlevellock.S:135
> #1  0x00007ffff74e4b95 in __GI___pthread_mutex_lock (mutex=0x7ffff7dd8148
> <staticMutexes.10049+360>)
>     at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007ffff7b26419 in unixEnterMutex () at sqlite3.c:31952
> #3  unixLock (id=0x7ffff4a25180, eFileLock=1) at sqlite3.c:32894
> #4  0x00007ffff7b1f2fa in sqlite3OsLock (lockType=1, id=<optimized out>)
> at
> sqlite3.c:21299
> #5  pagerLockDb (pPager=pPager@entry=0x7ffff4a25008, eLock=eLock@entry=1)
> at
> sqlite3.c:50293
> #6  0x00007ffff7b1f34b in pagerLockDb (eLock=1, pPager=0x7ffff4a25008) at
> sqlite3.c:53054
> #7  pager_wait_on_lock (pPager=pPager@entry=0x7ffff4a25008,
> locktype=locktype@entry=1) at sqlite3.c:53051
> #8  0x00007ffff7b55d6c in sqlite3PagerSharedLock (pPager=0x7ffff4a25008)
> at
> sqlite3.c:54293
> #9  0x00007ffff7b56835 in lockBtree (pBt=0x7ffff4a151e8) at sqlite3.c:64591
> #10 sqlite3BtreeBeginTrans (p=0x7ffff4a1b508, wrflag=wrflag@entry=0) at
> sqlite3.c:64956
> #11 0x00007ffff7b82997 in sqlite3InitOne (db=0x7ffff4a12008, iDb=iDb@entry=0,
>
>     pzErrMsg=pzErrMsg@entry=0x7ffff51fe778) at sqlite3.c:119558
> #12 0x00007ffff7b82aca in sqlite3Init (db=0x7ffff4a12008,
> pzErrMsg=pzErrMsg@entry=0x7ffff51fe778)
>     at sqlite3.c:119740
> #13 0x00007ffff7b82b00 in sqlite3ReadSchema
> (pParse=pParse@entry=0x7ffff51fe770) at sqlite3.c:119765
> #14 0x00007ffff7b8d8e4 in sqlite3Pragma (pParse=0x7ffff51fe770,
> pId1=pId1@entry=0x7ffff51fddd0,
>     pId2=pId2@entry=0x7ffff51fdde8, pValue=pValue@entry=0x7ffff51fde18,
> minusFlag=minusFlag@entry=0)
>     at sqlite3.c:117300
>

Hm, this backtrace doesn't seem to have anything to do with WAL mode, so
I'm not sure why using the rollback journal makes for a 5 time speedup.
Perhaps because the same mutex is used to protect shared-memory accesses?
sqlite3OSLock really doesn't do much with the mutex held; is there another
thread holding things up via a different codepath?

-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users