SIGBUS errors with WAL mode and multiple simultaneous updating clients

Brodie Thiesfield-8
Hi,

I am seeing some SIGBUS faults during startup in the debug version of my
app, but only when running under valgrind, and only for some clients.
The faults appear to be occurring around the same location in the
sqlite WAL code. If I disable WAL then there are no faults. If I don't
run it under valgrind then there are no problems (so don't run it
under valgrind?). I don't know if this problem is caused by my code or
perhaps an issue in the sqlite WAL code caused by timing (since the
startup is much slower under valgrind).

The error occurs with a newly created database file containing a
number of tables and data. It is small, about 80 KB. I've upgraded
everything to the latest versions and the problem persists; I am currently
using valgrind 3.6.1 and sqlite 3.7.7.1 (amalgamation) on CentOS 5.6. The
app is built with gcc 4.4.4. I'm not using any sqlite compile-time flags.

The app is a parent process that does a fork/exec of 5 children (same
results with different numbers of children). Each child runs some
simple test code mostly simultaneously. I find that usually 1 child
faults, sometimes more or less. The busy handler is usually called a
number of times for the clients, sometimes delaying for up to a few
seconds before retrying.
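
The busy handler mentioned above is a callback that SQLite invokes when it cannot obtain a lock; a non-zero return asks SQLite to retry, and zero makes the call fail with SQLITE_BUSY. A minimal sketch of such a callback follows, assuming a hypothetical `busy_state` context struct and an arbitrary linear backoff (neither is taken from the application described here). It only mirrors the `int (*)(void *, int)` shape that `sqlite3_busy_handler()` expects, so it compiles without the SQLite headers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std modes */
#endif
#include <time.h>

/* Hypothetical per-connection state passed as the handler's context pointer. */
struct busy_state {
    int max_retries;   /* give up once this many retries have been made */
    int total_calls;   /* how many times the handler has been invoked */
};

/* Same shape as the callback sqlite3_busy_handler() expects.
 * prior_calls is the number of times the handler has already been
 * invoked for the current locking event. */
int busy_callback(void *ctx, int prior_calls)
{
    struct busy_state *st = (struct busy_state *)ctx;
    st->total_calls++;

    if (prior_calls >= st->max_retries)
        return 0;                      /* stop retrying: caller sees SQLITE_BUSY */

    /* Assumed policy: linear backoff of 1 ms, 2 ms, ... capped at 50 ms. */
    long ms = prior_calls + 1;
    if (ms > 50)
        ms = 50;
    struct timespec delay = { 0, ms * 1000000L };
    nanosleep(&delay, NULL);
    return 1;                          /* ask SQLite to try the lock again */
}
```

It would be registered with something like `sqlite3_busy_handler(db, busy_callback, &state)`, where `state` is a `struct busy_state` that outlives the connection.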

The parent may have multiple database connections open at the time of
the fork. They are closed in the child process before the exec. I
can't close them before that as they are in active use in the parent.
I've checked that there are no open file descriptors (other than those
that valgrind has) at the time of process startup.
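
A check for stray descriptors at process startup can be done in several ways. A minimal sketch, assuming a hypothetical helper `count_open_fds` that probes each descriptor number with `fcntl(fd, F_GETFD)`, which fails for descriptors that are not open:

```c
#include <fcntl.h>
#include <unistd.h>

/* Count the descriptors in [first_fd, limit) that are currently open.
 * fcntl(fd, F_GETFD) returns -1 (errno EBADF) for a closed descriptor. */
int count_open_fds(int first_fd, int limit)
{
    int count = 0;
    for (int fd = first_fd; fd < limit; fd++) {
        if (fcntl(fd, F_GETFD) != -1)
            count++;
    }
    return count;
}
```

Calling `count_open_fds(3, 256)` early in `main` of the exec'd child would report how many descriptors beyond stdin, stdout and stderr were inherited; the limit of 256 is arbitrary.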

If there is something that anyone can suggest to narrow down the
problem then I would be interested to hear it.

Regards,
Brodie


Some example stack traces:

==23787== Process terminating with default action of signal 7
(SIGBUS): dumping core
==23787==  Non-existent physical address at address 0x5D94068
==23787==    at 0x8484CAF: walIndexRecover (sqlite3.c:44817)
==23787==    by 0x8485DE2: walIndexReadHdr (sqlite3.c:45569)
==23787==    by 0x8485EC0: walTryBeginRead (sqlite3.c:45682)
==23787==    by 0x8486231: sqlite3WalBeginReadTransaction (sqlite3.c:45839)
==23787==    by 0x84802A6: pagerBeginReadTransaction (sqlite3.c:39822)
==23787==    by 0x84820A5: sqlite3PagerSharedLock (sqlite3.c:41678)
==23787==    by 0x848A0EC: lockBtree (sqlite3.c:49813)
==23787==    by 0x848A7AA: sqlite3BtreeBeginTrans (sqlite3.c:50105)
==23787==    by 0x84C637C: sqlite3InitOne (sqlite3.c:89829)
==23787==    by 0x84C678D: sqlite3Init (sqlite3.c:89998)
==23787==    by 0x84C6876: sqlite3ReadSchema (sqlite3.c:90035)
==23787==    by 0x84C3298: sqlite3Pragma (sqlite3.c:88616)
==23787==    by 0x84DF79A: yy_reduce (sqlite3.c:106258)
==23787==    by 0x84E00FF: sqlite3Parser (sqlite3.c:106641)
==23787==    by 0x84E0D13: sqlite3RunParser (sqlite3.c:107465)
==23787==    by 0x84C6BD5: sqlite3Prepare (sqlite3.c:90212)
==23787==    by 0x84C6ED4: sqlite3LockAndPrepare (sqlite3.c:90304)
==23787==    by 0x84C7043: sqlite3_prepare (sqlite3.c:90367)
==23787==    by 0x84C1A35: sqlite3_exec (sqlite3.c:86911)
==23787==    by 0x840F554:
cl::DatabaseSqlite::Connect(cl::StringBuffer<cl::NarrowTraits, 10u>
const&, int) (DatabaseSqlite.cpp:113)
==23787==    by 0x83F1BCC: cl::Database::InitDatabaseSqlite(void*)
(Database.cpp:631)
==23787==    by 0x83F07DA:
cl::Database::Init(boost::shared_ptr<cl::Configuration const>&)
(Database.cpp:465)
==23787==    by 0x83F03E7:
cl::Database::Connect(boost::shared_ptr<cl::Configuration const>&,
boost::shared_ptr<cl::Database>&) (Database.cpp:382)
==23787==    by 0x82CDE08: clsoapmain(int, char**) (clsoapMain.cpp:306)
==23787==    by 0x82DA099: main (clsoapUnix.cpp:32)



==23771== Process terminating with default action of signal 7
(SIGBUS): dumping core
==23771==  Non-existent physical address at address 0x5D9B686
==23771==    at 0x84863B0: sqlite3WalRead (sqlite3.c:45931)
==23771==    by 0x847FF1D: readDbPage (sqlite3.c:39604)
==23771==    by 0x84822AF: sqlite3PagerAcquire (sqlite3.c:41841)
==23771==    by 0x848929B: btreeGetPage (sqlite3.c:49066)
==23771==    by 0x848938E: getAndInitPage (sqlite3.c:49119)
==23771==    by 0x848C586: moveToChild (sqlite3.c:51633)
==23771==    by 0x848C8EA: moveToLeftmost (sqlite3.c:51798)
==23771==    by 0x848D2E3: sqlite3BtreeNext (sqlite3.c:52180)
==23771==    by 0x848D2A7: sqlite3BtreeNext (sqlite3.c:52170)
==23771==    by 0x84A2469: sqlite3VdbeExec (sqlite3.c:67148)
==23771==    by 0x849ABBF: sqlite3Step (sqlite3.c:61204)
==23771==    by 0x849ADAB: sqlite3_step (sqlite3.c:61277)
==23771==    by 0x8410AEB: cl::RequestSqlite::Step() (DatabaseSqlite.cpp:566)
==23771==    by 0x83F5777: cl::Database::LoadStrings(char,
std::vector<cl::StringLocalCache::Entry,
std::allocator<cl::StringLocalCache::Entry> >&) (Database.cpp:1623)
==23771==    by 0x83F0C91: cl::Database::TestDatabaseStrings()
(Database.cpp:532)
==23771==    by 0x83F0A96:
cl::Database::Init(boost::shared_ptr<cl::Configuration const>&)
(Database.cpp:508)
==23771==    by 0x83F03E7:
cl::Database::Connect(boost::shared_ptr<cl::Configuration const>&,
boost::shared_ptr<cl::Database>&) (Database.cpp:382)
==23771==    by 0x82CDE08: clsoapmain(int, char**) (clsoapMain.cpp:306)
==23771==    by 0x82DA099: main (clsoapUnix.cpp:32)
==23771==



==24314== Process terminating with default action of signal 7
(SIGBUS): dumping core
==24314==  Non-existent physical address at address 0x5D9A0EA
==24314==    at 0x8486D14: sqlite3WalRead (sqlite3.c:45931)
==24314==    by 0x8480881: readDbPage (sqlite3.c:39604)
==24314==    by 0x8482C13: sqlite3PagerAcquire (sqlite3.c:41841)
==24314==    by 0x8489BFF: btreeGetPage (sqlite3.c:49066)
==24314==    by 0x8489CF2: getAndInitPage (sqlite3.c:49119)
==24314==    by 0x848CEEA: moveToChild (sqlite3.c:51633)
==24314==    by 0x848D24E: moveToLeftmost (sqlite3.c:51798)
==24314==    by 0x848D390: sqlite3BtreeFirst (sqlite3.c:51850)
==24314==    by 0x84A2CD4: sqlite3VdbeExec (sqlite3.c:67086)
==24314==    by 0x849B523: sqlite3Step (sqlite3.c:61204)
==24314==    by 0x849B70F: sqlite3_step (sqlite3.c:61277)
==24314==    by 0x84C23D8: sqlite3_exec (sqlite3.c:86927)
==24314==    by 0x84C6F87: sqlite3InitOne (sqlite3.c:89932)
==24314==    by 0x84C70F1: sqlite3Init (sqlite3.c:89998)
==24314==    by 0x84C71DA: sqlite3ReadSchema (sqlite3.c:90035)
==24314==    by 0x84C3BFC: sqlite3Pragma (sqlite3.c:88616)
==24314==    by 0x84E00FE: yy_reduce (sqlite3.c:106258)
==24314==    by 0x84E0A63: sqlite3Parser (sqlite3.c:106641)
==24314==    by 0x84E1677: sqlite3RunParser (sqlite3.c:107465)
==24314==    by 0x84C7539: sqlite3Prepare (sqlite3.c:90212)
==24314==    by 0x84C7838: sqlite3LockAndPrepare (sqlite3.c:90304)
==24314==    by 0x84C79A7: sqlite3_prepare (sqlite3.c:90367)
==24314==    by 0x84C2399: sqlite3_exec (sqlite3.c:86911)
==24314==    by 0x840F5D4:
cl::DatabaseSqlite::Connect(cl::StringBuffer<cl::NarrowTraits, 10u>
const&, int) (DatabaseSqlite.cpp:113)
==24314==    by 0x83F1BCC: cl::Database::InitDatabaseSqlite(void*)
(Database.cpp:631)
==24314==    by 0x83F07DA:
cl::Database::Init(boost::shared_ptr<cl::Configuration const>&)
(Database.cpp:465)
==24314==    by 0x83F03E7:
cl::Database::Connect(boost::shared_ptr<cl::Configuration const>&,
boost::shared_ptr<cl::Database>&) (Database.cpp:382)
==24314==    by 0x82CDE08: clsoapmain(int, char**) (clsoapMain.cpp:306)
==24314==    by 0x82DA099: main (clsoapUnix.cpp:32)



==24318== Process terminating with default action of signal 7
(SIGBUS): dumping core
==24318==  Non-existent physical address at address 0x5D9A0EA
==24318==    at 0x8486D14: sqlite3WalRead (sqlite3.c:45931)
==24318==    by 0x8480881: readDbPage (sqlite3.c:39604)
==24318==    by 0x8482C13: sqlite3PagerAcquire (sqlite3.c:41841)
==24318==    by 0x8489BFF: btreeGetPage (sqlite3.c:49066)
==24318==    by 0x8489CF2: getAndInitPage (sqlite3.c:49119)
==24318==    by 0x848CEEA: moveToChild (sqlite3.c:51633)
==24318==    by 0x848D24E: moveToLeftmost (sqlite3.c:51798)
==24318==    by 0x848D390: sqlite3BtreeFirst (sqlite3.c:51850)
==24318==    by 0x84A2CD4: sqlite3VdbeExec (sqlite3.c:67086)
==24318==    by 0x849B523: sqlite3Step (sqlite3.c:61204)
==24318==    by 0x849B70F: sqlite3_step (sqlite3.c:61277)
==24318==    by 0x84C23D8: sqlite3_exec (sqlite3.c:86927)
==24318==    by 0x84C6F87: sqlite3InitOne (sqlite3.c:89932)
==24318==    by 0x84C70F1: sqlite3Init (sqlite3.c:89998)
==24318==    by 0x84C71DA: sqlite3ReadSchema (sqlite3.c:90035)
==24318==    by 0x84C3BFC: sqlite3Pragma (sqlite3.c:88616)
==24318==    by 0x84E00FE: yy_reduce (sqlite3.c:106258)
==24318==    by 0x84E0A63: sqlite3Parser (sqlite3.c:106641)
==24318==    by 0x84E1677: sqlite3RunParser (sqlite3.c:107465)
==24318==    by 0x84C7539: sqlite3Prepare (sqlite3.c:90212)
==24318==    by 0x84C7838: sqlite3LockAndPrepare (sqlite3.c:90304)
==24318==    by 0x84C79A7: sqlite3_prepare (sqlite3.c:90367)
==24318==    by 0x84C2399: sqlite3_exec (sqlite3.c:86911)
==24318==    by 0x840F5D4:
cl::DatabaseSqlite::Connect(cl::StringBuffer<cl::NarrowTraits, 10u>
const&, int) (DatabaseSqlite.cpp:113)
==24318==    by 0x83F1BCC: cl::Database::InitDatabaseSqlite(void*)
(Database.cpp:631)
==24318==    by 0x83F07DA:
cl::Database::Init(boost::shared_ptr<cl::Configuration const>&)
(Database.cpp:465)
==24318==    by 0x83F03E7:
cl::Database::Connect(boost::shared_ptr<cl::Configuration const>&,
boost::shared_ptr<cl::Database>&) (Database.cpp:382)
==24318==    by 0x82CDE08: clsoapmain(int, char**) (clsoapMain.cpp:306)
==24318==    by 0x82DA099: main (clsoapUnix.cpp:32)
==24318==


==24852== Process terminating with default action of signal 7
(SIGBUS): dumping core
==24852==  Non-existent physical address at address 0x5D94068
==24852==    at 0x8486AD3: walTryBeginRead (sqlite3.c:45807)
==24852==    by 0x8486B85: sqlite3WalBeginReadTransaction (sqlite3.c:45839)
==24852==    by 0x8480BFA: pagerBeginReadTransaction (sqlite3.c:39822)
==24852==    by 0x84829F9: sqlite3PagerSharedLock (sqlite3.c:41678)
==24852==    by 0x848AA40: lockBtree (sqlite3.c:49813)
==24852==    by 0x848B0FE: sqlite3BtreeBeginTrans (sqlite3.c:50105)
==24852==    by 0x84C6CD0: sqlite3InitOne (sqlite3.c:89829)
==24852==    by 0x84C70E1: sqlite3Init (sqlite3.c:89998)
==24852==    by 0x84C71CA: sqlite3ReadSchema (sqlite3.c:90035)
==24852==    by 0x84C3BEC: sqlite3Pragma (sqlite3.c:88616)
==24852==    by 0x84E00EE: yy_reduce (sqlite3.c:106258)
==24852==    by 0x84E0A53: sqlite3Parser (sqlite3.c:106641)
==24852==    by 0x84E1667: sqlite3RunParser (sqlite3.c:107465)
==24852==    by 0x84C7529: sqlite3Prepare (sqlite3.c:90212)
==24852==    by 0x84C7828: sqlite3LockAndPrepare (sqlite3.c:90304)
==24852==    by 0x84C7997: sqlite3_prepare (sqlite3.c:90367)
==24852==    by 0x84C2389: sqlite3_exec (sqlite3.c:86911)
==24852==    by 0x840F5D4:
cl::DatabaseSqlite::Connect(cl::StringBuffer<cl::NarrowTraits, 10u>
const&, int) (DatabaseSqlite.cpp:113)
==24852==    by 0x83F1BCC: cl::Database::InitDatabaseSqlite(void*)
(Database.cpp:631)
==24852==    by 0x83F07DA:
cl::Database::Init(boost::shared_ptr<cl::Configuration const>&)
(Database.cpp:465)
==24852==    by 0x83F03E7:
cl::Database::Connect(boost::shared_ptr<cl::Configuration const>&,
boost::shared_ptr<cl::Database>&) (Database.cpp:382)
==24852==    by 0x82CDE08: clsoapmain(int, char**) (clsoapMain.cpp:306)
==24852==    by 0x82DA099: main (clsoapUnix.cpp:32)


==24809== Process terminating with default action of signal 7
(SIGBUS): dumping core
==24809==  Non-existent physical address at address 0x5D94000
==24809==    at 0x8486496: walIndexTryHdr (sqlite3.c:45489)
==24809==    by 0x8486656: walIndexReadHdr (sqlite3.c:45548)
==24809==    by 0x8486814: walTryBeginRead (sqlite3.c:45682)
==24809==    by 0x8486B85: sqlite3WalBeginReadTransaction (sqlite3.c:45839)
==24809==    by 0x8480BFA: pagerBeginReadTransaction (sqlite3.c:39822)
==24809==    by 0x84829F9: sqlite3PagerSharedLock (sqlite3.c:41678)
==24809==    by 0x848AA40: lockBtree (sqlite3.c:49813)
==24809==    by 0x848B0FE: sqlite3BtreeBeginTrans (sqlite3.c:50105)
==24809==    by 0x84A0B66: sqlite3VdbeExec (sqlite3.c:65614)
==24809==    by 0x849B513: sqlite3Step (sqlite3.c:61204)
==24809==    by 0x849B6FF: sqlite3_step (sqlite3.c:61277)
==24809==    by 0x8410D77: cl::RequestSqlite::Step() (DatabaseSqlite.cpp:566)
==24809==    by 0x83F08DA:
cl::Database::Init(boost::shared_ptr<cl::Configuration const>&)
(Database.cpp:486)
==24809==    by 0x83F03E7:
cl::Database::Connect(boost::shared_ptr<cl::Configuration const>&,
boost::shared_ptr<cl::Database>&) (Database.cpp:382)
==24809==    by 0x82CDE08: clsoapmain(int, char**) (clsoapMain.cpp:306)
==24809==    by 0x82DA099: main (clsoapUnix.cpp:32)
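
For context on the faults above: in WAL mode SQLite keeps its wal-index in a memory-mapped shared-memory file (the `-shm` file next to the database), and functions such as walIndexTryHdr read from that mapping. One general way a read through a file-backed mapping can raise SIGBUS is when the file is shorter than the mapped range at the time of access. The sketch below demonstrates only that general mechanism with a temporary file; it is not a diagnosis of this report, and the function name `signal_after_truncate` and the `/tmp` path are made up for the example.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one page of a temporary file, then in a child process shrink the file
 * to zero bytes and read through the now-stale mapping.
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if the setup failed. */
int signal_after_truncate(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                               /* file lives until fd closes */

    long page = sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, page) == -1) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile char *map = (volatile char *)addr;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(addr, (size_t)page);
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: the file shrinks underneath the inherited mapping. */
        if (ftruncate(fd, 0) == -1)
            _exit(2);
        char c = map[0];                        /* access beyond end of file */
        _exit(c == 0 ? 0 : 1);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    munmap(addr, (size_t)page);
    close(fd);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the return value is expected to equal `SIGBUS`. Running the fault in a forked child keeps the calling process alive so the result can be inspected.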


The code being executed is something like:

   if (0 != strcmp(SQLITE_VERSION, sqlite3_libversion())) ...
   int rc = sqlite3_open_v2(pszDatabase, &m_pDatabase, SQLITE_OPEN_READWRITE, NULL);
   rc = sqlite3_busy_handler(m_pDatabase, BusyCallback, this);
   const char *journalMode = usewal ?
        "PRAGMA journal_mode=WAL;" :
        "PRAGMA journal_mode=DELETE;";
   rc = sqlite3_exec(m_pDatabase, journalMode, 0, 0, 0);
   rc = sqlite3_exec(m_pDatabase, "PRAGMA temp_store = MEMORY;", 0, 0, 0);
   rc = sqlite3_exec(m_pDatabase, "PRAGMA synchronous = NORMAL;", 0, 0, 0);
   rc = sqlite3_exec(m_pDatabase, "PRAGMA case_sensitive_like = ON;", 0, 0, 0);

   sqlite3_prepare_v2(SELECT version, dateapplied FROM schema_changelog WHERE id = (SELECT MAX(id) FROM schema_changelog);) = 0
   step, column_text, finalize

   sqlite3_exec(BEGIN IMMEDIATE TRANSACTION;) = 0

   sqlite3_prepare_v2(SELECT sid, lang, txt FROM strings WHERE stype = ?;) = 0
   read about 128 rows out via bind_text, step, column_text
   sqlite3_reset() = 0

   sqlite3_prepare_v2(DELETE FROM strings WHERE stype = ? AND sid = ?;) = 0
   bind, step, reset

   sqlite3_prepare_v2(INSERT INTO strings (stype, sid, lang, txt) VALUES (?, ?, ?, ?);) = 0
   loop about 4 times with bind, step then at end reset

   delete the few rows just inserted into strings, using the prepared
   DELETE statement above with bind, step, reset

   sqlite3_exec(COMMIT TRANSACTION) = 0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: SIGBUS errors with WAL mode and multiple simultaneous updating clients

avinash.jha2493
Was this resolved?

Re: SIGBUS errors with WAL mode and multiple simultaneous updating clients

Richard Hipp-3
On 12/2/19, avinash.jha2493 <[hidden email]> wrote:
> Was this resolved?

What was resolved?

There are no known issues in SQLite's WAL mode.  In fact, there are no
known segfault issues with SQLite.  Perhaps there is a problem with
tensorflow, but we don't have anything to do with that - you will need
to discuss this with the TF people.
--
D. Richard Hipp
[hidden email]