race condition?


Lei Chen
Hi experts,

I'm debugging a tricky issue related to SQLite (3.9.2) database access on a
Linux 3.2 kernel. When the failure occurs, two processes are accessing the
same -journal file concurrently (see the log below). When the scsitgtd
daemon tries to "commit" a transaction, it finds that the -journal file has
already been deleted by another process (in fact the procmon daemon, which
needs to access the same database to retrieve some info).

The issue happens intermittently and appears to be timing-related.

Having studied the SQLite code and documentation, I think the database file
should already be *locked* when the -journal is created, upon "commit". In
theory, no other process could have obtained the lock and rolled back the
hot journal. However, we did see procmon slip in. Does anybody know whether
this is a known issue in this old SQLite version? Or how can I continue
debugging the lock contention?

*>>> 1. "journal" file is detected
Sep 18 03:34:23 procmon: INFO: Hot journal detected: /registry/m0/scsitgtd.db3-journal
Sep 18 03:34:23 procmon: INFO: SQLITE: rc=539, recovered 9 pages from /registry/m0/scsitgtd.db3-journal
>>> 2. commit failed because "journal" file is missing
Sep 18 03:34:23 scsitgtd[26949]: ERROR: Registry /registry/m0/scsitgtd.db3 exec("commit") error 5898: disk I/O error (retries=0)
Sep 18 03:34:23 scsitgtd[26949]: INFO: SQLITE: rc=1, statement aborts at 2: [rollback] cannot rollback - no transaction is active
Sep 18 03:34:23 scsitgtd[26949]: ERROR: Registry /registry/m0/scsitgtd.db3 exec("rollback") error 1: cannot rollback - no transaction is active (retries=0)
Sep 18 03:34:23 scsitgtd[26949]: ERROR:  Error 5551 committing transaction: SQLite error 5898 on registry /registry/m0/scsitgtd.db3 during exec("commit")*
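
The numeric codes in the log can be read as SQLite extended result codes, where the low 8 bits hold the primary code and the higher bits hold the detail. The sketch below decodes only the values that appear above, using the constant names from sqlite3.h. It assumes the applications log SQLite's raw codes unchanged; the `decode` helper is illustrative, and error 5551 looks application-specific, so it is not decoded.

```python
# Primary result codes (low 8 bits) that appear in the log.
PRIMARY = {1: "SQLITE_ERROR", 10: "SQLITE_IOERR", 27: "SQLITE_NOTICE"}

# Extended codes are defined as (primary | (n << 8)) in sqlite3.h.
# Only the two combinations seen in the log are listed here.
EXTENDED = {
    (27, 2): "SQLITE_NOTICE_RECOVER_ROLLBACK",
    (10, 23): "SQLITE_IOERR_DELETE_NOENT",
}


def decode(rc):
    """Split a result code into (primary, detail, name)."""
    primary = rc & 0xFF
    detail = rc >> 8
    name = EXTENDED.get((primary, detail)) or PRIMARY.get(primary, "unknown")
    return primary, detail, name


if __name__ == "__main__":
    for rc in (539, 5898, 1):
        print(rc, decode(rc))
```

Read this way, rc=539 is the notice logged when a hot journal is rolled back, and 5898 is an I/O error raised because the file to be deleted did not exist, which matches the description of the -journal file disappearing before scsitgtd finished its commit.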

Please copy me when you kindly reply.

Thanks,
Lei Chen
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: race condition?

Dan Kennedy-4
On 10/29/2018 03:45 PM, Lei Chen wrote:

> Hi experts,
>
> I'm debugging a tricky issue related to SQLite (3.9.2) database access on a
> Linux 3.2 kernel. When the failure occurs, two processes are accessing the
> same -journal file concurrently (see the log below). When the scsitgtd
> daemon tries to "commit" a transaction, it finds that the -journal file has
> already been deleted by another process (in fact the procmon daemon, which
> needs to access the same database to retrieve some info).
>
> The issue happens intermittently and appears to be timing-related.
>
> Having studied the SQLite code and documentation, I think the database file
> should already be *locked* when the -journal is created, upon "commit". In
> theory, no other process could have obtained the lock and rolled back the
> hot journal. However, we did see procmon slip in. Does anybody know whether
> this is a known issue in this old SQLite version? Or how can I continue
> debugging the lock contention?

Not a known issue.

There are some common problems regarding locking enumerated here:

   https://www.sqlite.org/howtocorrupt.html#_file_locking_problems

In practice, the ones in sections 2.2 and 2.2.1 seem to come up most often.

Dan.
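
Section 2.2 of the linked page, as I read it, covers POSIX advisory locks being cancelled when any file descriptor for the same file is closed elsewhere in the process. A minimal sketch of that behaviour follows, assuming Linux and Python's `fcntl` module. The file name `demo.db` and the helpers `child_can_lock` and `demo` are hypothetical and have nothing to do with the scsitgtd or procmon code.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Fork a child that tries a non-blocking exclusive POSIX lock on path.

    Returns True if the child obtained the lock, i.e. nobody else holds it.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0  # lock acquired
            except OSError:
                status = 1  # lock is held by another process
        finally:
            os._exit(status)
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus) == 0


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        with open(path, "wb") as f:
            f.write(b"x")

        # Take an exclusive advisory lock through the first descriptor.
        fd1 = os.open(path, os.O_RDWR)
        fcntl.lockf(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)
        before = child_can_lock(path)

        # Open and close a second descriptor for the same file. With POSIX
        # locks this drops every lock the process holds on that file,
        # including the one taken through fd1.
        fd2 = os.open(path, os.O_RDWR)
        os.close(fd2)
        after = child_can_lock(path)

        os.close(fd1)
        return before, after


if __name__ == "__main__":
    print(demo())
```

If the same pattern occurs inside a process that holds a SQLite lock (for example, a helper thread or library that opens and closes the database file directly), another process could acquire the lock and roll back the journal mid-transaction. Whether that is what happens here is not established by the thread.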




Re: race condition?

Lei Chen
Thanks, Dan. I just found another report about the hot journal,
"Hot-Journal with VFS":
https://www.mail-archive.com/sqlite-users@.../msg112377.html
It seems to be the same issue I hit.

Thanks,
Lei Chen
