analysis of a corrupt db

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

analysis of a corrupt db

Adam DeVita
Good day,

I've got a case of a corrupt file on some hardware of our own design,
a linux based IO controller in a harsh environment.

It was lately discovered that 1 controller in a field test group had a
corrupt db on it, so naturally we are attempting to figure out what
happened.

The hardware has the db on flash memory.

DB Description:
For the sake of documenting it, Db Size is about 370KB
It is used as a status scoreboard for various system configuration information.

1 table only:
CREATE TABLE config( id text PRIMARY KEY,file text,xpath text, value
text, venc_switch_xpath text, apply_cmd text, cacheIsDirty integer
default -1 );

Under normal operation all access is controlled by 1 program that
serializes requests from the rest of the system, and executes batches
of statements in a transaction. Under normal operation only SELECT and
UPDATE queries are run.

The db doesn't grow in number of records. There are (always at this
firmware version) 1455 rows in a good db.

Under upgrade, the above db management program is shut down and the
upgrade script runs commands through a shell tool. Under upgrade we do
use INSERT OR REPLACE as well as update. Upgrades are normally
executed by creating a new db with default values (and inserting the
list of known ids)  and then attaching the new db to the old one and
replacing records into the new db that have non-default values.

Shell Tool Observations:
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ”.help” for usage hints.
sqlite> pragma integrity_check;
Error: database disk image is malformed
sqlite> .tables
sqlite> .schema
Error: database disk image is malformed
sqlite> select * from sqlite_master;
Error: database disk image is malformed
sqlite>.dump
...eventually
INSERT INTO "config" VALUES('gforce_orientation','good
_data','more_good_data','',NULL,NULL,0);
INSERT INTO "config"
VALUES('audio_input_gain','mygood_path1','alsogood_data','',NULL,'good
text data',0);
/**** ERROR: (11) database disk image is malformed ***/
/** ERROR: (11) database disk image is malformed *****/
COMMIT;

Other hacks at it:
Inspecting the file and from the above with a comparison to a known
good file the headers appear ok. The table exists but our code returns
this db is corrupt.
Using a hex editor to manually inspect the file with a comparison to a
known good one shows that there is no data that isn't "db-ish" : This
is not a case of rogue data being written to the file, as far as I can
see.

Is there another utility I can use to help point at the problem?
How is .dump working to print out almost everything when .tables
returns the db is corrupt?

I'd like to attempt to figure out what the last bit of data written in was.

If I .dump into a text file, then open a new db and .read into it, I
get 1454 records (1 fewer than the 'good db')
Comparing to the good file, I know that  audio_output_gain is the
record that is not printed by the .dump.
Does it follow that it must be the corrupt record?
How would that prevent .table or .schema from getting read?

From the values of the cacheIsDirty flag, I deduce that it was in the
process of an upgrade, not normal user interaction, when the
corruption occurred. (This does not conclusively point to if the error
happened during the upgrade, or immediately after it as the normal
mode works through the records with 'dirty' cache.  That said, all
~200 records of 'dirty' cache should be updated in 1 transaction, so 1
record being wrong seems to not fit. )


regards,
Adam DeVita


BTW: While testing this, I noticed that if I ftp the file to the
device from win 7 command prompt ftp to the linux box without setting
to bin (leaving in ascii mode), that will corrupt the db. That is a
simple move to corrupt that isn't listed on
(https://www.sqlite.org/howtocorrupt.html ).

--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: analysis of a corrupt db

Simon Slavin-3

On 12 Jan 2016, at 2:52pm, Adam Devita <[hidden email]> wrote:

> Under normal operation all access is controlled by 1 program that
> serializes requests from the rest of the system, and executes batches
> of statements in a transaction. Under normal operation only SELECT and
> UPDATE queries are run.

Does your program examine the codes returned by SQLite3 calls and check to see that they are all returning SQLITE_OK ?

> The db doesn't grow in number of records. There are (always at this
> firmware version) 1455 rows in a good db.
>
> Under upgrade, the above db management program is shut down and the
> upgrade script runs commands through a shell tool.

Does the script shut down the program and wait for the program to quit before it starts running its own commands, or are the two things done independently ?

Are both programs running on the computer with the database stored on a hard disk, or is anything accessing the database across a network ?

> Under upgrade we do
> use INSERT OR REPLACE as well as update. Upgrades are normally
> executed by creating a new db with default values (and inserting the
> list of known ids)  and then attaching the new db to the old one and
> replacing records into the new db that have non-default values.

Does /this/ program examine the values returned by SQLite calls to see that they're all SQLITE_OK ?

Just as a closing comment, I note that almost all the cases of genuine database corruption I've seen are caused by faulty hardware.  You've obviously read the "How to corrupt" document and that's what's left: hardware.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: analysis of a corrupt db

Richard Hipp-3
In reply to this post by Adam DeVita
On 1/12/16, Adam Devita <[hidden email]> wrote:
>
> Shell Tool Observations:
> SQLite version 3.8.4.3 2014-04-03 16:53:12
> Enter ”.help” for usage hints.

If you first do:  ".log stdout" before doing the "PRAGMA
integrity_check", you might get some better diagnostics.  Or maybe
not.  In any event, it doesn't hurt to try.

> sqlite> pragma integrity_check;
> Error: database disk image is malformed
>

Other things to try:

    ./configure; make showdb;
    ./showdb your-corrupt-db-file.db dbheader
    ./showdb your-corrupt-db-file.db pgidx

There is a lot of other things you can do with the showdb program.
Type "./showdb" with no argument for a very terse summary.  I, for
one, would be very interested in seeing the output of the above two
commands.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: analysis of a corrupt db

Adam DeVita
Good day,
Thank you for some avenues of investigation.

Q: Does your program examine the codes returned by SQLite3 calls and
check to see that they are all returning SQLITE_OK ?

A1: The upgrade process is done by a script. It isn't error checking &
executes queries via the shell tool.  I followed up with the script
folks and they got  a pull of the file system log from the failed
unit.  There are 2 queries that run to update new.db from data in
old.db, and appear to execute and update multiple rows each.

A2: From inspecting the normal op program (at the version of the
firmware tag)  It looks fairly good.
The only return code not checked is a on  commit/rollback that is
after a failed prepare statement or one particular function that takes
a checked input, performs bind, step, reset at one point.
There may be a vulnerability at one point to sqlite bind text that has
a null input for the string. The docs say such an oversight would
return SQLITE_MISUSE so the step would not happen.

Q: Does the script shut down the program and wait for the program to
quit before it starts running its own commands, or are the two things
done independently ?

A: The script does not check for a response but it waits. It would be
very unusual for their to be pending writes to the old db (or user
requests) while new db is extracting the information. I suppose WAL
mode would be safer here.

The upgrade Script doesn't check values of the returned by the shell
tool. That said. Since the alg is :update new.db from old.db, replace
old.db file with new.db. New.db's default value for cache is "reload
value from config files", the script failure would leave new.db in a
state that, as long as not corrupt, would simply reload from the
config files.


Q: Are both programs running on the computer with the database stored
on a hard disk, or is anything accessing the database across a network
?
A: All operations happen on locally stored flash or memory only. (The
new.db in an update is uncompressed to ram, once updates are complete,
then it gets copied to the local flash.)
The only network ops are "upload upgrade package to remote device,
then tell it to use it". There are no sqlite operations over a
network.


--As per DRH's instruction----------
f:\Users\Adam>sqlite3.exe system.bad
SQLite version 3.10.0 2016-01-06 11:01:07
Enter ".help" for usage hints.
sqlite> .log stdout
sqlite> pragma integrity_check;
(11) database corruption at line 58034 of [fd0a50f079]
(11) database disk image is malformed
Error: database disk image is malformed
sqlite>
---------------------

showdb: I assume this is a linux tool?  Where would I pull that from?
Our device doesn't have all the utilities but I can put a copy of the
bad db on a development linux environment for further tests.

On Tue, Jan 12, 2016 at 10:55 AM, Richard Hipp <[hidden email]> wrote:

> On 1/12/16, Adam Devita <[hidden email]> wrote:
>>
>> Shell Tool Observations:
>> SQLite version 3.8.4.3 2014-04-03 16:53:12
>> Enter ”.help” for usage hints.
>
> If you first do:  ".log stdout" before doing the "PRAGMA
> integrity_check", you might get some better diagnostics.  Or maybe
> not.  In any event, it doesn't hurt to try.
>
>> sqlite> pragma integrity_check;
>> Error: database disk image is malformed
>>
>
> Other things to try:
>
>     ./configure; make showdb;
>     ./showdb your-corrupt-db-file.db dbheader
>     ./showdb your-corrupt-db-file.db pgidx
>
> There is a lot of other things you can do with the showdb program.
> Type "./showdb" with no argument for a very terse summary.  I, for
> one, would be very interested in seeing the output of the above two
> commands.
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



--
--------------
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: analysis of a corrupt db

Adam DeVita
Some more information:

A co-worker managed to get an copy of the db by as interpreted by
jffs2dump of the file system, that was extracted by the jffs2dump
python script (from git hub). It is interesting that it is also
corrupt but in a different way.

sqlite> select  * from config where id ='isp_de_mode';
(11) database corruption at line 70244 of [fd0a50f079]
(11) statement aborts at 9: [select  * from config where id
='isp_de_mode';] database disk image is malformed
Error: database disk image is malformed
sqlite> select  id from config where id ='isp_de_mode';
isp_de_mode
sqlite> select count(id) from config where id ='isp_de_mode';
1
sqlite> select count(1), id from config group by id order by 1 desc limit 2;
2|isp_de_mode
1|audio_bitrate_s0
sqlite> .schema
CREATE TABLE config( id varchar(255) PRIMARY KEY,file
varchar(255),xpath varchar (255), value varchar(255),
venc_switch_xpath varchar(255), apply_cmd varchar(255), cacheIsDirty
integer default -1  );
sqlite>
sqlite> pragma integrity_check;
row 1275 missing from index sqlite_autoindex_config_1
row 1276 missing from index sqlite_autoindex_config_1
row 1277 missing from index sqlite_autoindex_config_1
row 1346 missing from index sqlite_autoindex_config_1
row 1347 missing from index sqlite_autoindex_config_1
row 1348 missing from index sqlite_autoindex_config_1
row 1349 missing from index sqlite_autoindex_config_1
row 1350 missing from index sqlite_autoindex_config_1
row 1351 missing from index sqlite_autoindex_config_1
row 1352 missing from index sqlite_autoindex_config_1
row 1353 missing from index sqlite_autoindex_config_1
row 1354 missing from index sqlite_autoindex_config_1
row 1367 missing from index sqlite_autoindex_config_1
row 1372 missing from index sqlite_autoindex_config_1
wrong # of entries in index sqlite_autoindex_config_1

regards,
Adam



On Tue, Jan 12, 2016 at 12:01 PM, Adam Devita <[hidden email]> wrote:

> Good day,
> Thank you for some avenues of investigation.
>
> Q: Does your program examine the codes returned by SQLite3 calls and
> check to see that they are all returning SQLITE_OK ?
>
> A1: The upgrade process is done by a script. It isn't error checking &
> executes queries via the shell tool.  I followed up with the script
> folks and they got  a pull of the file system log from the failed
> unit.  There are 2 queries that run to update new.db from data in
> old.db, and appear to execute and update multiple rows each.
>
> A2: From inspecting the normal op program (at the version of the
> firmware tag)  It looks fairly good.
> The only return code not checked is a on  commit/rollback that is
> after a failed prepare statement or one particular function that takes
> a checked input, performs bind, step, reset at one point.
> There may be a vulnerability at one point to sqlite bind text that has
> a null input for the string. The docs say such an oversight would
> return SQLITE_MISUSE so the step would not happen.
>
> Q: Does the script shut down the program and wait for the program to
> quit before it starts running its own commands, or are the two things
> done independently ?
>
> A: The script does not check for a response but it waits. It would be
> very unusual for their to be pending writes to the old db (or user
> requests) while new db is extracting the information. I suppose WAL
> mode would be safer here.
>
> The upgrade Script doesn't check values of the returned by the shell
> tool. That said. Since the alg is :update new.db from old.db, replace
> old.db file with new.db. New.db's default value for cache is "reload
> value from config files", the script failure would leave new.db in a
> state that, as long as not corrupt, would simply reload from the
> config files.
>
>
> Q: Are both programs running on the computer with the database stored
> on a hard disk, or is anything accessing the database across a network
> ?
> A: All operations happen on locally stored flash or memory only. (The
> new.db in an update is uncompressed to ram, once updates are complete,
> then it gets copied to the local flash.)
> The only network ops are "upload upgrade package to remote device,
> then tell it to use it". There are no sqlite operations over a
> network.
>
>
> --As per DRH's instruction----------
> f:\Users\Adam>sqlite3.exe system.bad
> SQLite version 3.10.0 2016-01-06 11:01:07
> Enter ".help" for usage hints.
> sqlite> .log stdout
> sqlite> pragma integrity_check;
> (11) database corruption at line 58034 of [fd0a50f079]
> (11) database disk image is malformed
> Error: database disk image is malformed
> sqlite>
> ---------------------
>
> showdb: I assume this is a linux tool?  Where would I pull that from?
> Our device doesn't have all the utilities but I can put a copy of the
> bad db on a development linux environment for further tests.
>
> On Tue, Jan 12, 2016 at 10:55 AM, Richard Hipp <[hidden email]> wrote:
>> On 1/12/16, Adam Devita <[hidden email]> wrote:
>>>
>>> Shell Tool Observations:
>>> SQLite version 3.8.4.3 2014-04-03 16:53:12
>>> Enter ”.help” for usage hints.
>>
>> If you first do:  ".log stdout" before doing the "PRAGMA
>> integrity_check", you might get some better diagnostics.  Or maybe
>> not.  In any event, it doesn't hurt to try.
>>
>>> sqlite> pragma integrity_check;
>>> Error: database disk image is malformed
>>>
>>
>> Other things to try:
>>
>>     ./configure; make showdb;
>>     ./showdb your-corrupt-db-file.db dbheader
>>     ./showdb your-corrupt-db-file.db pgidx
>>
>> There is a lot of other things you can do with the showdb program.
>> Type "./showdb" with no argument for a very terse summary.  I, for
>> one, would be very interested in seeing the output of the above two
>> commands.
>>
>> --
>> D. Richard Hipp
>> [hidden email]
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> --
> --------------
> VerifEye Technologies Inc.
> 151 Whitehall Dr. Unit 2
> Markham, ON
> L3R 9T1



--
--------------
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: analysis of a corrupt db

David Woodhouse
On Tue, 2016-01-12 at 12:18 -0500, Adam Devita wrote:
>
> A co-worker managed to get an copy of the db by as interpreted by
> jffs2dump of the file system, that was extracted by the jffs2dump
> python script (from git hub). It is interesting that it is also
> corrupt but in a different way.

Forgetting sqlite, can you compare the binary files?

JFFS2 creates each file from the log entries, each of which carry a
sequence number, and cover a given range of the file (not more than a
4KiB page).

There should never be any *holes* in the file, which are not covered by
any data node. Were there in your dump? That would imply that a data
node was lost (its CRC failed, perhaps, and wasn't caught in time to be
written out elsewhere).

--
David Woodhouse                            Open Source Technology Centre
[hidden email]                              Intel Corporation


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: analysis of a corrupt db

Adam DeVita
Thanks.
My co-worker that got the jffs2 dump of the  file system reports that
there is a bug in the python script, so my above post is a false
trail.

He analysed the raw JFFS2 data and found it is consistent in that all
the node header and data checksums are correct. There is no corruption
in system.db from the JFFS2 point of view. I can tell that the
corruption occurred during the f/w update on Dec 3 because all the
timestamps for the system.db nodes are during that time period. From
the f/w update log file, sysmgr was stopped at the start of the update
procedure along with everything else. I tried replicating what would
have happened by downgrading my unit to the same version of f/w that
was running at the time of the update, writing the raw JFFS2 to my
unit, and then applying the Dec 3 update. The update completely
successfully and system.db was ok.

There weren't any holes in the file, he wrote a program to analyze
this specifically.

He successfully simulated the above update three times but each time
the binary system.db was different (not identical). I'll get a copy
later to do a binary comparison. It may be that there are other system
events. I've asked that a test be run under sub-optimal power
conditions.  It seems unlikely this is an sqlite issue. It may be a
file system / hardware / harsh environment thing.


Adam

On Wed, Jan 13, 2016 at 7:00 PM, David Woodhouse <[hidden email]> wrote:

> On Tue, 2016-01-12 at 12:18 -0500, Adam Devita wrote:
>>
>> A co-worker managed to get an copy of the db by as interpreted by
>> jffs2dump of the file system, that was extracted by the jffs2dump
>> python script (from git hub). It is interesting that it is also
>> corrupt but in a different way.
>
> Forgetting sqlite, can you compare the binary files?
>
> JFFS2 creates each file from the log entries, each of which carry a
> sequence number, and cover a given range of the file (not more than a
> 4KiB page).
>
> There should never be any *holes* in the file, which are not covered by
> any data node. Were there in your dump? That would imply that a data
> node was lost (its CRC failed, perhaps, and wasn't caught in time to be
> written out elsewhere).
>
> --
> David Woodhouse                            Open Source Technology Centre
> [hidden email]                              Intel Corporation
>



--
--------------
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users