Header corruption

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Header corruption

deon
I’m trying to track down SQLITE corruptions that seems to corrupt our databases in a consistent way. (Running on 3.20.1).

This isn’t related to a crash or hardware failure. The app is running and reading and writing to the database fine, and then suddenly we start getting a SQLITE_NOTADB errors. The first error we got was on trying to execute a ‘BEGIN TRANSACTION’, and everything thereafter started failing.

When our database headers are fine, I see this:
00000000h: 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 ; SQLite format 3.
00000010h: 10 00 02 02 00 40 20 20 00 00 10 9B 00 01 34 42 ; .....@  ...›..4B

On corruption #1, I see this:
00000000h: 15 03 01 00 20 01 91 21 85 44 17 2C ED BE 21 FA ; .... .‘!…D.,í¾!ú
00000010h: 9E 74 E7 EA 9A 0A 15 2C 99 B0 B7 3B C6 F0 35 FB ; žtçêš..,™°·;Æð5û

On corruption #2, I see this:
00000000h: 15 03 01 00 20 0F 1E CB B1 FF 9C 1E D0 D6 BB 22 ; .... ..˱ÿœ.ÐÖ»"
00000010h: 15 64 D3 F4 DD 38 FB DF A3 E0 47 B8 D7 F6 21 BC ; .dÓôÝ8ûߣàG¸×ö!¼


This bit pattern of “15 03 01 00” for the first 4 characters where it should be “SQLi” is a bit too much of a coincidence to ignore or write off as disk error.

The rest of the header and file after the first 25-ish bytes or so, seem fine. It’s just these first 25  bytes that gets corrupted. If I restore Offsets 0 to Offsets 27 out of a “working” database, I get a usable file.

Under what circumstances would SQLITE even attempt to re-write the header string?

- Deon

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

David Raymond
Things stored in the first 25 bytes include page size, WAL status, and the file change counter. So at least part of the header there gets changed with every committed write transaction.

http://www.sqlite.org/fileformat2.html


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Deon Brewis
Sent: Tuesday, February 06, 2018 11:57 AM
To: SQLite mailing list
Subject: [sqlite] Header corruption

I’m trying to track down SQLITE corruptions that seems to corrupt our databases in a consistent way. (Running on 3.20.1).

This isn’t related to a crash or hardware failure. The app is running and reading and writing to the database fine, and then suddenly we start getting a SQLITE_NOTADB errors. The first error we got was on trying to execute a ‘BEGIN TRANSACTION’, and everything thereafter started failing.

When our database headers are fine, I see this:
00000000h: 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 ; SQLite format 3.
00000010h: 10 00 02 02 00 40 20 20 00 00 10 9B 00 01 34 42 ; .....@  ...›..4B

On corruption #1, I see this:
00000000h: 15 03 01 00 20 01 91 21 85 44 17 2C ED BE 21 FA ; .... .‘!…D.,í¾!ú
00000010h: 9E 74 E7 EA 9A 0A 15 2C 99 B0 B7 3B C6 F0 35 FB ; žtçêš..,™°·;Æð5û

On corruption #2, I see this:
00000000h: 15 03 01 00 20 0F 1E CB B1 FF 9C 1E D0 D6 BB 22 ; .... ..˱ÿœ.ÐÖ»"
00000010h: 15 64 D3 F4 DD 38 FB DF A3 E0 47 B8 D7 F6 21 BC ; .dÓôÝ8ûߣàG¸×ö!¼


This bit pattern of “15 03 01 00” for the first 4 characters where it should be “SQLi” is a bit too much of a coincidence to ignore or write off as disk error.

The rest of the header and file after the first 25-ish bytes or so, seem fine. It’s just these first 25  bytes that gets corrupted. If I restore Offsets 0 to Offsets 27 out of a “working” database, I get a usable file.

Under what circumstances would SQLITE even attempt to re-write the header string?

- Deon

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Dan Kennedy-4
In reply to this post by deon
On 02/06/2018 11:57 PM, Deon Brewis wrote:

> I’m trying to track down SQLITE corruptions that seems to corrupt our databases in a consistent way. (Running on 3.20.1).
>
> This isn’t related to a crash or hardware failure. The app is running and reading and writing to the database fine, and then suddenly we start getting a SQLITE_NOTADB errors. The first error we got was on trying to execute a ‘BEGIN TRANSACTION’, and everything thereafter started failing.
>
> When our database headers are fine, I see this:
> 00000000h: 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 ; SQLite format 3.
> 00000010h: 10 00 02 02 00 40 20 20 00 00 10 9B 00 01 34 42 ; .....@  ...›..4B
>
> On corruption #1, I see this:
> 00000000h: 15 03 01 00 20 01 91 21 85 44 17 2C ED BE 21 FA ; .... .‘!…D.,í¾!ú
> 00000010h: 9E 74 E7 EA 9A 0A 15 2C 99 B0 B7 3B C6 F0 35 FB ; žtçêš..,™°·;Æð5û
>
> On corruption #2, I see this:
> 00000000h: 15 03 01 00 20 0F 1E CB B1 FF 9C 1E D0 D6 BB 22 ; .... ..˱ÿœ.ÐÖ»"
> 00000010h: 15 64 D3 F4 DD 38 FB DF A3 E0 47 B8 D7 F6 21 BC ; .dÓôÝ8ûߣàG¸×ö!¼
>
>
> This bit pattern of “15 03 01 00” for the first 4 characters where it should be “SQLi” is a bit too much of a coincidence to ignore or write off as disk error.
>
> The rest of the header and file after the first 25-ish bytes or so, seem fine. It’s just these first 25  bytes that gets corrupted. If I restore Offsets 0 to Offsets 27 out of a “working” database, I get a usable file.
>
> Under what circumstances would SQLITE even attempt to re-write the header string?

Is this on UNIX?

Quite often this sort of thing occurs because some other module is
writing to a file-descriptor owned by SQLite. Usually because it closed
its own fd, then SQLite opened the db file and was assigned the same
integer fd value, then the rogue module wrote to the fd anyway. In other
words, some other module is doing:

   close(fd);
   write(fd, "1503010020...", 25);

and between those two calls SQLite is calling open() and is being
assigned a file-descriptor with the same integer value as fd.

Dan.





>
> - Deon
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Deon Brewis
That's an interesting thought, but we see this on Windows as well, and on Windows these files are opened with SQLITE_OPEN_EXCLUSIVE.

So this looks more like something is overwriting the memory of Page1 before SQLITE writes it back to disk.

I don't think SQLITE verifies the memory during every write. There's a:
if( memcmp(page1, zMagicHeader, 16)!=0 ){

during lockBtree() but that seems to be on read, not write.


Is there somewhere I can add a check of the header buffer before the write to disk?

Or does SQLITE literally never write this part of the header again once written the first time? (Seek past the zMagicHeader?).

- Deon

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Dan Kennedy
Sent: Tuesday, February 6, 2018 9:08 AM
To: [hidden email]
Subject: Re: [sqlite] Header corruption

On 02/06/2018 11:57 PM, Deon Brewis wrote:

> I’m trying to track down SQLITE corruptions that seems to corrupt our databases in a consistent way. (Running on 3.20.1).
>
> This isn’t related to a crash or hardware failure. The app is running and reading and writing to the database fine, and then suddenly we start getting a SQLITE_NOTADB errors. The first error we got was on trying to execute a ‘BEGIN TRANSACTION’, and everything thereafter started failing.
>
> When our database headers are fine, I see this:
> 00000000h: 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 ; SQLite format 3.
> 00000010h: 10 00 02 02 00 40 20 20 00 00 10 9B 00 01 34 42 ; .....@  
> ...›..4B
>
> On corruption #1, I see this:
> 00000000h: 15 03 01 00 20 01 91 21 85 44 17 2C ED BE 21 FA ; ....
> .‘!…D.,í¾!ú
> 00000010h: 9E 74 E7 EA 9A 0A 15 2C 99 B0 B7 3B C6 F0 35 FB ;
> žtçêš..,™°·;Æð5û
>
> On corruption #2, I see this:
> 00000000h: 15 03 01 00 20 0F 1E CB B1 FF 9C 1E D0 D6 BB 22 ; .... ..˱ÿœ.ÐÖ»"
> 00000010h: 15 64 D3 F4 DD 38 FB DF A3 E0 47 B8 D7 F6 21 BC ;
> .dÓôÝ8ûߣàG¸×ö!¼
>
>
> This bit pattern of “15 03 01 00” for the first 4 characters where it should be “SQLi” is a bit too much of a coincidence to ignore or write off as disk error.
>
> The rest of the header and file after the first 25-ish bytes or so, seem fine. It’s just these first 25  bytes that gets corrupted. If I restore Offsets 0 to Offsets 27 out of a “working” database, I get a usable file.
>
> Under what circumstances would SQLITE even attempt to re-write the header string?

Is this on UNIX?

Quite often this sort of thing occurs because some other module is writing to a file-descriptor owned by SQLite. Usually because it closed its own fd, then SQLite opened the db file and was assigned the same integer fd value, then the rogue module wrote to the fd anyway. In other words, some other module is doing:

   close(fd);
   write(fd, "1503010020...", 25);

and between those two calls SQLite is calling open() and is being assigned a file-descriptor with the same integer value as fd.

Dan.





>
> - Deon
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmaili
> nglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=
> 02%7C01%7C%7C12190af418bc4a32192b08d56d8432f3%7C84df9e7fe9f640afb435aa
> aaaaaaaaaa%7C1%7C0%7C636535336909688339&sdata=OMCv%2BtfSJ4PvlFTFw8Cqcc
> 1eizCrnIgFnNSJ7x5XUoo%3D&reserved=0


_______________________________________________
sqlite-users mailing list
[hidden email]
https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7C%7C12190af418bc4a32192b08d56d8432f3%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636535336909688339&sdata=OMCv%2BtfSJ4PvlFTFw8Cqcc1eizCrnIgFnNSJ7x5XUoo%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Richard Hipp-3
On 2/7/18, Deon Brewis <[hidden email]> wrote:
>
> I don't think SQLITE verifies the memory during every write. There's a:
> if( memcmp(page1, zMagicHeader, 16)!=0 ){
>
> during lockBtree() but that seems to be on read, not write.
>
>
> Is there somewhere I can add a check of the header buffer before the write
> to disk?

There does not appear to be a single place to put this code as the
header can be written from several places.

Search for instances of "sqlite3OsWrite(pPager->fd, ...)".  You want
to be sure that the last field is "fd" and not "jfd" or "sjfd".  I
think there are four separate places you will need to insert your
patch.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Dan Kennedy-4
In reply to this post by Deon Brewis
On 02/07/2018 10:16 PM, Deon Brewis wrote:

> That's an interesting thought, but we see this on Windows as well, and on Windows these files are opened with SQLITE_OPEN_EXCLUSIVE.
>
> So this looks more like something is overwriting the memory of Page1 before SQLITE writes it back to disk.
>
> I don't think SQLITE verifies the memory during every write. There's a:
> if( memcmp(page1, zMagicHeader, 16)!=0 ){
>
> during lockBtree() but that seems to be on read, not write.
>
>
> Is there somewhere I can add a check of the header buffer before the write to disk?
>
> Or does SQLITE literally never write this part of the header again once written the first time? (Seek past the zMagicHeader?).

If it does rewrite it, it writes all of the first database page at the
same time.

You're using wal mode, so whenever page 1 (the page containing the db
header) is written, it is appended to the wal file. Then, later on, a
checkpoint copies it from the wal file into the db file.

Writing the page to the wal file happens here:

   http://www.sqlite.org/src/artifact/5a3f464edd645?ln=3404

You could add (say):

   assert( p->pgno!=1 || 0==memcmp(pData, "SQLite", 6) );

Then copying from the wal file into the db file is here:

   http://www.sqlite.org/src/artifact/5a3f464edd645?ln=1861

Add (again, say):

   assert( iDbpage!=1 || 0==memcmp(zBuf, "SQLite", 6) );

Dan.




>
> - Deon
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]] On Behalf Of Dan Kennedy
> Sent: Tuesday, February 6, 2018 9:08 AM
> To: [hidden email]
> Subject: Re: [sqlite] Header corruption
>
> On 02/06/2018 11:57 PM, Deon Brewis wrote:
>> I’m trying to track down SQLITE corruptions that seems to corrupt our databases in a consistent way. (Running on 3.20.1).
>>
>> This isn’t related to a crash or hardware failure. The app is running and reading and writing to the database fine, and then suddenly we start getting a SQLITE_NOTADB errors. The first error we got was on trying to execute a ‘BEGIN TRANSACTION’, and everything thereafter started failing.
>>
>> When our database headers are fine, I see this:
>> 00000000h: 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 ; SQLite format 3.
>> 00000010h: 10 00 02 02 00 40 20 20 00 00 10 9B 00 01 34 42 ; .....@
>> ...›..4B
>>
>> On corruption #1, I see this:
>> 00000000h: 15 03 01 00 20 01 91 21 85 44 17 2C ED BE 21 FA ; ....
>> .‘!…D.,í¾!ú
>> 00000010h: 9E 74 E7 EA 9A 0A 15 2C 99 B0 B7 3B C6 F0 35 FB ;
>> žtçêš..,™°·;Æð5û
>>
>> On corruption #2, I see this:
>> 00000000h: 15 03 01 00 20 0F 1E CB B1 FF 9C 1E D0 D6 BB 22 ; .... ..˱ÿœ.ÐÖ»"
>> 00000010h: 15 64 D3 F4 DD 38 FB DF A3 E0 47 B8 D7 F6 21 BC ;
>> .dÓôÝ8ûߣàG¸×ö!¼
>>
>>
>> This bit pattern of “15 03 01 00” for the first 4 characters where it should be “SQLi” is a bit too much of a coincidence to ignore or write off as disk error.
>>
>> The rest of the header and file after the first 25-ish bytes or so, seem fine. It’s just these first 25  bytes that gets corrupted. If I restore Offsets 0 to Offsets 27 out of a “working” database, I get a usable file.
>>
>> Under what circumstances would SQLITE even attempt to re-write the header string?
> Is this on UNIX?
>
> Quite often this sort of thing occurs because some other module is writing to a file-descriptor owned by SQLite. Usually because it closed its own fd, then SQLite opened the db file and was assigned the same integer fd value, then the rogue module wrote to the fd anyway. In other words, some other module is doing:
>
>     close(fd);
>     write(fd, "1503010020...", 25);
>
> and between those two calls SQLite is calling open() and is being assigned a file-descriptor with the same integer value as fd.
>
> Dan.
>
>
>
>
>
>> - Deon
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmaili
>> nglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=
>> 02%7C01%7C%7C12190af418bc4a32192b08d56d8432f3%7C84df9e7fe9f640afb435aa
>> aaaaaaaaaa%7C1%7C0%7C636535336909688339&sdata=OMCv%2BtfSJ4PvlFTFw8Cqcc
>> 1eizCrnIgFnNSJ7x5XUoo%3D&reserved=0
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7C%7C12190af418bc4a32192b08d56d8432f3%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636535336909688339&sdata=OMCv%2BtfSJ4PvlFTFw8Cqcc1eizCrnIgFnNSJ7x5XUoo%3D&reserved=0
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Simon Slavin-3
In reply to this post by Deon Brewis
On 7 Feb 2018, at 3:16pm, Deon Brewis <[hidden email]> wrote:

> So this looks more like something is overwriting the memory of Page1 before SQLITE writes it back to disk.

That is almost always what people eventually admit to after reporting a problem like this.  Some part of their code is stomping on memory or a file handle which SQLite thought it had exclusive rights to.  Some of them discover it using a runtime profiler tool which looks for use of released memory or double-release of file handles, but I don't know enough about Windows to suggest anything.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Deon Brewis
Oh yeah, I don’t think this is a SQLITE bug or anything.

I think something in our code is writing to memory after freed. I'm just trying to track it down at the point that it happens. We've tried all Profiling tools on both OSX and Windows without luck, so my next step is trying to find the writing thread at the point of corruption.

Dan Kennedy's suggestion seems like that would we that way to do that.

- Deon

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Wednesday, February 7, 2018 8:32 AM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Header corruption

On 7 Feb 2018, at 3:16pm, Deon Brewis <[hidden email]> wrote:

> So this looks more like something is overwriting the memory of Page1 before SQLITE writes it back to disk.

That is almost always what people eventually admit to after reporting a problem like this.  Some part of their code is stomping on memory or a file handle which SQLite thought it had exclusive rights to.  Some of them discover it using a runtime profiler tool which looks for use of released memory or double-release of file handles, but I don't know enough about Windows to suggest anything.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7C%7C883405b8374547bace6508d56e486bb8%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636536179674888635&sdata=ZZtP6DzeVTBCnnWqxu0hWRMVLS6NWTFLrBI0%2Buu6G4I%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Eduardo
On Wed, 7 Feb 2018 17:29:54 +0000
Deon Brewis <[hidden email]> escribió:

> Oh yeah, I don’t think this is a SQLITE bug or anything.
>
> I think something in our code is writing to memory after freed. I'm just trying to track it down at the point that it happens. We've tried all Profiling tools on both OSX and Windows without luck, so my next step is trying to find the writing thread at the point of corruption.
>

Profiling shouldn't help, it isn't the rigth tool. Use Xcode analyzer or from
command line scan-build and scan-view. If you can, use the last version of
clang-analyzer.

In your project directory type:

%mkdir review
%scan-build make -o review/ -V

in directory review is the scan-build html output. Use:
 
%scan-view review/ to see it again


> Dan Kennedy's suggestion seems like that would we that way to do that.
>
> - Deon

--
Eduardo <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Jens Alfke-2


> On Feb 8, 2018, at 12:43 AM, Eduardo <[hidden email]> wrote:
>
> Profiling shouldn't help, it isn't the rigth tool. Use Xcode analyzer or from
> command line scan-build and scan-view. If you can, use the last version of
> clang-analyzer.

The Clang address sanitizer would be the best tool for diagnosing memory corruption. It's a lifesaver.

In Xcode it's very easy to enable: go to the scheme editor, select Run (or Test) from the list on the left, click the Diagnostics tab, and click the "Address Sanitizer" checkbox. I also recommend checking "Detect use of stack after return" and "Malloc Scribble". Then press the Run (or Test) button. I tend to leave this on all the time while developing; it slows down the program, but not enough to get in the way.

It can also be used from the command-line, but I have no knowledge of how to do that. (I do know it requires recompiling with a special compiler flag, since it instruments the machine code.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Deon Brewis
In reply to this post by deon
After months, I managed to track this down. I'd like to extend a BIG thanks to Richard Hip and Dan Kennedy for their help in helping me instrument and understand the SQLITE internal data structures better, as well as giving me a way to programmatically do this, as well as to teach me about the showdb tool. It has been invaluable to help me understand and categorize the corruption which in turn helped me to know what common pattern to look out for.

Turns out, in some shutdown scenarios we would call what ends up being this:

closesocket(_socket); // posix socket
SSL_shutdown(_ssl); // openssl (_ssl was initialized using the _socket above)


If in between those two commands, we opened a new SQLITE connection on another thread, SQLITE may get the same file handle value as what the old socket value used to be.

SSL_shutdown then sends out a sequence resembling the following, to what used to be the socket:
150301002071476f3be1f3fa76f22b9addbe0f520ebbe007fcc1d6536c19ec9d69c5334799

However, since the old socket handle value is now being used as a file handle value, the sequence ends up in the database file instead.

So this was a special case of re-using the File handle as per the corruption guide. One just has to be in the mindset that on unix based platforms, a socket is a file handle. (Not instinctive if you're coming from a Windows background).

Thanks again guys!


I'll leave the following search terms here as well in case someone searches for this in the future:
1503010020
15030100
352518400 (decimal version)
OpenSSL

- Deon

-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of Deon Brewis
Sent: Tuesday, February 6, 2018 8:57 AM
To: SQLite mailing list <[hidden email]>
Subject: [sqlite] Header corruption

I’m trying to track down SQLITE corruptions that seems to corrupt our databases in a consistent way. (Running on 3.20.1).

This isn’t related to a crash or hardware failure. The app is running and reading and writing to the database fine, and then suddenly we start getting a SQLITE_NOTADB errors. The first error we got was on trying to execute a ‘BEGIN TRANSACTION’, and everything thereafter started failing.

When our database headers are fine, I see this:
00000000h: 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 ; SQLite format 3.
00000010h: 10 00 02 02 00 40 20 20 00 00 10 9B 00 01 34 42 ; .....@  ...›..4B

On corruption #1, I see this:
00000000h: 15 03 01 00 20 01 91 21 85 44 17 2C ED BE 21 FA ; .... .‘!…D.,í¾!ú
00000010h: 9E 74 E7 EA 9A 0A 15 2C 99 B0 B7 3B C6 F0 35 FB ; žtçêš..,™°·;Æð5û

On corruption #2, I see this:
00000000h: 15 03 01 00 20 0F 1E CB B1 FF 9C 1E D0 D6 BB 22 ; .... ..˱ÿœ.ÐÖ»"
00000010h: 15 64 D3 F4 DD 38 FB DF A3 E0 47 B8 D7 F6 21 BC ; .dÓôÝ8ûߣàG¸×ö!¼


This bit pattern of “15 03 01 00” for the first 4 characters where it should be “SQLi” is a bit too much of a coincidence to ignore or write off as disk error.

The rest of the header and file after the first 25-ish bytes or so, seem fine. It’s just these first 25  bytes that gets corrupted. If I restore Offsets 0 to Offsets 27 out of a “working” database, I get a usable file.

Under what circumstances would SQLITE even attempt to re-write the header string?

- Deon

_______________________________________________
sqlite-users mailing list
[hidden email]
https://nam03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7C%7Cfdf8fd84436048062e6c08d56d82ae47%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636535330385784749&sdata=qjk7S7H6FW%2FaZIBD2XKPULL4v1wH7p2UN4GaKkx1O7I%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Peter da Silva
In reply to this post by deon
On 4/17/18, 3:08 PM, "sqlite-users on behalf of Deon Brewis" <[hidden email] on behalf of [hidden email]> wrote:
> So this was a special case of re-using the File handle as per the corruption guide. One just has to be in the mindset that on unix based platforms, a socket is a file handle. (Not instinctive if you're coming from a Windows background).

Even on Windows, wouldn't doing an ssl_shutdown on a socket you'd already closed still have a risk of unexpected behavior?
 


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Simon Slavin-3
On 17 Apr 2018, at 9:13pm, Peter Da Silva <[hidden email]> wrote:

> Even on Windows, wouldn't doing an ssl_shutdown on a socket you'd already closed still have a risk of unexpected behavior?

Under Windows an attempting to shutdown a connection which is already shutdown returns SOCKET_ERROR .  You can then make further calls to learn that the exact problem was an attempt to close something that isn't an open socket.

The major problem is analogous to one with SQLite: most software doesn't check the return value because if the return value is not 0 there's nothing simple that can be done about it.  It's simpler, and 99% of the time perfectly okay, to just quit as if no error had occurred.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Warren Young
In reply to this post by Deon Brewis
On Apr 17, 2018, at 2:07 PM, Deon Brewis <[hidden email]> wrote:
>
> One just has to be in the mindset that on unix based platforms, a socket is a file handle. (Not instinctive if you're coming from a Windows background).

You’re either using Winsock 1.1 or are dragging forward obsolete knowledge from the DOS-based versions of Windows:

   https://msdn.microsoft.com/library/windows/desktop/ms740522.aspx

Unless your code accidentally works on Windows because of differing semantics, there was a risk of seeing this bug bite on Windows, too.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Olivier Mascia
In reply to this post by Deon Brewis
> Le 17 avr. 2018 à 22:07, Deon Brewis <[hidden email]> a écrit :
>
> closesocket(_socket); // posix socket
> SSL_shutdown(_ssl); // openssl (_ssl was initialized using the _socket above)

These two statements are inherently wrong, in this order. First you SSL_shutdown(), then you closesocket(). Not the reverse. And in Windows code it even is good citizenship to call shutdown() properly in between.  AFAIK it should also be on Linux, though I'm told it is not commonly seen.  But please check the respective OS'es SDK and OpenSSL documentation for the details.

Called in the right order there is no file descriptor re-use syndrome to fear, because the closesocket() rightfully comes as the very last step.

--
Best regards, Meilleures salutations, Met vriendelijke groeten,  
Olivier Mascia (from mobile device)


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Deon Brewis
In reply to this post by Peter da Silva
Yes, but the socket values differ by the thousands in Windows from handle values returned by CreateFile.

On MAC they don't differ at all - it immediately gets re-used.

- Deon

-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of Peter Da Silva
Sent: Tuesday, April 17, 2018 1:13 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Header corruption

On 4/17/18, 3:08 PM, "sqlite-users on behalf of Deon Brewis" <[hidden email] on behalf of [hidden email]> wrote:
> So this was a special case of re-using the File handle as per the corruption guide. One just has to be in the mindset that on unix based platforms, a socket is a file handle. (Not instinctive if you're coming from a Windows background).

Even on Windows, wouldn't doing an ssl_shutdown on a socket you'd already closed still have a risk of unexpected behavior?
 


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Deon Brewis
In reply to this post by Warren Young
Fair enough. Agreed, just less of a chance (and we haven't seen it) because Windows seems to provide more spacing between these values.

- Deon

-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of Warren Young
Sent: Tuesday, April 17, 2018 1:54 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Header corruption

On Apr 17, 2018, at 2:07 PM, Deon Brewis <[hidden email]> wrote:
>
> One just has to be in the mindset that on unix based platforms, a socket is a file handle. (Not instinctive if you're coming from a Windows background).

You’re either using Winsock 1.1 or are dragging forward obsolete knowledge from the DOS-based versions of Windows:

   https://msdn.microsoft.com/library/windows/desktop/ms740522.aspx

Unless your code accidentally works on Windows because of differing semantics, there was a risk of seeing this bug bite on Windows, too.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Header corruption

Deon Brewis
In reply to this post by Olivier Mascia
Right, that's the bug we found. The statements are in the reverse order.  It's not quite that simple as the code below, but it ends up happening in that order.

It's one of the most fundamental things a senior dev asks themselves on every call that that differentiates them from a junior dev - "What is the lifetime semantics of this thing I'm handing over / getting back?".

You don't even need the documentation for this one after looking at how OpenSSL gets the socket handle in the first place - the bug is obvious, I just wasn't aware this codepath even existed. (Big product, big team - you know how it goes).

- Deon

-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of Olivier Mascia
Sent: Tuesday, April 17, 2018 2:28 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Header corruption

> Le 17 avr. 2018 à 22:07, Deon Brewis <[hidden email]> a écrit :
>
> closesocket(_socket); // posix socket
> SSL_shutdown(_ssl); // openssl (_ssl was initialized using the _socket
> above)

These two statements are inherently wrong, in this order. First you SSL_shutdown(), then you closesocket(). Not the reverse. And in Windows code it even is good citizenship to call shutdown() properly in between.  AFAIK it should also be on Linux, though I'm told it is not commonly seen.  But please check the respective OS'es SDK and OpenSSL documentation for the details.

Called in the right order there is no file descriptor re-use syndrome to fear, because the closesocket() rightfully comes as the very last step.

--
Best regards, Meilleures salutations, Met vriendelijke groeten, Olivier Mascia (from mobile device)


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users