Strange Corruption Issue

poncho524
I'm using SQLite in an embedded application, running on an SSD.

I have set journal_mode=persist so that it is more resilient to loss
of power.

I'm seeing corruption.  I'm using SQLite to log events on the system,
and the corruption is well in the middle of a power session, not at
the tail end of the log where a power loss might occur.

What I'm seeing is just a few pages corrupted with random bits being
flipped.  Looking in a hex editor I can see the corrupted data, and
where I can tell what the values SHOULD be, I see that they're wrong,
but only by a single bit flip, in random bytes here and there.  For
example an "A" is "a", or an "E" is "A".  These are all changes of a
single bit.  There are far more examples, but in pretty much every
case (even when rowids are wrong) it's just off by a bit.

I'm using SQLite 3.7 (I know, old, but this system is old).  Has
anyone else seen random bit flips?  Any idea what could be causing it?
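
The single-bit claim can be checked mechanically by XORing each
expected byte with the byte actually found on disk: a lone flipped bit
leaves exactly one bit set in the result.  A minimal Python sketch,
assuming the expected and observed bytes have already been extracted
into equal-length buffers (the name flipped_bits is hypothetical and
not part of SQLite):

```python
def flipped_bits(expected, observed):
    """Return (byte_offset, bit_index) for every bit that differs."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    diffs = []
    for offset, (e, o) in enumerate(zip(expected, observed)):
        delta = e ^ o  # set bits mark the positions that changed
        for bit in range(8):
            if delta & (1 << bit):
                diffs.append((offset, bit))
    return diffs


# The two examples from the post: each differs in exactly one bit.
print(flipped_bits(b"A", b"a"))  # [(0, 5)]  0x41 vs 0x61
print(flipped_bits(b"E", b"A"))  # [(0, 2)]  0x45 vs 0x41
```

Running this over a whole damaged page against a known-good copy would
show whether the flips cluster on one bit position or are spread out.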
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Strange Corruption Issue

Scott Doctor
SSDs have a limited number of write cycles. You may have a
failing SSD. IMO they are still another 5-10 years away from
solving the write-lifetime reliability issue.

-------------------------
Scott Doctor
[hidden email]
-------------------------


Re: Strange Corruption Issue

Keith Medcalf

The new "consumer" SSDs from Samsung carry a 1200 TBW/8 year warranty on a 4 TB device.  That is a lot of writing for a "consumer desktop" computer ... that is about 400 GB written per DAY every day for 8 years!
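
The per-day figure follows from dividing the warranted write volume by the warranty period. A small Python sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and 365-day years; the constant and function names are illustrative only:

```python
GB_PER_TB = 1000      # assumption: decimal units, as drive vendors use
DAYS_PER_YEAR = 365   # assumption: leap days ignored


def gb_per_day(tbw_tb, years):
    """Average GB written per day needed to use up a TBW rating."""
    return tbw_tb * GB_PER_TB / (years * DAYS_PER_YEAR)


# 1200 TBW over 8 years, the warranty terms quoted in the post.
print(round(gb_per_day(1200, 8)))  # 411
```

That is roughly 411 GB per day, consistent with the "about 400 GB" quoted.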

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



Re: Strange Corruption Issue

Rowan Worth-2
Between updates, automatic maintenance, registry churn, event logs, and
background "optimisations" I reckon Windows could give 400 GB/day a run
for its money :P

-Rowan

On 19 June 2018 at 12:37, Keith Medcalf <[hidden email]> wrote:

>
> The new "consumer" SSDs from Samsung carry a 1200 TBW/8 year warranty on a
> 4 TB device.  That is a lot of writing for a "consumer desktop" computer
> ... that is about 400 GB written per DAY every day for 8 years!

Re: Strange Corruption Issue

Scott Robison-2
In reply to this post by poncho524
On Mon, Jun 18, 2018 at 9:15 PM, Patrick Herbst <[hidden email]> wrote:

> I'm using SQLite 3.7 (I know, old, but this system is old).  Has
> anyone else seen random bit flips?  Any idea what could be causing it?

My first guess would be failing RAM chips.

Re: Strange Corruption Issue

Keith Medcalf
In reply to this post by Rowan Worth-2

After almost a year I am at 18 TBW on my system/data SSD, and that includes several re-installs of Windows plus a bunch of VM updates of the Windows "Insider" previews, so getting to 1200 TBW would be quite a task ... (Note: defrag does not run on SSDs since it is useless there, but I have forced it more than once just to see how it worked.)

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


Re: Strange Corruption Issue

Simon Slavin-3
In reply to this post by Scott Robison-2


On 19 Jun 2018, at 5:44am, Scott Robison <[hidden email]> wrote:

> My first guess would be failing RAM chips.

My guess is that, but more vague.

I think you have a hardware problem of some kind.  Whether it's main storage, motherboard, memory or something else I don't know, but flipping one random bit in an octet looks like a hardware problem, not a software problem.

Can you try the same software out on another computer?

Simon.

Re: Strange Corruption Issue

poncho524
Simon Slavin-3 wrote
> I think you have a hardware problem of some kind.  Whether it's main
> storage, motherboard, memory or something else I don't know but flipping
> one random bit in an octet looks like a hardware problem, not a software
> problem.
>
> Can you try the same software out on another computer?

I have seen corruption on other boxes.  Trouble is, these are fairly
unattended boxes and I only notice corruption when I go to check the logs.
And sometimes the files are fine.  Just every once in a while one goes bad.
Maybe the drives we use aren't good.  Who knows.  I also haven't done a deep
analysis of the other corrupt files to see if they also show the single-bit
flip errors, or if they were corrupt for other reasons.  I'll report if we
find anything interesting.
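
For unattended boxes, one option is to run SQLite's built-in
PRAGMA integrity_check on a schedule instead of waiting until someone
reads the logs.  A minimal Python sketch using the standard sqlite3
module; the function name check_database is hypothetical, and how the
result gets reported is left out.  Note that integrity_check verifies
the database structure, so it would not necessarily notice a flipped
bit inside stored text such as "A" becoming "a".

```python
import sqlite3


def check_database(path):
    """Return a list of problems found by integrity_check (empty if none).

    Caveat: sqlite3.connect() creates the file if it does not exist.
    """
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the check even runs.
        return [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages


# Usage: an empty in-memory database reports no problems.
print(check_database(":memory:"))  # []
```

A non-empty result would be the signal to copy the file aside for
bit-level analysis before it is overwritten.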


