bug: failure to write journal reported as "disk I/O error"

KRECKEL Richard (AREVA)
Remove the write permission from a SQLite database's journal file. Then try to write to the database. The error reported is "disk I/O error". (This happened to me when two users tried to share a DB and had their umasks set wrong.)



The error message reported by SQLite is inappropriate. A "permission denied" would be much better and guide the user towards fixing the problem (instead of scaring the hell out of the poor sysadmin who suspects a filesystem corruption might be going on.)



I'm using SQLite 3.19.3.



All my best,

    -rbk.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: bug: failure to write journal reported as "disk I/O error"

Jens Alfke-2


> On Sep 25, 2017, at 4:39 AM, KRECKEL Richard (AREVA) <[hidden email]> wrote:
>
> Remove the write permission from a SQLite database's journal file. Then try to write to the database. The error reported is "disk I/O error". (This happened to me when two users tried to share a DB and had their umasks set wrong.)

The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred" according to the comment. Which is true in this case; an I/O operation returned an error.

If you want more detailed info, use extended error codes by calling sqlite3_extended_result_codes() or sqlite3_extended_errcode(). Then you’ll get a more specific error; in your situation probably SQLITE_IOERR_ACCESS.
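
To make the relationship between basic and extended codes concrete, here is a minimal sketch. It assumes the numeric values defined in sqlite3.h (SQLITE_IOERR is 10, SQLITE_FULL is 13, and the extended I/O codes are SQLITE_IOERR with a variant number in the higher bits); the helper names split_result_code and is_io_error are made up for this illustration and are not part of any SQLite API.

```python
# Values as defined in sqlite3.h; only a handful are reproduced here.
SQLITE_IOERR = 10
SQLITE_FULL = 13
SQLITE_IOERR_WRITE = SQLITE_IOERR | (3 << 8)
SQLITE_IOERR_ACCESS = SQLITE_IOERR | (13 << 8)


def split_result_code(code):
    """Split a result code into (primary code, extended variant).

    The low 8 bits hold the primary code; the bits above them
    identify the specific extended variant (0 for a basic code).
    """
    return code & 0xFF, code >> 8


def is_io_error(code):
    """True for SQLITE_IOERR and every extended SQLITE_IOERR_* code."""
    return (code & 0xFF) == SQLITE_IOERR


primary, variant = split_result_code(SQLITE_IOERR_ACCESS)
print(primary, variant)  # prints "10 13"
```

Code written against the basic codes keeps working when extended codes are enabled as long as it masks with 0xFF before comparing.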

—Jens

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
On Sep 26, 2017, at 8:22 AM, Jens Alfke <[hidden email]> wrote:

> The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred" according to the comment. Which is true in this case; an I/O operation returned an error.

But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.

And, on UN*X, a write() call can return ENOSPC; a write() is an I/O operation, and "returns -1 with errno set to ENOSPC" is an error, but that presumably gets reported as SQLITE_FULL, not as SQLITE_IOERR.

Sadly, the name chosen for that error code

        1) suggests an "I/O error" in the sense of "a device reported an error trying to read or write it"

and

        2) is probably part of the API and thus unchangeable.

However, if SQLITE_IOERR is returned for *anything* other than, on UN*X, an EIO errno:

        1) The documentation should *really really really really really* avoid calling it an "I/O error", as "I/O error" has a connotation of "the device reported an error" (which is what EIO signifies) rather than "an I/O operation got some sort of error, not necessarily an error from the device from which we were trying to read data or to which we were trying to write data".

        2) The documentation should tell people *always* to use sqlite3_system_errno() after an SQLITE_IOERR and report the error based on *that*, not just by reporting an "I/O error".  Yes, that means writing platform-dependent code; if you want to allow platform-independent code to be written atop SQLite, stuff the platform dependency inside SQLite, by providing some API to get errors such as, for example, "permission denied" or "disk quota exceeded" or "an actual disk I/O error occurred" rather than "write() got some error other than ENOSPC".  (Yes, you *can* get "permission denied", e.g. in an NFSv2/NFSv3 write to a file to which you had write permission when you opened it but to which you no longer have write permission, and, yes, if, for example, you're in the remote file system group at Apple, with a home directory on an NFS server, you can have an SQLite database being accessed over NFS.)
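
The kind of platform-independent classification described above could look roughly like the sketch below. Nothing here exists in SQLite: the IoFailure categories and the classify_os_error helper are hypothetical names, and the mapping only covers the POSIX errno values mentioned in this thread. A real implementation inside SQLite would need an equivalent table per operating system.

```python
import errno
from enum import Enum


class IoFailure(Enum):
    """Hypothetical platform-independent reasons for an I/O failure."""
    PERMISSION_DENIED = "permission denied"
    QUOTA_EXCEEDED = "disk quota exceeded"
    OUT_OF_SPACE = "file system full"
    FILE_TOO_LARGE = "file too large"
    DEVICE_ERROR = "the device reported an I/O error"
    OTHER = "other I/O failure"


_ERRNO_TO_FAILURE = {
    errno.EACCES: IoFailure.PERMISSION_DENIED,
    errno.EPERM: IoFailure.PERMISSION_DENIED,
    errno.ENOSPC: IoFailure.OUT_OF_SPACE,
    errno.EFBIG: IoFailure.FILE_TOO_LARGE,
    errno.EIO: IoFailure.DEVICE_ERROR,
}
# EDQUOT is not guaranteed to be defined on every platform.
if hasattr(errno, "EDQUOT"):
    _ERRNO_TO_FAILURE[errno.EDQUOT] = IoFailure.QUOTA_EXCEEDED


def classify_os_error(os_errno):
    """Map a raw OS errno to a platform-independent category."""
    return _ERRNO_TO_FAILURE.get(os_errno, IoFailure.OTHER)


print(classify_os_error(errno.EACCES).value)  # prints "permission denied"
```

With a table like this, only EIO ends up described as a device error; a permission or quota failure gets its own category.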

> If you want more detailed info, use extended error codes by calling sqlite3_extended_result_codes() or sqlite3_extended_errcode(). Then you’ll get a more specific error; in your situation probably SQLITE_IOERR_ACCESS.

Perhaps, in that particular code path, the permission problem would show up in an xAccess method call, so that this would happen to be able to give you a better error.

However, what matters isn't "what operation got the error?", it's "what non-file-system-full error did you get?", and the extended error code won't help for errors other than ENOSPC and EIO returned by write().

Re: bug: failure to write journal reported as "disk I/O error"

Simon Slavin-3


On 26 Sep 2017, at 8:47pm, Guy Harris <[hidden email]> wrote:

> On Sep 26, 2017, at 8:22 AM, Jens Alfke <[hidden email]> wrote:
>
>> The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred" according to the comment. Which is true in this case; an I/O operation returned an error.
>
> But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.

Those error codes were devised in a day when OS error codes were simpler.  Also please note that those error codes are addressed to programmers.  Your users should never see the text explanation of the number.  Because your users wouldn’t know what to do about them. At most the user can be shown the number returned so they can quote it in a support call.

Can you find out which extended result code is returned?

<https://www.sqlite.org/c3ref/extended_result_codes.html>

<https://www.sqlite.org/c3ref/c_abort_rollback.html>

That will let us know what’s really going on.

Simon.

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
On Sep 26, 2017, at 1:05 PM, Simon Slavin <[hidden email]> wrote:

> On 26 Sep 2017, at 8:47pm, Guy Harris <[hidden email]> wrote:
>
>> On Sep 26, 2017, at 8:22 AM, Jens Alfke <[hidden email]> wrote:
>>
>>> The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred" according to the comment. Which is true in this case; an I/O operation returned an error.
>>
>> But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.
>
> Those error codes were devised in a day when OS error codes were simpler.

EDQUOT was introduced in 1982, with 4.2BSD; when was SQLITE_IOERR devised?

> Also please note that those error codes are addressed to programmers.  Your users should never see the text explanation of the number.  Because your users wouldn’t know what to do about them.

A user wouldn't know what to do with "you've exceeded your stored data quota"?  If so, your site has failed to explain to the users that they've been given a quota, limiting the amount of space on the server that they can use, and that if they exceed their quota, they either need to delete stuff they no longer need, move stuff they might *someday* need but don't need *now* to some archival medium, or ask their system administrator to increase their quota.

> At most the user can be shown the number returned so they can quote it in a support call.

The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"?  (No cheating by looking it up in a man page or include file!)

And, yes, there needs to be *some* way to get the underlying problem reported to somebody in a position to do something about it - where "the underlying problem" includes "what did the OS say?" as much as it includes "what SQLite operation got the error?".

Re: bug: failure to write journal reported as "disk I/O error"

Jens Alfke-2


> On Sep 26, 2017, at 1:17 PM, Guy Harris <[hidden email]> wrote:
>
> A user wouldn't know what to do with "you've exceeded your stored data quota"?

A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.) And there are plenty of messages that are much less understandable to a lay user than the one you picked out.

> The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"?  (No cheating by looking it up in a man page or include file!)

On the contrary, error numbers are a lot easier for support. They’re independent of locale, they don’t get re-worded from one version of the app to the next, and they’re very short and easy to dictate over the phone. Of course, these shouldn’t be the primary error information given to the user! But the user-level error message should be something specific to the application, like “an unexpected database error occurred (19)” instead of “Abort due to constraint violation”. The number would appear only for support purposes.
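
A minimal sketch of that pattern, using Python's standard sqlite3 module rather than the C API. It assumes Python 3.11 or later for the sqlite_errorcode attribute on exceptions and falls back to a message without a number on older versions; the support_message helper and the table layout are invented for the example.

```python
import sqlite3


def support_message(exc):
    """Build an application-level message from an sqlite3 exception."""
    # sqlite_errorcode only exists on Python 3.11+ (assumption noted above).
    code = getattr(exc, "sqlite_errorcode", None)
    if code is None:
        return "An unexpected database error occurred."
    # Mask to the primary result code so the number is short to dictate.
    return f"An unexpected database error occurred ({code & 0xFF})."


message = None
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT NOT NULL)")
try:
    # Violates the NOT NULL constraint, which raises IntegrityError.
    conn.execute("INSERT INTO t (name) VALUES (NULL)")
except sqlite3.IntegrityError as exc:
    message = support_message(exc)
    print(message)
finally:
    conn.close()
```

SQLite's own text stays in the application's logs; the user sees only the application's wording plus a number for support.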

I say this as someone who’s worked on a number of end-user GUI applications over the years.

—Jens

Re: bug: failure to write journal reported as "disk I/O error"

Simon Slavin-3
In reply to this post by Guy Harris


On 26 Sep 2017, at 9:17pm, Guy Harris <[hidden email]> wrote:

> The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"?  (No cheating by looking it up in a man page or include file!)

My support staff are allowed to look things up.

My users, when faced with a result which means "permission error" will probably grant all permissions to all apps and all users because that’s the simplest way to make a permission error message go away.  My users don’t understand the POSIX permission model, because they’re not computer experts, they are financial sector specialists, or psychologists, or tailors.  I don’t want them thinking about computer problems.  If they knew enough about computer problems to fix a permission problem the right way, they wouldn’t be paying me.

Simon.

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
In reply to this post by Jens Alfke-2
On Sep 26, 2017, at 1:37 PM, Jens Alfke <[hidden email]> wrote:

>> On Sep 26, 2017, at 1:17 PM, Guy Harris <[hidden email]> wrote:
>>
>> A user wouldn't know what to do with "you've exceeded your stored data quota"?
>
> A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.)

Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".

And none of this argues against presenting to the user, in their native language, a message saying "you've exceeded your file system quota", if that is, in fact, what happened.

> And there are plenty of messages that are much less understandable to a lay user than the one you picked out.

"I got a permission error trying to write to the journal" isn't something you'd directly say to the lay user, but *don't* tell the user anything that might convince them that their disk is failing if you didn't get EIO or the equivalent on some other OS - and don't tell them something that, when relayed to tech support, would lead the support person to believe that, either.

I.e., Richard Kreckel is 100% correct when he says that "disk I/O error" is an inappropriate message for a permission error - the *disk* had no problem, the *OS* had a problem when the disk returned file system data that, among other things, indicated that the user didn't have permission to do something.  Replacing the disk and restoring from a backup probably won't fix that problem (unless the user had that permission when the backup was done).

>> The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"?  (No cheating by looking it up in a man page or include file!)
>
> On the contrary, error numbers are a lot easier for support. They’re independent of locale,

But the error reported by sqlite3_system_errno() isn't independent of the OS on which the user is running, so *that* error wouldn't be easy for support.  You'd need a platform-independent error code, meaning, in this case, one supplied by SQLite, not by the OS.
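
The platform dependence is easy to see with EDQUOT, which is 122 on Linux but 69 on macOS and FreeBSD. One partial workaround, sketched below, is to report the symbolic name instead of the raw number. The describe_os_error helper is a made-up name, and the sketch assumes the number was obtained on the same machine that runs this code.

```python
import errno
import os


def describe_os_error(os_errno):
    """Render an OS errno as 'NAME: description' for a support report.

    The symbolic name means the same thing on every POSIX system even
    though the number behind it does not.
    """
    name = errno.errorcode.get(os_errno, f"errno {os_errno}")
    return f"{name}: {os.strerror(os_errno)}"


print(describe_os_error(errno.EACCES))
```

This only helps on systems that share the POSIX names, so it does not replace a platform-independent code supplied by SQLite itself.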

Re: bug: failure to write journal reported as "disk I/O error"

Scott Robison-2
In reply to this post by Guy Harris
There are physical errors and there are logical errors. If an error is
generated from write, it's not unreasonable to classify it as an
"output error". From read as an "input error".

There is a lot of sqlite source code that already exists and has been
written to work with the current interface. That's probably one of the
reasons why extended errors were created, to provide finer
granularity. Regardless of whether it is ideal or not, changing sqlite
in a way that would break existing code is unlikely to happen.

Ultimately it doesn't matter when error codes were added to a given
operating system or which predates what. A decision was made in the
past. The options are to live with decisions that were made in the
past (one I've seen espoused multiple times in this mailing list),
come up with an approach that allows old code to work but exposes new
information (probably the genesis of extended error codes), or break
older code (which I've not seen done deliberately).

I'm not trying to tell you that your point is invalid. It makes sense
in many ways. Short of a time machine I doubt anything will change
(though those decisions are above my pay grade).

That being said, I don't know any non-technical users who are going to
panic that IOERR means their hard drive is dying specifically because
of that text being displayed. Panic perhaps, but not that a hard drive
is about to die. Most people I know don't have that level of
understanding to correlate IO / ERR / hard drive failure rates. They
just think the stupid program is broken and not letting them get their
work done. As for the experienced technical people I know (or at least
me), their first thought would be to investigate the problem, not to
assume their hard drive is failing.


On Tue, Sep 26, 2017 at 2:17 PM, Guy Harris <[hidden email]> wrote:

> On Sep 26, 2017, at 1:05 PM, Simon Slavin <[hidden email]> wrote:
>
>> On 26 Sep 2017, at 8:47pm, Guy Harris <[hidden email]> wrote:
>>
>>> On Sep 26, 2017, at 8:22 AM, Jens Alfke <[hidden email]> wrote:
>>>
>>>> The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred" according to the comment. Which is true in this case; an I/O operation returned an error.
>>>
>>> But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.
>>
>> Those error codes were devised in a day when OS error codes were simpler.
>
> EDQUOT was introduced in 1982, with 4.2BSD; when was SQLITE_IOERR devised?
>
>> Also please note that those error codes are addressed to programmers.  Your users should never see the text explanation of the number.  Because your users wouldn’t know what to do about them.
>
> A user wouldn't know what to do with "you've exceeded your stored data quota"?  If so, your site has failed to explain to the users that they've been given a quota, limiting the amount of space on the server that they can use, and that if they exceed their quota, they either need to delete stuff they no longer need, move stuff they might *someday* need but don't need *now* to some archival medium, or ask their system administrator to increase their quota.
>
>> At most the user can be shown the number returned so they can quote it in a support call.
>
> The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"?  (No cheating by looking it up in a man page or include file!)
>
> And, yes, there needs to be *some* way to get the underlying problem reported to somebody in a position to do something about it - where "the underlying problem" includes "what did the OS say?" as much as it includes "what SQLite operation got the error?".



--
Scott Robison

Re: bug: failure to write journal reported as "disk I/O error"

Simon Slavin-3
In reply to this post by Guy Harris


On 26 Sep 2017, at 9:57pm, Guy Harris <[hidden email]> wrote:

> On Sep 26, 2017, at 1:37 PM, Jens Alfke <[hidden email]> wrote:
>
>>> On Sep 26, 2017, at 1:17 PM, Guy Harris <[hidden email]> wrote:
>>>
>>> A user wouldn't know what to do with "you've exceeded your stored data quota"?
>>
>> A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.)
>
> Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".

No.  It means that you should present /your/ error messages to your users, not error messages generated by SQLite.  SQLite is a programmer’s tool.  Its users are programmers, and that’s who its error messages are addressed to.  You should not be letting your users see error messages intended for you, and you should not be making your users worry about what to do about them.

If your software wants to react to a SQLite result code by presenting one of its own error messages to its users, that’s fine.
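
As a rough sketch of an application reacting to a result code with its own, localized wording: the message table, the locale keys, and the app_message helper below are all invented for illustration. Only the numeric value of SQLITE_FULL is taken from sqlite3.h.

```python
# Primary result code from sqlite3.h.
SQLITE_FULL = 13

# Application-owned messages, keyed by locale and primary result code.
# The None key is the fallback for any code without its own message.
MESSAGES = {
    "en": {
        SQLITE_FULL: "The disk is full. Free some space and try again.",
        None: "The database could not be updated. "
              "Please contact support (code {code}).",
    },
    "de": {
        SQLITE_FULL: "Der Datenträger ist voll. Geben Sie Speicherplatz "
                     "frei und versuchen Sie es erneut.",
        None: "Die Datenbank konnte nicht aktualisiert werden. "
              "Bitte wenden Sie sich an den Support (Code {code}).",
    },
}


def app_message(result_code, locale="en"):
    """Translate an SQLite result code into the application's own text."""
    primary = result_code & 0xFF  # works for basic and extended codes
    table = MESSAGES.get(locale, MESSAGES["en"])
    template = table.get(primary, table[None])
    return template.format(code=primary)


print(app_message(SQLITE_FULL, "de"))
print(app_message(10 | (13 << 8)))  # an extended I/O error code
```

SQLite's English error strings never reach the user; the application chooses both the wording and the language.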

Simon.

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
In reply to this post by Simon Slavin-3
On Sep 26, 2017, at 1:43 PM, Simon Slavin <[hidden email]> wrote:

> On 26 Sep 2017, at 9:17pm, Guy Harris <[hidden email]> wrote:
>
>> The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"?  (No cheating by looking it up in a man page or include file!)
>
> My support staff are allowed to look things up.

Just don't force them to ask, *before* they look it up, whether the user's running Linux or macOS or FreeBSD or Solaris or Windows.

> My users, when faced with a result which means "permission error" will probably grant all permissions to all apps and all users because that’s the simplest way to make a permission error message go away.  My users don’t understand the POSIX permission model, because they’re not computer experts, they are financial sector specialists, or psychologists, or tailors.  I don’t want them thinking about computer problems.  If they knew enough about computer problems to fix a permission problem the right way, they wouldn’t be paying me.

And, when faced with a result that says "disk I/O error", your users will probably think their disk is broken and take it in to be fixed.

So:

        for errors where the user *can* perhaps fix the problem, such as "out of file system space" (which already has its own error) and "out of disk quota" (which doesn't, and which is different from "out of file system space"), tell the user what the problem is (and, at the application level, offer a suggestion such as "delete some of those cat videos you've saved");

        for errors where the user probably *can't* fix the problem, tell them that there's a problem for which they need to talk to support, and tell them what to say to the support staff so that the support staff knows that, for example, a disk hasn't gone bad.

(And there are places where "you don't have permission to do that" *is* the appropriate thing to tell the user, e.g. if they're trying to open a document to which they haven't been given read permission, or trying to write to a document to which they haven't been given write permission, etc.  I suspect your support staff have better things to do with their time than explain to a user that they're not allowed to read somebody else's private files.)
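
The two-way split described above might be sketched as follows. This is an assumption-laden illustration: the triage function, its return shape, and the message wording are hypothetical, and it presumes the application already has both the SQLite result code and the OS errno in hand (for instance from sqlite3_system_errno() in the C API).

```python
import errno

# Primary result codes from sqlite3.h.
SQLITE_IOERR = 10
SQLITE_FULL = 13

# OS errors the user can plausibly fix without help.
USER_FIXABLE = {errno.ENOSPC}
if hasattr(errno, "EDQUOT"):  # not defined on every platform
    USER_FIXABLE.add(errno.EDQUOT)


def triage(result_code, os_errno=0):
    """Return (audience, text): who the message is for and what it says."""
    primary = result_code & 0xFF
    if primary == SQLITE_FULL or os_errno in USER_FIXABLE:
        return ("user", "Storage is full or over quota. "
                        "Delete files you no longer need and try again.")
    if os_errno == errno.EIO:
        hint = "the device reported an error"
    elif os_errno in errno.errorcode:
        hint = "not a device error"
    else:
        hint = "cause unknown"
    os_name = errno.errorcode.get(os_errno, "unknown")
    return ("support", f"Please contact support and quote: "
                       f"SQLite {primary}, OS {os_name} ({hint}).")


print(triage(SQLITE_FULL))
print(triage(SQLITE_IOERR | (3 << 8), errno.EACCES))
```

The text handed to support says explicitly whether the device itself reported the failure, so a permission problem is not mistaken for a bad disk.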

Re: bug: failure to write journal reported as "disk I/O error"

Jens Alfke-2
In reply to this post by Guy Harris


> On Sep 26, 2017, at 1:57 PM, Guy Harris <[hidden email]> wrote:
>
> Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".

Um, that’s what I said.

> And none of this argues against presenting to the user, in their native language, a message saying "you've exceeded your file system quota", if that is, in fact, what happened.

This thread isn’t about filesystem quotas. Why do you keep bringing them up as an example?

> *don't* tell the user anything that might convince them that their disk is failing if you didn't get EIO or the equivalent on some other OS - and don't tell them something that, when relayed to tech support, would lead the support person to believe that, either.

As we’ve been saying, error messages produced by SQLite are not meant to be shown to end users, for all the reasons previously discussed.

SQLite’s error numbers ought to be sufficiently detailed once you enable extended error codes, and/or get the OS errno. The original set of error codes is inadequate to be sure, for historical reasons, but compatibility rules out breaking that API; that’s why the extended error codes exist.

—Jens

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
In reply to this post by Simon Slavin-3
On Sep 26, 2017, at 2:16 PM, Simon Slavin <[hidden email]> wrote:

> On 26 Sep 2017, at 9:57pm, Guy Harris <[hidden email]> wrote:
>
>> On Sep 26, 2017, at 1:37 PM, Jens Alfke <[hidden email]> wrote:
>>
>>>> On Sep 26, 2017, at 1:17 PM, Guy Harris <[hidden email]> wrote:
>>>>
>>>> A user wouldn't know what to do with "you've exceeded your stored data quota"?
>>>
>>> A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.)
>>
>> Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".
>
> No.  It means that you should present /your/ error messages to your users, not error messages generated by SQLite.  SQLite is a programmer’s tool.  Its users are programmers, and that’s who its error messages are addressed to.  You should not be letting your users see error messages intended for you, and you should not be making your users worry about what to do about them.

"You" in "either localize your error messages, *or* make sure your API returns error codes that the application can turn into localized error messages", refers to SQLite.  It ultimately doesn't *need* to have error messages - it could leave that entirely up to the application - but it provides them nonetheless.

And there's an "or" in my statement; providing a way to get error codes more fine-grained than SQLITE_IOERR - so that you don't say "disk I/O error" for errors that have nothing to do with a disk reporting an I/O error - is something that the application would need in order to provide an appropriate error to end users and to the people to whom the end user might report an error.  And, no, "that error occurred on this operation" is not the sort of fine-grained information to which I'm referring.

So just provide a way to get an indication of what *particular* type of error generated SQLITE_IOERR - permission error, quota error, actual disk I/O error, etc. - and recommend that this *always* be used for SQLITE_IOERR.

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
In reply to this post by Scott Robison-2
On Sep 26, 2017, at 2:08 PM, Scott Robison <[hidden email]> wrote:

> There are physical errors and there are logical errors. If an error is
> generated from write, it's not unreasonable to classify it as an
> "output error". From read as an "input error".

"Output error", yes, although it'd be useful to provide more information.

"Disk I/O error", no; it'd be unreasonable to classify "out of file system free space", "over quota", "permission error", "file bigger than 2GB-1 bytes", etc. as "disk I/O errors".

> There is a lot of sqlite source code that already exists and has been
> written to work with the current interface. That's probably one of the
> reasons why extended errors were created, to provide finer
> granularity. Regardless of whether it is ideal or not, changing sqlite
> in a way that would break existing code is unlikely to happen.

I was not suggesting that.  I didn't suggest adding SQLITE_OVERQUOTA or SQLITE_WRITE_PERMISSION_ERROR.

> Ultimately it doesn't matter when error codes were added to a given
> operating system or which predates what. A decision was made in the
> past. The options are to live with decisions that were made in the
> past (one I've seen espoused multiple times in this mailing list),
> come up with an approach that allows old code to work but exposes new
> information (probably the genesis of extended error codes), or break
> older code (which I've not seen done deliberately).

I'm advocating a better version of the second of those choices than the "here's the raw operating system error code" version that's currently provided.  (sqlite3_system_errno() also has the problem that if SQLITE_IOERR is provided for something *other* than a failure that provides a system errno value, it doesn't do the job.)

> That being said, I don't know any non-technical users who are going to
> panic that IOERR means their hard drive is dying specifically because
> of that text being displayed. Panic perhaps, but not that a hard drive
> is about to die. Most people I know don't have that level of
> understanding to correlate IO / ERR / hard drive failure rates.

They don't treat "disk I/O error" as an indication that their disk is having a problem?  That doesn't need an understanding of hard drive failure rates.

I have no reason to dismiss the original writer's notion that "disk I/O error" might "[scare] the hell out of the poor sysadmin who suspects a filesystem corruption might be going on".

> They
> just think the stupid program is broken and not letting them get their
> work done. As for the experienced technical people I know (or at least
> me), their first thought would be to investigate the problem, not to
> assume their hard drive is failing.

Less investigative work is needed if the software gives a more detailed error report.

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
In reply to this post by Jens Alfke-2
On Sep 26, 2017, at 2:22 PM, Jens Alfke <[hidden email]> wrote:

>> On Sep 26, 2017, at 1:57 PM, Guy Harris <[hidden email]> wrote:
>>
>> Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".
>
> Um, that’s what I said.
>
>> And none of this argues against presenting to the user, in their native language, a message saying "you've exceeded your file system quota", if that is, in fact, what happened.
>
> This thread isn’t about filesystem quotas. Why do you keep bringing them up as an example?

Because the thread brings up the general question of folding multiple types of errors into a single error code, and because it's an example of an error you *would* want to show to the user, just as SQLITE_FULL is.

>> *don't* tell the user anything that might convince them that their disk is failing if you didn't get EIO or the equivalent on some other OS - and don't tell them something that, when relayed to tech support, would lead the support person to believe that, either.
>
> As we’ve been saying, error messages produced by SQLite are not meant to be shown to end users, for all the reasons previously discussed.
>
> SQLite’s error numbers ought to be sufficiently detailed once you enable extended error codes, and/or get the OS errno. The original set of error codes is inadequate to be sure, for historical reasons, but compatibility rules out breaking that API; that’s why the extended error codes exist.

Yes, which is why I wasn't suggesting changing the error codes.

I *would* suggest an additional API to get a *separate* extended error code, so that if, for example, a write() fails and that failure is turned into SQLITE_IOERR, you can get something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.  I would also suggest that the documentation say that, unless you have to run on a version of SQLite that doesn't support the new API, applications and libraries running atop SQLite should use the new API in their error-reporting code, rather than, for example, just using sqlite3_errstr().

Re: bug: failure to write journal reported as "disk I/O error"

Simon Slavin-3


On 26 Sep 2017, at 10:53pm, Guy Harris <[hidden email]> wrote:
>
> I *would* suggest an additional API to get a *separate* extended error code, so that if, for example, a write() fails and that failure is turned into SQLITE_IOERR, you can get something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.

You know about this, right ?

<https://www.sqlite.org/c3ref/extended_result_codes.html>

<https://www.sqlite.org/c3ref/c_abort_rollback.html>

Simon.

Re: bug: failure to write journal reported as "disk I/O error"

Guy Harris
On Sep 26, 2017, at 3:11 PM, Simon Slavin <[hidden email]> wrote:

> On 26 Sep 2017, at 10:53pm, Guy Harris <[hidden email]> wrote:
>>
>> I *would* suggest an additional API to get a *separate* extended error code, so that if, for example, a write() fails and that failure is turned into SQLITE_IOERR, you can get something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.
>
> You know about this, right ?
>
> <https://www.sqlite.org/c3ref/extended_result_codes.html>
>
> <https://www.sqlite.org/c3ref/c_abort_rollback.html>

Yes.  I do.

You know about this, right?

        https://www.sqlite.org/rescode.html#ioerr_access

It shows a whole bunch of codes, none of which are "something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.".

I'm not asking for something that indicates what xXYZZY method reported the error.  I'm asking for something that indicates what the underlying problem causing the I/O error is, to the extent that information is available from the OS, i.e. *why* did the I/O operation not succeed?
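
To make the distinction concrete: an extended result code keeps the primary code in its low 8 bits and a detail number in the bits above. The sketch below, in C, decodes a few SQLITE_IOERR_* values. The numeric values are copied from sqlite3.h into local `EX_`-prefixed macros (hypothetical names) so it builds without SQLite, and `ioerr_operation` is likewise a hypothetical helper that covers only three of the detail values.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from sqlite3.h; the EX_ names are local to this sketch. */
#define EX_SQLITE_IOERR         10
#define EX_SQLITE_IOERR_WRITE   (EX_SQLITE_IOERR | (3 << 8))   /* 778  */
#define EX_SQLITE_IOERR_FSYNC   (EX_SQLITE_IOERR | (4 << 8))   /* 1034 */
#define EX_SQLITE_IOERR_ACCESS  (EX_SQLITE_IOERR | (13 << 8))  /* 3338 */

/* Low 8 bits: the primary result code. */
int primary_code(int extended) { return extended & 0xff; }

/* Remaining bits: the detail number within that primary code. */
int detail_code(int extended) { return extended >> 8; }

/* Name the operation an SQLITE_IOERR_* code refers to.
 * Returns NULL if the code is not in the SQLITE_IOERR family. */
const char *ioerr_operation(int extended) {
    assert(extended >= 0);  /* SQLite result codes are non-negative */
    if (primary_code(extended) != EX_SQLITE_IOERR) return NULL;
    switch (detail_code(extended)) {
        case 0:  return "unspecified";
        case 3:  return "write";
        case 4:  return "fsync";
        case 13: return "access";
        default: return "other";
    }
}
```

Everything this decoding yields is the name of an operation. Whether a "write" failed because of EIO, ENOSPC, or EDQUOT is not encoded anywhere in the number.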


Re: bug: failure to write journal reported as "disk I/O error"

Jens Alfke-2


> On Sep 26, 2017, at 3:17 PM, Guy Harris <[hidden email]> wrote:
>
> It shows a whole bunch of codes, none of which are "something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.".
>
> I'm not asking for something that indicates what xXYZZY method reported the error.  I'm asking for something that indicates what the underlying problem causing the I/O error is, to the extent that information is available from the OS, i.e. *why* did the I/O operation not succeed?

Yes, you’re right — I hadn’t looked at the definitions of those extended codes, and they seem … um, not super useful. As a client of SQLite, I want to know what specifically went wrong, not which internal bit of SQLite reported the error.

—Jens

Re: bug: failure to write journal reported as "disk I/O error"

Nico Williams
In reply to this post by Jens Alfke-2
On Tue, Sep 26, 2017 at 01:37:42PM -0700, Jens Alfke wrote:
> > On Sep 26, 2017, at 1:17 PM, Guy Harris <[hidden email]> wrote:
> > A user wouldn't know what to do with "you've exceeded your stored data quota"?
>
> A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages
> are not localized.) And there are plenty of messages that are much
> less understandable to a lay user than the one you picked out.

They could be.  And regardless, more detail in the error _code_ is
better for the application developer.

EIO is definitely an I/O error.  Could be all sorts of things.  E.g.,
you're using iSCSI and the network is timing out.

ENOSPC is very, very different.  Reporting ENOSPC as an I/O error means
that the app or the user must now use df(1) or strace(1) or similar to
work it out, when SQLite3 could just have reported that the FS is full.
Ditto EDQUOT.

EROFS is also very different.

And so on.

These are ancient error codes.
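
A rough sketch in C of the kind of triage an application could do once it has the errno in hand. The `io_cause` enum, `classify_errno`, and `cause_hint` are hypothetical names, and the grouping and hint texts are assumptions made for illustration, not anything SQLite provides.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical coarse categories for a failed I/O call. */
enum io_cause {
    CAUSE_DEVICE,      /* EIO: the device or transport itself failed */
    CAUSE_NO_SPACE,    /* ENOSPC: file system is full */
    CAUSE_QUOTA,       /* EDQUOT: user's quota exceeded */
    CAUSE_READ_ONLY,   /* EROFS: file system mounted read-only */
    CAUSE_PERMISSION,  /* EACCES or EPERM: e.g. journal file not writable */
    CAUSE_OTHER
};

/* Map an errno value onto one of the categories above. */
enum io_cause classify_errno(int e) {
    switch (e) {
        case EIO:    return CAUSE_DEVICE;
        case ENOSPC: return CAUSE_NO_SPACE;
#ifdef EDQUOT
        case EDQUOT: return CAUSE_QUOTA;
#endif
        case EROFS:  return CAUSE_READ_ONLY;
        case EACCES:
        case EPERM:  return CAUSE_PERMISSION;
        default:     return CAUSE_OTHER;
    }
}

/* Short developer-facing hint for each category. */
const char *cause_hint(enum io_cause c) {
    assert(c <= CAUSE_OTHER);
    switch (c) {
        case CAUSE_DEVICE:     return "device or transport error";
        case CAUSE_NO_SPACE:   return "file system full";
        case CAUSE_QUOTA:      return "quota exceeded";
        case CAUSE_READ_ONLY:  return "read-only file system";
        case CAUSE_PERMISSION: return "permission denied";
        default:               return "unclassified I/O failure";
    }
}
```

Only the first category suggests failing hardware. The case that started this thread, a journal file without write permission, would land in CAUSE_PERMISSION.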

> > The *number* might annoy the support staff; right off the top of
> > your head, what's the error number for "file system quota exceeded"
> > or "I/O error"?  (No cheating by looking it up in a man page or
> > include file!)
>
> On the contrary, error numbers are a lot easier for support. They’re
> independent of locale, they don’t get re-worded from one version of
> the app to the next, and they’re very short and easy to dictate over
> the phone. Of course, these shouldn’t be the primary error information
> given to the user! But the user-level error message should be
> something specific to the application, like “an unexpected database
> error occurred (19)” instead of “Abort due to constraint violation”.
> The number would appear only for support purposes.

As long as you can resolve them to symbolic names and/or messages.
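
Resolving a number back to a symbolic name can be as small as a lookup table. This is a sketch in C under the assumption that the application ships its own table. `code_name`, `primary_names`, and `primary_code_name` are hypothetical names, the table is deliberately partial, and the numeric values are the ones defined in sqlite3.h.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the lookup table: numeric code and its symbolic name. */
struct code_name {
    int code;
    const char *name;
};

/* Partial table of primary result codes, values as in sqlite3.h. */
static const struct code_name primary_names[] = {
    { 0,  "SQLITE_OK" },
    { 1,  "SQLITE_ERROR" },
    { 5,  "SQLITE_BUSY" },
    { 8,  "SQLITE_READONLY" },
    { 10, "SQLITE_IOERR" },
    { 13, "SQLITE_FULL" },
    { 14, "SQLITE_CANTOPEN" },
    { 19, "SQLITE_CONSTRAINT" },
};

/* Return the symbolic name for a primary or extended result code.
 * Extended codes are reduced to their primary code (low 8 bits) first. */
const char *primary_code_name(int code) {
    assert(code >= 0);  /* SQLite result codes are non-negative */
    int primary = code & 0xff;
    size_t n = sizeof primary_names / sizeof primary_names[0];
    for (size_t i = 0; i < n; i++) {
        if (primary_names[i].code == primary) {
            return primary_names[i].name;
        }
    }
    return "UNKNOWN";
}
```

A support tool could then turn the "(19)" from a user's report into SQLITE_CONSTRAINT without anyone having to memorize the numbers.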

Nico

Re: bug: failure to write journal reported as "disk I/O error"

Keith Medcalf
In reply to this post by Jens Alfke-2

Well, the terminology is correct.  These *ARE* I/O Errors.  The system attempted I/O.  It failed.  Hence the term I/O Error.  It is irrelevant whether the error was caused because the heads on the tape drive need cleaning, access was denied to spool storage, the disk was full, someone yanked the cable out of the disk drive, or the card reader got jammed up.
 
The program attempted to perform an I/O operation (of some kind).
That operation failed.

Now it is up to you, the application programmer, to figure out what to do.  There are quite a few facilities available to help you do this.  SQLite itself has extended error codes that can help point to where the trouble is.  You can ask the Operating System for its abend code.  You can sacrifice chickens or babies or perhaps read the tea leaves.

Personally I think we need a reversion to the old days when there were only four status codes:  OK, What?, How?, and Where?

This is far more effective than niggling over what an error code means.  It means there was an error.  Full-stop end of sentence, paragraph, page, chapter, section, story and book.  There are more than adequate ways of determining the nature and localization of the error.  Use them.  Love them.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Jens Alfke
>Sent: Tuesday, 26 September, 2017 21:49
>To: SQLite mailing list
>Subject: Re: [sqlite] bug: failure to write journal reported as "disk
>I/O error"
>
>
>
>> On Sep 26, 2017, at 3:17 PM, Guy Harris <[hidden email]> wrote:
>>
>> It shows a whole bunch of codes, none of which are "something that
>distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.".
>>
>> I'm not asking for something that indicates what xXYZZY method
>reported the error.  I'm asking for something that indicates what the
>underlying problem causing the I/O error is, to the extent that
>information is available from the OS, i.e. *why* did the I/O
>operation not succeed?
>
>Yes, you’re right — I hadn’t looked at the definitions of those
>extended codes, and they seem … um, not super useful. As a client of
>SQLite, I want to know what specifically went wrong, not which
>internal bit of SQLite reported the error.
>
>—Jens