Locking semantics are broken?

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Locking semantics are broken?

dandl
> Do not use SQLite for concurrent access over a network connection. Locking
> semantics are broken for most network filesystems, so you will have
> corruption issues that are no fault of SQLite.

I have seen this comment made more than once on this list. Is there any reliable evidence to support this for a Windows-based network?

Disclosure: we wrote and maintain an ISAM-based multi-user database product which relies on network locking. We have conducted exhaustive tests over many years and in our opinion, locking and multi-user semantics on Windows XP and later networks are reliable and free of errors, if performed correctly by the client software.

[We use the same semantics for a Linux or Unix-based system with multiple terminals, but not on any Unix-based network. This is only about Windows.]

If Sqlite has a problem then perhaps it can be fixed.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

Clemens Ladisch
dandl wrote:
>> Do not use SQLite for concurrent access over a network connection. Locking
>> semantics are broken for most network filesystems, so you will have
>> corruption issues that are no fault of SQLite.
>
> I have seen this comment made more than once on this list. Is there any
> reliable evidence to support this for a Windows-based network?

There have been locking bugs in quite a few Windows versions, but these
get fixed.

Opportunistic locks can produce errors when the network goes down
temporarily:
1. the client has an exclusive lock;
2. the server cannot tell the client to release it, and after some time
   assumes the client has crashed;
3. the server moves ownership to another client;
4. both clients assume they have the exclusive lock, and do writes.

But other than that, locking should work just fine.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

Rowan Worth
In reply to this post by dandl
On 28 June 2016 at 16:07, dandl <[hidden email]> wrote:

> > Do not use SQLite for concurrent access over a network connection.
> Locking
> > semantics are broken for most network filesystems, so you will have
> > corruption issues that are no fault of SQLite.
>
> I have seen this comment made more than once on this list. Is there any
> reliable evidence to support this for a Windows-based network?
>
> Disclosure: we wrote and maintain an ISAM-based multi-user database
> product which relies on network locking. We have conducted exhaustive tests
> over many years and in our opinion, locking and multi-user semantics on
> Windows XP and later networks are reliable and free of errors, if performed
> correctly by the client software.
>

I can't comment on windows sorry, but in-house we use sqlite databases
shared between many unix clients via network file systems. Over the years
we've used nfs3, nfs4, and lustre.

These databases are subject to highly concurrent usage every working day.
In the past two years we've had maybe one corruption issue which implicated
the file system (client side logging suggested that four RESERVED locks
were obtained concurrently).


More commonly corruption has been the result of user/application
misbehaviour:

1. Users copying databases while they're being updated (leaving them with a
corrupt copy)
2. Users inadvertently symlinking/hardlinking database files
3. Our application inadvertently discarding sqlite's locks after backing up
the database (thanks POSIX locking semantics)


(3) was the main offender for us. Since figuring that out we've been left
with a very robust environment - but not bullet proof as indicated above. I
can imagine this kind of thing being sensitive to network/file system
configuration, which is not easy to diagnose as a sysadmin let alone via
email so in that sense I understand why network file systems are
discouraged on the list (aside from the fact that sqlite and its
database-level single user lock was not designed for networked concurrent
usage).

-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

Simon Slavin-3
In reply to this post by dandl

On 28 Jun 2016, at 9:07am, dandl <[hidden email]> wrote:

>> Do not use SQLite for concurrent access over a network connection. Locking
>> semantics are broken for most network filesystems, so you will have
>> corruption issues that are no fault of SQLite.
>
> I have seen this comment made more than once on this list. Is there any reliable evidence to support this for a Windows-based network?

Actually, the problem that causes causes people to make that warning occurs in POSIX.  See the beginning of section 6.0 in this page:

<https://www.sqlite.org/lockingv3.html>

However, while the above is a definitely known, verifiable problem, with every implementation of POSIX, we have had occasional reports about locking problems with Windows as documented in section 9.1 here:

<https://www.sqlite.org/atomiccommit.html>

The problem is that with the numerous versions of Windows, File System and Network system, nobody has come up with a fault which can be reproduced by the developers.  But we do get enough vague reports of problems with Windows to make us believe that there is something wrong somewhere.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

Donald Shepherd
In reply to this post by Rowan Worth
On Tue, 28 Jun 2016 at 19:17 Rowan Worth <[hidden email]> wrote:

> On 28 June 2016 at 16:07, dandl <[hidden email]> wrote:
>
> > > Do not use SQLite for concurrent access over a network connection.
> > Locking
> > > semantics are broken for most network filesystems, so you will have
> > > corruption issues that are no fault of SQLite.
> >
> > I have seen this comment made more than once on this list. Is there any
> > reliable evidence to support this for a Windows-based network?
> >
> > Disclosure: we wrote and maintain an ISAM-based multi-user database
> > product which relies on network locking. We have conducted exhaustive
> tests
> > over many years and in our opinion, locking and multi-user semantics on
> > Windows XP and later networks are reliable and free of errors, if
> performed
> > correctly by the client software.
> >
>
> I can't comment on windows sorry, but in-house we use sqlite databases
> shared between many unix clients via network file systems. Over the years
> we've used nfs3, nfs4, and lustre.
>
> These databases are subject to highly concurrent usage every working day.
> In the past two years we've had maybe one corruption issue which implicated
> the file system (client side logging suggested that four RESERVED locks
> were obtained concurrently).
>
>
> More commonly corruption has been the result of user/application
> misbehaviour:
>
> 1. Users copying databases while they're being updated (leaving them with a
> corrupt copy)
> 2. Users inadvertently symlinking/hardlinking database files
> 3. Our application inadvertently discarding sqlite's locks after backing up
> the database (thanks POSIX locking semantics)
>
>
> (3) was the main offender for us. Since figuring that out we've been left
> with a very robust environment - but not bullet proof as indicated above. I
> can imagine this kind of thing being sensitive to network/file system
> configuration, which is not easy to diagnose as a sysadmin let alone via
> email so in that sense I understand why network file systems are
> discouraged on the list (aside from the fact that sqlite and its
> database-level single user lock was not designed for networked concurrent
> usage).
>

We are Windows-based and the only corruptions we've seen is naive copying
of an in-use database (#1 in your list) plus bugs such as crashes when
transferring the database between servers resulting in an incomplete copy.
Neither have been the responsibility of SQLite and we've generally
mitigated both by tool and usage improvement as best we can.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

dandl
In reply to this post by Simon Slavin-3
Fair comment.

We have seen problems:
1. In all versions of Windows based on the 95 kernel, and especially Windows
ME (but not the NT kernel since 3.5)
2. At any time if the network infrastructure is unreliable (too many errors
or retries)
3. At any time if a client machine misbehaves eg crashes while holding a
lock, or attempts its own file accesses etc
4. Recently, apparently related to SMB 3.0+, due to more aggressive
performance optimisations.

But if everything is configured right and working right and nothing bad
happens then it is highly reliable over very large volumes of transactions.

I guess my plea would be to emphasise the need to pay attention to all the
details and to warn that there is still a risk of uncontrolled data loss,
rather than just branding it as 'broken'.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org



> -----Original Message-----
> From: [hidden email] [mailto:sqlite-users-
> [hidden email]] On Behalf Of Simon Slavin
> Sent: Tuesday, 28 June 2016 7:28 PM
> To: SQLite mailing list <[hidden email]>
> Subject: Re: [sqlite] Locking semantics are broken?
>
>
> On 28 Jun 2016, at 9:07am, dandl <[hidden email]> wrote:
>
> >> Do not use SQLite for concurrent access over a network connection.
> >> Locking semantics are broken for most network filesystems, so you
> >> will have corruption issues that are no fault of SQLite.
> >
> > I have seen this comment made more than once on this list. Is there any
> reliable evidence to support this for a Windows-based network?
>
> Actually, the problem that causes causes people to make that warning
occurs
> in POSIX.  See the beginning of section 6.0 in this page:
>
> <https://www.sqlite.org/lockingv3.html>
>
> However, while the above is a definitely known, verifiable problem, with
> every implementation of POSIX, we have had occasional reports about
locking
> problems with Windows as documented in section 9.1 here:
>
> <https://www.sqlite.org/atomiccommit.html>
>
> The problem is that with the numerous versions of Windows, File System and
> Network system, nobody has come up with a fault which can be reproduced by
> the developers.  But we do get enough vague reports of problems with
Windows
> to make us believe that there is something wrong somewhere.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

John Found
On Wed, 29 Jun 2016 01:03:28 +1000
"dandl" <[hidden email]> wrote:

> But if everything is configured right and working right and nothing bad
> happens then it is highly reliable over very large volumes of transactions.

For me, this is a clear definition of the term "not reliable". Isn't it?

--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

R Smith


On 2016/06/28 5:46 PM, John Found wrote:
> On Wed, 29 Jun 2016 01:03:28 +1000
> "dandl" <[hidden email]> wrote:
>
>> But if everything is configured right and working right and nothing bad
>> happens then it is highly reliable over very large volumes of transactions.
> For me, this is a clear definition of the term "not reliable". Isn't it?
>

Ha, true but the point is only semantic though...

The statement: "My car is reliable so long as I remember to add Fuel and
not remove the wheels.", is not really self-contradicting, is it?

"My car is reliable so long as it doesn't break" - is a different matter.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Locking semantics are broken?

Stephen Chrzanowski
In Mr Smiths examples, one statement effect is a direct result of an action
you (subjectively speaking) knowingly did, or did not do, to the car.  You
did not fuel the car.  You did take the wheels off.  There by, your car is
essentially a hunk of metal taking up space.  It doesn't function as
designed.

The second, if you're driving along, blow the radiator hose, break a belt,
spark plug fails, coils and cables start shorting, push a rod through the
engine housing, whatever.  You've got a failure in the system that prevents
the car from function that happens due to indirect activities with the
vehicle.

Both cases essentially make the car not run, or becomes non-functional, or
causes damage.  Say a spark hitting the pool of drying oil on the engine
which sparks a fire?  Or the belt you break happens to be the timing belt?
Piston rod that goes through the case?  Sparks grounding to the engine
instead of in the cylinder head?

For SQLite, running your software where your data store is on the network,
your running the philosophy of the second statement.  Some
people/businesses have run their SQLite database system for YEARS without
an issue.  Myself, I've never been stranded in my 20 years of driving with
any of my vehicles, and trust me, I've driven wrecks of cars, so I've been
lucky.  My mother, however, one time managed to drive about 2 hours down
the highway, then come to a stop sign on an off-ramp, and the transmission
just would not engage in gear when she went to go.

I've NEVER heard of an instance of MySQL, Postgres, MSSQL, or any other
major database that reliably runs on a different machine.  Different
partitions, different file systems, or different hard drives all controlled
by a single OS, yes, but all run ON a single OS, not across a network.

My thoughts on this are all illustrated here :
http://randomthoughts.ca/index.php?/archives/7-Serverless-Servers-using-NFS-Why-it-cant-happen.html




On Tue, Jun 28, 2016 at 12:05 PM, R Smith <[hidden email]> wrote:


> Ha, true but the point is only semantic though...
>
> The statement: "My car is reliable so long as I remember to add Fuel and
> not remove the wheels.", is not really self-contradicting, is it?
>
> "My car is reliable so long as it doesn't break" - is a different matter.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users