Test failures on GPFS

T J
Hi,

I was interested in using sqlite over GPFS.  I've seen a few useful threads
on this:

   - Network file system that support sqlite3 well

   https://www.mail-archive.com/sqlite-users@.../msg117085.html

   - disable file locking mechanism over the network

   https://www.mail-archive.com/sqlite-users@.../msg116846.html

From these, I can see that there are some performance issues, even if I
were willing (which I am not) to make all access (read+write) sequential. [I
don't expect to need many, if any, concurrent writers, but I will typically
have concurrent readers.]

To get a better sense of things, I downloaded 3.31.0 and ran the test suite
on GPFS.  Overall, it looks pretty good, but there were some WAL failures.
Could someone comment on the precise implication of those test failures?
I'm interested to know what usage patterns are likely to cause problems,
and which are likely safe.  Also, which other tests can I run (
https://www.sqlite.org/testing.html)? Perhaps more tests around concurrent
read/writes?

!Failures on these tests: e_walauto-1.1.2 e_walauto-1.1.3
e_walauto-1.1.5 e_walauto-1.1.7 e_walauto-1.1.12.3 e_walauto-1.1.12.5
e_walauto-1.2.2 e_walauto-1.2.3 e_walauto-1.2.5 e_walauto-1.2.7
e_walauto-1.2.12.3 e_walauto-1.2.12.5 zipfile-2.4a.2.1
zipfile-2.4a.2.2


Thanks in advance.  The `make test` output log snippet is below.
---

e_walauto-1.1.0... Ok

e_walauto-1.1.1... Ok

e_walauto-1.1.2...

! e_walauto-1.1.2 expected: [1]

! e_walauto-1.1.2 got:      [0]

e_walauto-1.1.3...

! e_walauto-1.1.3 expected: [1]

! e_walauto-1.1.3 got:      [0]

e_walauto-1.1.4... Ok

e_walauto-1.1.5...

! e_walauto-1.1.5 expected: [1]

! e_walauto-1.1.5 got:      [0]

e_walauto-1.1.6... Ok

e_walauto-1.1.7...

! e_walauto-1.1.7 expected: [1]

! e_walauto-1.1.7 got:      [0]

e_walauto-1.1.7... Ok

e_walauto-1.1.8... Ok

e_walauto-1.1.9... Ok

e_walauto-1.1.10.1... Ok

e_walauto-1.1.10.2... Ok

e_walauto-1.1.11.1... Ok

e_walauto-1.1.11.2... Ok

e_walauto-1.1.11.3... Ok

e_walauto-1.1.12.1... Ok

e_walauto-1.1.12.2... Ok

e_walauto-1.1.12.3...

! e_walauto-1.1.12.3 expected: [2]

! e_walauto-1.1.12.3 got:      [0]

e_walauto-1.1.12.4... Ok

e_walauto-1.1.12.5...

! e_walauto-1.1.12.5 expected: [1559]

! e_walauto-1.1.12.5 got:      [0]

e_walauto-1.2.0... Ok

e_walauto-1.2.1... Ok

e_walauto-1.2.2...

! e_walauto-1.2.2 expected: [1]

! e_walauto-1.2.2 got:      [0]

e_walauto-1.2.3...

! e_walauto-1.2.3 expected: [1]

! e_walauto-1.2.3 got:      [0]

e_walauto-1.2.4... Ok

e_walauto-1.2.5...

! e_walauto-1.2.5 expected: [1]

! e_walauto-1.2.5 got:      [0]

e_walauto-1.2.6... Ok

e_walauto-1.2.7...

! e_walauto-1.2.7 expected: [1]

! e_walauto-1.2.7 got:      [0]

e_walauto-1.2.7... Ok

e_walauto-1.2.8... Ok

e_walauto-1.2.9... Ok

e_walauto-1.2.10.1... Ok

e_walauto-1.2.10.2... Ok

e_walauto-1.2.11.1... Ok

e_walauto-1.2.11.2... Ok

e_walauto-1.2.11.3... Ok

e_walauto-1.2.12.1... Ok

e_walauto-1.2.12.2... Ok

e_walauto-1.2.12.3...

! e_walauto-1.2.12.3 expected: [2]

! e_walauto-1.2.12.3 got:      [0]

e_walauto-1.2.12.4... Ok

e_walauto-1.2.12.5...

! e_walauto-1.2.12.5 expected: [1559]

! e_walauto-1.2.12.5 got:      [0]

e_walauto.test-closeallfiles... Ok

e_walauto.test-sharedcachesetting... Ok

Time: e_walauto.test 92703 ms

...

zipfile2.test-closeallfiles... Ok

zipfile2.test-sharedcachesetting... Ok

Time: zipfile2.test 14 ms

Memory used:          now         24  max    9283664  max-size   16908288

Allocation count:     now          1  max    1311131

Page-cache used:      now          0  max         13  max-size      65800

Page-cache overflow:  now          0  max   20640016

SQLite 2020-01-10 01:05:49
0a500da6aa659a8e73206e6d22ddbf2da5e4f1d1d551eeb66433163a3e13109d

14 errors out of 249964 tests on localhost Linux 64-bit little-endian

!Failures on these tests: e_walauto-1.1.2 e_walauto-1.1.3
e_walauto-1.1.5 e_walauto-1.1.7 e_walauto-1.1.12.3 e_walauto-1.1.12.5
e_walauto-1.2.2 e_walauto-1.2.3 e_walauto-1.2.5 e_walauto-1.2.7
e_walauto-1.2.12.3 e_walauto-1.2.12.5 zipfile-2.4a.2.1
zipfile-2.4a.2.2

All memory allocations freed - no leaks

Memory used:          now          0  max    9283664  max-size   16908288

Allocation count:     now          0  max    1311131

Page-cache used:      now          0  max         13  max-size      65800

Page-cache overflow:  now          0  max   20640016

Maximum memory usage: 9283664 bytes

Current memory usage: 0 bytes

Number of malloc()  : -1 calls
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Test failures on GPFS

Jens Alfke-2

> On Jan 11, 2020, at 2:58 PM, T J <[hidden email]> wrote:
>
> I was interested in using sqlite over GPFS.

The standard advice on using SQLite over a network file system is “don’t do it.” Even if you find the rare file system that handles locks properly, you’ll likely have performance issues.

A client/server database like Postgres or MySQL is a better fit for a distributed use case. If you’re sending everything over the network, it makes more sense to send just the queries & results, not the innards of the b-tree too. Is there a reason you can’t use one of those?

—Jens

Re: Test failures on GPFS

J. King-3
On January 11, 2020 5:57:31 p.m. EST, T J <[hidden email]> wrote:

>I was interested in using sqlite over GPFS.  I've seen a few useful
>threads
>on this:
>
> [...]
>
>Overall, it looks pretty good, but there were some WAL
>failures.
>Could someone comment on the precise implication of those test
>failures?

WAL mode does not work over the network, so the test failures are presumably to be expected.

--
J. King

Re: Test failures on GPFS

Richard Hipp-3
On 1/11/20, J. King <[hidden email]> wrote:
>
> WAL mode does not work over the network, so the test failures are presumably
> to be expected.
>

WAL mode should work on a network filesystem, as long as all of the
clients are on the same host computer, and as long as mmap()-ing the
*-shm file gives all the clients shared memory.  Dunno if GPFS does
that or not, though.  Maybe not.  Or, maybe not reliably.
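
A rough way to probe the mmap() condition on a given directory is sketched below in Python. This is only an illustration, not part of the SQLite test suite: the function name `mmap_is_shared` is made up, and it compares two mappings inside a single process, so it is weaker than a real check between separate client processes.

```python
import mmap
import os
import tempfile


def mmap_is_shared(directory: str, size: int = 4096) -> bool:
    """Return True if a write through one mapping is visible through another.

    Hypothetical helper: creates a scratch file in `directory`, maps it
    twice via two independent file descriptors, and compares the views.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)  # a file must be non-empty to be mapped
        with open(path, "r+b") as other:
            writer = mmap.mmap(fd, size)              # shared mapping by default
            reader = mmap.mmap(other.fileno(), size)  # second, independent mapping
            try:
                writer[0:4] = b"ping"
                return reader[0:4] == b"ping"
            finally:
                writer.close()
                reader.close()
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Replace the temp directory with a directory on the filesystem under test.
    print(mmap_is_shared(tempfile.gettempdir()))
```

A True result here does not prove that WAL mode is safe on that filesystem; a False result would suggest the *-shm mapping cannot be relied on.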

--
D. Richard Hipp
[hidden email]

Re: Test failures on GPFS

Roman Fleysher-2

I use SQLite over GPFS, but in DELETE mode (which I think is the default), not WAL mode. I have had no issues with locking, only with performance when accessing the database concurrently from multiple nodes. As others pointed out, this has to do with the overhead of lock requests: GPFS must coordinate with many nodes. My observation is that when concurrent access comes from only a few nodes, the performance is OK, even though the number of nodes in the cluster is always the same. Thus, GPFS coordinates in some smart way only between the nodes actively involved.
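
For reference, selecting the rollback-journal (DELETE) mode explicitly might look like the following minimal sketch, using Python's standard `sqlite3` module. The helper name `open_with_rollback_journal` is hypothetical; the pragma itself is standard SQLite.

```python
import sqlite3


def open_with_rollback_journal(path: str) -> sqlite3.Connection:
    """Open `path` and make sure it is not in WAL mode.

    Hypothetical helper: PRAGMA journal_mode returns the mode actually in
    effect, so the result is checked rather than assumed.
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    if mode.lower() != "delete":
        conn.close()
        raise RuntimeError(f"journal_mode is {mode!r}, expected 'delete'")
    return conn
```

Checking the returned mode matters because the pragma can leave the previous mode in place, for example when another connection is holding the database open in WAL mode.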

One reason I do not use MySQL, with its more efficient network access, is that a sysadmin must set it up. With SQLite, I am independent. In addition, with MySQL there are authentication issues to be dealt with. I rely on GPFS file access permissions (access control lists, ACLs) to regulate access to the database.

I heard about BedrockDB, which internally uses SQLite and provides network access with replication. I have not tried it and do not know what is involved.


Roman



Test failures on GPFS

T J
On Sunday, January 12, 2020, Roman Fleysher <[hidden email]>
wrote:

>
> I use SQLite over GPFS, but in DELETE (which I think is the default)
> mode. Not WAL mode. No issues with locking, except performance when
> accessing concurrently from multiple nodes. As others pointed out, this has
> to do with the overhead due to lock requests. GPFS must coordinate with
> many nodes. My observation is that when concurrent access is from a few
> nodes, the performance is OK even though number of nodes is always the
> same. Thus, GPFS coordinates in some smart way only between nodes actively
> involved.
>
> One reason I do not use MySQL with its more efficient network access is
> that sys admin must set it up. With SQLite, I am independent. In addition,
> in MySQL there are authentication issues to be dealt with. I rely on GPFS
> file access permissions (access control list, ACL) to regulate access to
> database.
>
> I heard about BedrockDB, which internally uses SQLite and provides
> network access with replication. I have not tried it and do not know what
> is involved.
>
>
>
MySQL and similar would indeed be nice to use, but in addition to the
administrative cost, there are also developer costs to get things set up so
that every developer can do work in their own db without affecting the
production db, as well as complexity costs with getting data into those
dbs. Contrast this with just copying the sqlite file(s) as needed (though
integrity concerns still exist).

So I'm mostly weighing options. The data is very much many-reads,
few-writes. Also considering just using an external locking service and
simple flat files, but this has obvious downsides of fewer (if any) data
types, no joins, no transactions, etc.

I may give this a try and see if the perf hit is tolerable.
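
A first pass at measuring that hit for the many-readers case could be as simple as the sketch below. It is an assumption-laden illustration: the helper name `time_reads` is made up, the query and path are whatever the caller supplies, and the path is assumed to contain no characters that need escaping in a `file:` URI.

```python
import sqlite3
import time


def time_reads(path: str, query: str, repeats: int = 100) -> float:
    """Return the mean wall-clock seconds per query on a read-only connection."""
    # mode=ro opens the database read-only; URI filenames need uri=True.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        start = time.perf_counter()
        for _ in range(repeats):
            conn.execute(query).fetchall()
        return (time.perf_counter() - start) / repeats
    finally:
        conn.close()
```

Running the same call against a copy of the database on local disk and against the copy on GPFS, ideally from several nodes at once, would give a rough comparison.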