Seg fault with core dump. How to explore?


Seg fault with core dump. How to explore?

Kevin O'Gorman
I'm testing new code, and my latest trial run ended with a segmentation
fault after about 5 hours.
I'm running Python 3.5 and its standard sqlite3 module on Xubuntu 16.04.3
LTS.  The code is short -- about 300 lines.

This particular program is merging two databases.  The result has reached
25 GB, roughly 1/3 of what I expect of the final result (over 100M rows).
The filesystem is a RAID with 2+ TB free.  The machine is a Core i7 with 32
GB RAM and 0 swap has been used since the last reboot.  Nothing else much
is running on this machine except some idle terminal and browser windows.

Here's my prime suspect: I'm using WAL, and the journal is 543 MB.  I
hadn't given it much thought, but could this be more than the software
really wants to deal with?  I'm going to try doing occasional commits
(every 100K inserts/updates, perhaps; a sketch of what I mean follows
the list below), but I'd like some help:
1. If I'm on the right track, tell me so I can stop worrying and proceed
with development.
2. If I'm on the wrong track, help me figure out how to debug the problem.
I can probably find out which part of the merge it had reached, but
it's going to take quite a while.  I'm pretty good with GDB, but I have
no idea how to explore a running Python program.
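
For concreteness, here's a minimal sketch of the batched-commit idea
(the filename, table, and row source are all made up; the real program
is the 300-line merge):

import sqlite3

BATCH = 100000  # commit every 100K rows; a guess, to be tuned

def rows_to_merge():
    # Stand-in for reading rows out of the source database.
    for i in range(250000):
        yield (i, "x")

con = sqlite3.connect("merged.db")  # made-up filename
con.execute(
    "CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, val CHAR)")

n = 0
for row in rows_to_merge():
    con.execute("INSERT OR REPLACE INTO target VALUES (?, ?)", row)
    n += 1
    if n % BATCH == 0:
        con.commit()  # bounds the open transaction and the WAL size
        print("committed", n, "rows", flush=True)

con.commit()
con.close()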

The project is a hobby, so there's nothing proprietary, and I can post any
information that would help.

++ kevin

--
word of the year: *kakistocracy*

Re: Seg fault with core dump. How to explore?

Simon Slavin-3


On 30 Sep 2017, at 10:54pm, Kevin O'Gorman <[hidden email]> wrote:

> Here's my prime suspect: I'm using WAL, and the journal is 543 MB.  I
> hadn't given it much thought, but could this be more than the software
> really wants to deal with?

Not SQLite.  Possibly something else you're using.  I used to work daily with a 43-gigabyte SQLite database, and most of that space was used by one tall thin table.  SQLite has known limits and is not thoroughly tested near those limits (because nobody can afford to buy enough hardware to do it), but those limits are a lot more than half a gigabyte.

<https://sqlite.org/limits.html>

A crash sometimes happens because the programmer continues to call sqlite_ routines after one of them has already reported a problem.  Are you checking the values returned by all sqlite_() calls to see that each is SQLITE_OK?  You may have to learn how your Python shim works to know: it may interpret other results as "catch" triggers or some equivalent.
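
In the standard Python module the answer arrives as exceptions rather
than return codes, so the equivalent check is a try/except around each
unit of work.  A minimal sketch (the filename, table, and statement are
invented):

import sqlite3

con = sqlite3.connect("test.db")  # invented filename
try:
    con.execute("INSERT INTO target VALUES (1, 'x')")  # invented statement
    con.commit()
except sqlite3.Error as e:
    # Any non-OK result from the underlying sqlite3_* call lands here.
    # Stop using the connection for this unit of work instead of pressing on.
    print("SQLite reported:", e)
    con.rollback()
finally:
    con.close()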

Are you using the standard Python shim or APSW ?  The standard Python shim does complicated magic to make SQLite behave the way Python wants it to behave.  This complication can make it difficult to track down faults.  You might instead want to try APSW:

<https://rogerbinns.github.io/apsw/>

This is an extremely thin shim that does almost nothing itself.  That makes it easy to track down all errors to a Python problem or a SQLite problem.  I’m not saying we can’t help with the standard Python import, just that it’s a little more complicated.
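
For comparison, a minimal APSW session looks like this (filename, table,
and data invented).  Note that APSW, unlike the standard module, never
starts a transaction behind your back:

import apsw  # third-party: pip install apsw

con = apsw.Connection("test.db")  # invented filename
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (?)", (1,))  # autocommits; no implicit BEGIN
for (x,) in cur.execute("SELECT x FROM t"):
    print(x)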

Simon.

Re: Seg fault with core dump. How to explore?

Joseph L. Casale
In reply to this post by Kevin O'Gorman
-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On
Behalf Of Kevin O'Gorman
Sent: Saturday, September 30, 2017 3:55 PM
To: sqlite-users <[hidden email]>
Subject: [sqlite] Seg fault with core dump. How to explore?

> Here's my prime suspect: I'm using WAL, and the journal is 543 MB.

Do you really need any reliability at all for a test?  Who cares if the
power goes out or the program crashes?  If this is a test, you will simply
restart it and the data is irrelevant, so why sacrifice potential
performance for data integrity?

Try setting journal_mode to OFF...
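
If you want to try it, one PRAGMA right after connecting does it.  A
sketch (filename invented), with the caveat that a crash mid-write can
then corrupt the database rather than just lose the last transaction:

import sqlite3

con = sqlite3.connect("merged.db")  # invented filename
# Returns the mode actually in effect; "off" means no journal at all.
print(con.execute("PRAGMA journal_mode=OFF").fetchone())
con.execute("PRAGMA synchronous=OFF")  # optionally skip fsync too; same trade-off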

Re: Seg fault with core dump. How to explore?

Kevin O'Gorman
In reply to this post by Simon Slavin-3
I'm using the standard shim, because I've been using it forever and first
heard of APSW just a few days ago.  I'm guessing it should be pretty easy
to switch because I'm not doing anything weird.  All my columns are INTEGER
or CHAR, there are not even any foreign keys, although one of the two main
tables does contain the primary keys (integer autoincrement) of the other.

I'm a little leery of switching on account of one crash, as it may well be
an over-reaction.

On Sat, Sep 30, 2017 at 4:30 PM, Simon Slavin <[hidden email]> wrote:

> > Here's my prime suspect: I'm using WAL, and the journal is 543 MB.  I
> > hadn't given it much thought, but could this be more than the software
> > really wants to deal with?
>
> Not SQLite.  Possibly something else you're using. [...]
>
> Are you using the standard Python shim or APSW ? [...]



--
word of the year: *kakistocracy*

Re: Seg fault with core dump. How to explore?

Kevin O'Gorman
In reply to this post by Joseph L. Casale
What I'm testing is my code.  I want to be sure the code is going to work.
A crash is a primary indication that it won't.  That's information, not
just an annoyance.

On Sat, Sep 30, 2017 at 5:37 PM, Joseph L. Casale <[hidden email]
> wrote:

> > Here's my prime suspect: I'm using WAL, and the journal is 543 MB.
>
> Do you really need any reliability at all for a test?  Who cares if the
> power goes out or the program crashes? [...]
>
> Try setting journal_mode to OFF...



--
word of the year: *kakistocracy*

Re: Seg fault with core dump. How to explore?

Joseph L. Casale
-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On
Behalf Of Kevin O'Gorman
Sent: Saturday, September 30, 2017 6:40 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Seg fault with core dump. How to explore?

> What I'm testing is my code.  I want to be sure the code is going to work.
> A crash is a primary indication that it won't.  That's information, not
> just an annoyance.

And having the database around provides insight into what went wrong?
Have you used it previously to solve a bug?  Possibly, but I assume not...

Unless you commit each and every single operation, you likely won't get
much insight into the specific state before it died, and committing that
often won't be performant enough with your data set.  In my opinion, you
get far more insight from instrumentation in your code, which likely
makes the database irrelevant.

However, that is just a theory.
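
By instrumentation I mean something as cheap as a progress log every N
rows, so a crash is bracketed to within N operations.  A sketch with
invented names:

import logging

logging.basicConfig(filename="merge.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def merge(rows, con, every=100000):
    # rows: iterable of source rows; con: open sqlite3 connection
    # (both invented stand-ins for whatever the real program uses).
    for n, row in enumerate(rows, start=1):
        con.execute("INSERT OR REPLACE INTO target VALUES (?, ?)", row)
        if n % every == 0:
            logging.info("merged %d rows, last key %r", n, row[0])

After a crash, the tail of merge.log brackets where the program died.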

BTW, for future work you might want to look at APSW.  Whenever I have a
Python project, I always use it, as I find the API far superior, among
other things.  Plus the maintainer is very responsive.

Re: Seg fault with core dump. How to explore?

Clemens Ladisch
In reply to this post by Kevin O'Gorman
Kevin O'Gorman wrote:
> my latest trial run ended with a segmentation fault

Really a segmentation fault?  What is the error message?

> This particular program is merging two databases.  The result has reached
> 25 GB, roughly 1/3 of what I expect of the final result (over 100M rows).
> The filesystem is a RAID with 2+ TB free.

Does the /var/tmp filesystem fill up?

> Here's my prime suspect: I'm using WAL, and the journal is 543 MB.

In WAL mode, the log stores the new versions of all changed pages.
In rollback journal mode, the journal stores the old versions of all
changed pages.  So when you're creating a new DB (where the old version
is empty), a rollback journal is likely to be more efficient.
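
Switching is one statement per connection, if you want to test that.  A
sketch (filename invented):

import sqlite3

con = sqlite3.connect("merged.db")  # invented filename
# Leave WAL for the classic rollback journal; prints the mode now in effect.
print(con.execute("PRAGMA journal_mode=DELETE").fetchone())
# Or, staying in WAL, bound the log's growth with explicit checkpoints:
# con.execute("PRAGMA wal_checkpoint(TRUNCATE)")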


Regards,
Clemens

Re: Seg fault with core dump. How to explore?

Kevin O'Gorman
On Sat, Sep 30, 2017 at 11:41 PM, Clemens Ladisch <[hidden email]>
wrote:

> Kevin O'Gorman wrote:
> > my latest trial run ended with a segmentation fault
>
> Really a segmentation fault?  What is the error message?
>

It says what such things always say: "segmentation fault (core dumped)"
and the name of the program.

>
> > This particular program is merging two databases.  The result has reached
> > 25 GB, roughly 1/3 of what I expect of the final result (over 100M rows).
> > The filesystem is a RAID with 2+ TB free.
>
> Does the /var/tmp filesystem fill up?
>

No.  And /var/tmp is not used, as I've redirected tmp onto my RAID.

>
> > Here's my prime suspect: I'm using WAL, and the journal is 543 MB.
>
> In WAL mode, the log stores the new versions of all changed pages.
> In rollback journal mode, the journal stores the old version of all
> changed pages.  So when you're creating a new DB (where the old version
> is empty), a rollback journal is likely to be more efficient.

I'm not creating a new database.  I'm merging one into the other.



--
word of the year: *kakistocracy*

Re: Seg fault with core dump. How to explore?

Simon Slavin-3
On 2 Oct 2017, at 5:33am, Kevin O'Gorman <[hidden email]> wrote:

> It says what such things always say: "segmentation fault (core dumped)"
> and the name of the program.

Try the standard investigation for any Python program which gives a segmentation fault.  What does faulthandler say?

<https://pypi.python.org/pypi/faulthandler/3.0>
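
(On Python 3.3 and later, faulthandler is in the standard library, so
on your 3.5 there is nothing to install.)  Enabling it is two lines at
the top of the program:

import faulthandler
faulthandler.enable()  # on SIGSEGV, dumps the Python traceback to stderr

Equivalently, run the program as "python3 -X faulthandler myprog.py"
without touching the source.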

If that doesn’t help, use GDB:

prompt$ gdb python
… blah …
(gdb) r myprog.py
… blah …
… crash notice …
(gdb) bt
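
If CPython's gdb helpers are loaded as well (on Ubuntu I believe they
come with the python3-dbg package), gdb can also show the Python-level
stack:

(gdb) py-bt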

If your program crashes in the shell but not in the debugger, you have a memory-management problem unrelated to SQLite.

Additional questions if that doesn’t solve it for you:

What modules/packages are you importing?  Are any of them not needed to get your code to the point where it triggers the crash?  If so, try not loading them.

Can you demonstrate the problem with a tiny dataset rather than the large one which caused the problem?  Does the dataset matter at all, or is it just the number of operations?  Try generating random data and see if the crash is always on the same row (143473 or whatever).
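
For instance, a throwaway harness along these lines (everything in it
is invented) would show whether sheer row count is what matters:

import random
import sqlite3
import string

con = sqlite3.connect(":memory:")  # or a scratch file
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val CHAR)")
for n in range(1, 1000001):
    val = "".join(random.choice(string.ascii_lowercase) for _ in range(8))
    con.execute("INSERT INTO target VALUES (?, ?)", (n, val))
    if n % 100000 == 0:
        con.commit()
        print("row", n, flush=True)  # the last number printed brackets a crash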

Simon.