PRAGMA optimize; == no busy handler?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

PRAGMA optimize; == no busy handler?

Howard Kapustein
Using SQLite 3.20.1 I notice a flood of log events sometimes when I call PRAGMA optimize;

Warning 0x5: statement aborts at 1: [PRAGMA optimize;] database is locked
And a few times
Warning 0x5: statement aborts at 2: [PRAGMA optimize;] database is locked
And even once
Warning 0x5: statement aborts at 35: [PRAGMA optimize;] database is locked

We use the default busy-handler w/timeout=250ms and call sqlite_exec("PRAGMA optimize;") in a loop until success or non-busy/locked error or some ungodly amount of time elapses (~5min). The logs indicate PRAGMA optimize; fails due to SQLITE_BUSY w/o ever going through the busy-handler, and then we spin trying a few thousand times.

Docs for PRAGMA optimize; says nothing about busy|locked scenarios, nor any need to call this in a transaction (nor even if that's legal). Ditto ANALYZE docs say nothing about busy|locked scenarios

What's expected if PRAGMA optimize hits a busy|locked scenario?
What am I the caller expected to do?


  *   Howard
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Simon Slavin-3
On 4 Oct 2017, at 12:54am, Howard Kapustein <[hidden email]> wrote:

> We use the default busy-handler w/timeout=250ms and call sqlite_exec("PRAGMA optimize;") in a loop until success or non-busy/locked error or some ungodly amount of time elapses (~5min). The logs indicate PRAGMA optimize; fails due to SQLITE_BUSY w/o ever going through the busy-handler, and then we spin trying a few thousand times.

Don’t do that.  It defeats the purpose of the timeout you set.  If you want a longer timeout than 250ms, set that as your timeout.  SQLite’s own retry process is clever.  It uses exponential backoff and can get at the locks at a lower level than your own code.

I’ve used timeouts in my own code of 5 minutes.  And I never had a user report the error message they’d see if that failed.

> Docs for PRAGMA optimize; says nothing about busy|locked scenarios, nor any need to call this in a transaction (nor even if that's legal). Ditto ANALYZE docs say nothing about busy|locked scenarios
>
> What's expected if PRAGMA optimize hits a busy|locked scenario?
> What am I the caller expected to do?

The only locking in SQLite is to lock the entire database.  So the initial question is whether you do have some other thread/process accessing the database.

I would ask you to reconsider why you’re doing "PRAGMA optimize" while other parts of your program are accessing the database. Right now it tells you only whether ANALYZE is needed, and there’s no harm in doing ANALYZE when it’s not needed.

It makes more sense just to put ANALYZE into some sort of end-of-month procedure or a manual maintenance procedure your users run only when nobody else is using the system.  That way the routine takes a more predictable amount of time, which users like.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Howard Kapustein
>Don't do that
Mostly sure, but there's some cases SQLite will skip the busyhandler and immediately return, but they're not permanently-busy conditions. We have multiple threads across multiple processes using the database

> why you’re doing "PRAGMA optimize" while other parts of your program are accessing the database
Because we don't necessarily know there's other work in progress, and we don't have a good opportunity to ensure we do it later but-no-later than when we need it

>It makes more sense just to put ANALYZE into some sort of end-of-month procedure
>or a manual maintenance procedure your users run only when nobody else is using the system
That's what we did but the real world isn't that simple

Last month we did...
* ANALYZE when the staterepository service starts (effective boot time) if it hasn't been done in a while (1 or 3 days, I forget)
* ANALYZE at the end of a deployment operation if it hasn't been done in a while (1min I think)
@ 1st login there's 30+ operations in a short window (1-2 minutes?) and folks are very anal^H^H^H^Hconscious re 1st login perf. The workload is rather variable all told thus the 1min delay. We usually wind up hitting this a couple of times @ 1st login. That primes sqlite_stat1 sufficiently that statements execute in ms (at most) as expected. .

We recently found a problem where you click on a hyperlink in Edge which (among other things) queries the database and _sometimes_ takes 30s+ instead of near-zero. But we couldn't repro it, and in the rare case someone hit it the problem went away on its own for no apparent reason either. We finally found the problem was a table involved in the query had 1 record and executing the raw SQL took an abnormally long time - but ANALYZE and re-query and time was near-zero. As if we'd added a row to the table for the first time but didn't do ANALYZE so SQLite had no stats to inform its planning, and of course if you reboot after a day or 3 the 'maintenance' would kick in and update stats. Or if something else happened to get installed on the system causing a new deployment operation to call ANALYZE. But the user experience was poor and too unpredictable...

PRAGMA optimize; fixed this. We get the ANALYZE benefits we need, but only for those tables that grew significantly (and 0 to >0 qualifies), and

We also cache connections for what can be lifetime of a service, so optimize @ connection close is too long to wait.


My quick hack is to change the PRAGMA optimize per deployment operation to a best-effort - remove the busy handler, try it and restore. I suspect I need something more involved but I'm still weighing my options. The big Q is understanding PRAGMA optimize (and ANALYZE) in relation to busy|locked conditions.


What do you think of an option to only analyze tables that have grown from 0 records to >0 e.g. PRAGMA optimize(0x10002) so 2=ANALYZE if beneficial and 0x10000=change the criteria from "increased by 25 times or more since the last" to "increased from 0 records to 1+ record since the last? I've seen problems when we have data w/o analyze but not if we grow from N to >N, thought that could be coincidental where we analyze often enough stats never get too out of proportion to the data. So far the only definitive statement I can make is ANALYZE on a table that went from 0 records to 1+ notably improves queries by orders of magnitude.

        - Howard


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Tuesday, October 3, 2017 5:31 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?

On 4 Oct 2017, at 12:54am, Howard Kapustein <[hidden email]> wrote:

> We use the default busy-handler w/timeout=250ms and call sqlite_exec("PRAGMA optimize;") in a loop until success or non-busy/locked error or some ungodly amount of time elapses (~5min). The logs indicate PRAGMA optimize; fails due to SQLITE_BUSY w/o ever going through the busy-handler, and then we spin trying a few thousand times.

Don’t do that.  It defeats the purpose of the timeout you set.  If you want a longer timeout than 250ms, set that as your timeout.  SQLite’s own retry process is clever.  It uses exponential backoff and can get at the locks at a lower level than your own code.

I’ve used timeouts in my own code of 5 minutes.  And I never had a user report the error message they’d see if that failed.

> Docs for PRAGMA optimize; says nothing about busy|locked scenarios, nor any need to call this in a transaction (nor even if that's legal). Ditto ANALYZE docs say nothing about busy|locked scenarios
>
> What's expected if PRAGMA optimize hits a busy|locked scenario?
> What am I the caller expected to do?

The only locking in SQLite is to lock the entire database.  So the initial question is whether you do have some other thread/process accessing the database.

I would ask you to reconsider why you’re doing "PRAGMA optimize" while other parts of your program are accessing the database. Right now it tells you only whether ANALYZE is needed, and there’s no harm in doing ANALYZE when it’s not needed.

It makes more sense just to put ANALYZE into some sort of end-of-month procedure or a manual maintenance procedure your users run only when nobody else is using the system.  That way the routine takes a more predictable amount of time, which users like.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7Choward.kapustein%40microsoft.com%7Cb1fea679d7db4ab4a6bd08d50abf426e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636426738940629164&sdata=NKUh3nPjsvQ25UMpcEy%2FEfhwWzB%2BuQ02a%2B1NrBR7cAY%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Simon Slavin-3


On 4 Oct 2017, at 2:23am, Howard Kapustein <[hidden email]> wrote:

> What do you think of an option to only analyze tables that have grown from 0 records to >0 e.g. PRAGMA optimize(0x10002) so 2=ANALYZE if beneficial and 0x10000=change the criteria from "increased by 25 times or more since the last" to "increased from 0 records to 1+ record since the last?

ANALYZE builds a table of statistics which tells SQLite which search strategy will be optimal.  The table of statistics does not have to exactly match the data in your tables (in fact that’s very rare).  The big changes noted by ANALYZE are not to do with the number of rows in each table, they are to do with the 'chunkiness' of each indexed column: whether a column has only two values (indoor / outdoor) or thousands of different values (surname).  And 'chunkiness' doesn’t change much once a database is big enough that search-time matters.

It is not expected that you’ll try to run ANALYZE while a database is in use.  It’s intended for offline-maintenance.

So why not work around the whole process ?  On your development system, create a database with plausible data in.  Run ANALYZE.  Then copy all the tables named "sqlite_stat*" into a new database.

You can then copy those tables into the databases for your production system.  They will 'tune' SQLite to pick strategies which work best for a typical dataset.  You never have to run ANALYZE or "PRAGMA optimize" on that system.

Yes, it won’t give absolutely the fastest possible operations on your production system(s).  But they will be very close to it, and you have the advantage of never having to lock the database up with a maintenance procedure.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Paul
In reply to this post by Howard Kapustein

From my personal experience, performance instability of SQLite queries (drastically jumping from milliseconds to seconds and back) that is fixable by running ANALYZE means that your queries are not optimal (some crucial indices are missing). The reason ANALYZE helps is because the info that planner receives helps to mitigate the absence of important indices by adapting the strategy. Say for example, you have a query that involves few tables being a JOIN-ed. There may be the case when SQLite has to scan one of the tables but is does not know which, so it uses some default logic that does not account table size (since SQLite does not know it). As the result, wrong table may be chosen (the largest one).

Why this happens exactly after the first record is inserted? Probably because Query Planner is smart and uses some meta info. Probably SQLite does not allocate a page for empty tables before the first record is inserted. Query Planner may take an advantage of this knowledge and optimizing query accordingly.

All of this is just a speculation, of course. What you definitely should do is run EXPLAIN QUERY PLAN on the sluggish query. Better yet, run EXPLAIN QUERY PLAN on *every* query, just to be sure.

-Paul


> >Don't do that
> Mostly sure, but there's some cases SQLite will skip the busyhandler and immediately return, but they're not permanently-busy conditions. We have multiple threads across multiple processes using the database
>
> > why you’re doing "PRAGMA optimize" while other parts of your program are accessing the database
> Because we don't necessarily know there's other work in progress, and we don't have a good opportunity to ensure we do it later but-no-later than when we need it
>
> >It makes more sense just to put ANALYZE into some sort of end-of-month procedure
> >or a manual maintenance procedure your users run only when nobody else is using the system
> That's what we did but the real world isn't that simple
>
> Last month we did...
> * ANALYZE when the staterepository service starts (effective boot time) if it hasn't been done in a while (1 or 3 days, I forget)
> * ANALYZE at the end of a deployment operation if it hasn't been done in a while (1min I think)
> @ 1st login there's 30+ operations in a short window (1-2 minutes?) and folks are very anal^H^H^H^Hconscious re 1st login perf. The workload is rather variable all told thus the 1min delay. We usually wind up hitting this a couple of times @ 1st login. That primes sqlite_stat1 sufficiently that statements execute in ms (at most) as expected. .
>
> We recently found a problem where you click on a hyperlink in Edge which (among other things) queries the database and _sometimes_ takes 30s+ instead of near-zero. But we couldn't repro it, and in the rare case someone hit it the problem went away on its own for no apparent reason either. We finally found the problem was a table involved in the query had 1 record and executing the raw SQL took an abnormally long time - but ANALYZE and re-query and time was near-zero. As if we'd added a row to the table for the first time but didn't do ANALYZE so SQLite had no stats to inform its planning, and of course if you reboot after a day or 3 the 'maintenance' would kick in and update stats. Or if something else happened to get installed on the system causing a new deployment operation to call ANALYZE. But the user experience was poor and too unpredictable...
>
> PRAGMA optimize; fixed this. We get the ANALYZE benefits we need, but only for those tables that grew significantly (and 0 to >0 qualifies), and
>
> We also cache connections for what can be lifetime of a service, so optimize @ connection close is too long to wait.
>
>
> My quick hack is to change the PRAGMA optimize per deployment operation to a best-effort - remove the busy handler, try it and restore. I suspect I need something more involved but I'm still weighing my options. The big Q is understanding PRAGMA optimize (and ANALYZE) in relation to busy|locked conditions.
>
>
> What do you think of an option to only analyze tables that have grown from 0 records to >0 e.g. PRAGMA optimize(0x10002) so 2=ANALYZE if beneficial and 0x10000=change the criteria from "increased by 25 times or more since the last" to "increased from 0 records to 1+ record since the last? I've seen problems when we have data w/o analyze but not if we grow from N to >N, thought that could be coincidental where we analyze often enough stats never get too out of proportion to the data. So far the only definitive statement I can make is ANALYZE on a table that went from 0 records to 1+ notably improves queries by orders of magnitude.
>  - Howard
>
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
> Sent: Tuesday, October 3, 2017 5:31 PM
> To: SQLite mailing list <> [hidden email]>
> Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?
>
> On 4 Oct 2017, at 12:54am, Howard Kapustein <[hidden email]> wrote:
>
> > We use the default busy-handler w/timeout=250ms and call sqlite_exec("PRAGMA optimize;") in a loop until success or non-busy/locked error or some ungodly amount of time elapses (~5min). The logs indicate PRAGMA optimize; fails due to SQLITE_BUSY w/o ever going through the busy-handler, and then we spin trying a few thousand times.
>
> Don’t do that.  It defeats the purpose of the timeout you set.  If you want a longer timeout than 250ms, set that as your timeout.  SQLite’s own retry process is clever.  It uses exponential backoff and can get at the locks at a lower level than your own code.
>
> I’ve used timeouts in my own code of 5 minutes.  And I never had a user report the error message they’d see if that failed.
>
> > Docs for PRAGMA optimize; says nothing about busy|locked scenarios, nor any need to call this in a transaction (nor even if that's legal). Ditto ANALYZE docs say nothing about busy|locked scenarios
> >
> > What's expected if PRAGMA optimize hits a busy|locked scenario?
> > What am I the caller expected to do?
>
> The only locking in SQLite is to lock the entire database.  So the initial question is whether you do have some other thread/process accessing the database.
>
> I would ask you to reconsider why you’re doing "PRAGMA optimize" while other parts of your program are accessing the database. Right now it tells you only whether ANALYZE is needed, and there’s no harm in doing ANALYZE when it’s not needed.
>
> It makes more sense just to put ANALYZE into some sort of end-of-month procedure or a manual maintenance procedure your users run only when nobody else is using the system.  That way the routine takes a more predictable amount of time, which users like.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7Choward.kapustein%40microsoft.com%7Cb1fea679d7db4ab4a6bd08d50abf426e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636426738940629164&sdata=NKUh3nPjsvQ25UMpcEy%2FEfhwWzB%2BuQ02a%2B1NrBR7cAY%3D&reserved=0
> _______________________________________________
> sqlite-users mailing list
> [hidden email]://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Howard Kapustein
In reply to this post by Simon Slavin-3
>On your development system
This is impractical for our case

>It’s intended for offline-maintenance.
The documentation doesn't say that. In fact the only related comment is

https://sqlite.org/pragma.html#pragma_optimize
...This pragma is usually a no-op or nearly so and is very fast. However if SQLite feels that performing database optimizations (such as running ANALYZE or creating new indexes) will improve the performance of future queries, then some database I/O may be done. Applications that want to limit the amount of work performed can set a timer that will invoke sqlite3_interrupt() if the pragma goes on for too long...

I don't see it taking "too long". I see it not even getting to the point of doing i/o, apparently because it decided not to invoke the busyhandler but rather immediately return failure.

        - Howard

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Tuesday, October 3, 2017 6:40 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?



On 4 Oct 2017, at 2:23am, Howard Kapustein <[hidden email]> wrote:

> What do you think of an option to only analyze tables that have grown from 0 records to >0 e.g. PRAGMA optimize(0x10002) so 2=ANALYZE if beneficial and 0x10000=change the criteria from "increased by 25 times or more since the last" to "increased from 0 records to 1+ record since the last?

ANALYZE builds a table of statistics which tells SQLite which search strategy will be optimal.  The table of statistics does not have to exactly match the data in your tables (in fact that’s very rare).  The big changes noted by ANALYZE are not to do with the number of rows in each table, they are to do with the 'chunkiness' of each indexed column: whether a column has only two values (indoor / outdoor) or thousands of different values (surname).  And 'chunkiness' doesn’t change much once a database is big enough that search-time matters.

It is not expected that you’ll try to run ANALYZE while a database is in use.  It’s intended for offline-maintenance.

So why not work around the whole process ?  On your development system, create a database with plausible data in.  Run ANALYZE.  Then copy all the tables named "sqlite_stat*" into a new database.

You can then copy those tables into the databases for your production system.  They will 'tune' SQLite to pick strategies which work best for a typical dataset.  You never have to run ANALYZE or "PRAGMA optimize" on that system.

Yes, it won’t give absolutely the fastest possible operations on your production system(s).  But they will be very close to it, and you have the advantage of never having to lock the database up with a maintenance procedure.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7Choward.kapustein%40microsoft.com%7C26b587f8ebb3420dbc5508d50ac8ed06%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636426780457125699&sdata=5De2RQH4HM%2F%2BzisjcjwXA42VzKTVymtArf%2BP0ZIOk5g%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Howard Kapustein
In reply to this post by Paul
>From my personal experience, performance instability of SQLite queries (drastically jumping from milliseconds to seconds and back)
I haven't seen that sort of instability. We found some queries were consistently poor until we ran ANALYZE, due to nature of the data, indexes and queries. Altering SQL was insufficient to correct the problem.

I'm familiar with EXPLAIN QUERY PLAN. The query involves 5 tables. EXPLAIN QUERY PLAN shows table scans for 3 of them before ANALYZE vs after there's only 1 table scan (and that's the table with 1 record so using an index wouldn't materially affect perf). This is probably a (the?) reason for the perf delta.


This doesn't answer my question -- logs show PRAGMA optimize; appears to skip the busy-handler and return SQLITE_BUSY. The documentation is silent re busy. In fact the only related comment is

https://sqlite.org/pragma.html#pragma_optimize
...This pragma is usually a no-op or nearly so and is very fast. However if SQLite feels that performing database optimizations (such as running ANALYZE or creating new indexes) will improve the performance of future queries, then some database I/O may be done. Applications that want to limit the amount of work performed can set a timer that will invoke sqlite3_interrupt() if the pragma goes on for too long....

SQLite's not performing too much work. It's performing NO work. Is this intended? How is optimize intended to interact with busy/locked conditions?

FYI ANALYZE docs (https://sqlite.org/lang_analyze.html) are equally silent regarding behavior in the face of locks and busy|locked conditions

I've looked over the source but we're into the deeper end of the complexity pool. I see hints of locks in a few spots but nothing definitive or clear.


How are PRAGMA optimize; and ANALYZE; expected to behave in the face of busy|locked conditions?
How are they expected to behave in relation to the busy-handler?

        - Howard


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Paul
Sent: Tuesday, October 3, 2017 10:34 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?


From my personal experience, performance instability of SQLite queries (drastically jumping from milliseconds to seconds and back) that is fixable by running ANALYZE means that your queries are not optimal (some crucial indices are missing). The reason ANALYZE helps is because the info that planner receives helps to mitigate the absence of important indices by adapting the strategy. Say for example, you have a query that involves few tables being a JOIN-ed. There may be the case when SQLite has to scan one of the tables but is does not know which, so it uses some default logic that does not account table size (since SQLite does not know it). As the result, wrong table may be chosen (the largest one).

Why this happens exactly after the first record is inserted? Probably because Query Planner is smart and uses some meta info. Probably SQLite does not allocate a page for empty tables before the first record is inserted. Query Planner may take an advantage of this knowledge and optimizing query accordingly.

All of this is just a speculation, of course. What you definitely should do is run EXPLAIN QUERY PLAN on the sluggish query. Better yet, run EXPLAIN QUERY PLAN on *every* query, just to be sure.

-Paul


> >Don't do that
> Mostly sure, but there's some cases SQLite will skip the busyhandler
> and immediately return, but they're not permanently-busy conditions.
> We have multiple threads across multiple processes using the database
>
> > why you’re doing "PRAGMA optimize" while other parts of your program
> > are accessing the database
> Because we don't necessarily know there's other work in progress, and
> we don't have a good opportunity to ensure we do it later but-no-later
> than when we need it
>
> >It makes more sense just to put ANALYZE into some sort of
> >end-of-month procedure or a manual maintenance procedure your users
> >run only when nobody else is using the system
> That's what we did but the real world isn't that simple
>
> Last month we did...
> * ANALYZE when the staterepository service starts (effective boot
> time) if it hasn't been done in a while (1 or 3 days, I forget)
> * ANALYZE at the end of a deployment operation if it hasn't been done
> in a while (1min I think) @ 1st login there's 30+ operations in a short window (1-2 minutes?) and folks are very anal^H^H^H^Hconscious re 1st login perf. The workload is rather variable all told thus the 1min delay. We usually wind up hitting this a couple of times @ 1st login. That primes sqlite_stat1 sufficiently that statements execute in ms (at most) as expected. .
>
> We recently found a problem where you click on a hyperlink in Edge which (among other things) queries the database and _sometimes_ takes 30s+ instead of near-zero. But we couldn't repro it, and in the rare case someone hit it the problem went away on its own for no apparent reason either. We finally found the problem was a table involved in the query had 1 record and executing the raw SQL took an abnormally long time - but ANALYZE and re-query and time was near-zero. As if we'd added a row to the table for the first time but didn't do ANALYZE so SQLite had no stats to inform its planning, and of course if you reboot after a day or 3 the 'maintenance' would kick in and update stats. Or if something else happened to get installed on the system causing a new deployment operation to call ANALYZE. But the user experience was poor and too unpredictable...
>
> PRAGMA optimize; fixed this. We get the ANALYZE benefits we need, but
> only for those tables that grew significantly (and 0 to >0 qualifies),
> and
>
> We also cache connections for what can be lifetime of a service, so optimize @ connection close is too long to wait.
>
>
> My quick hack is to change the PRAGMA optimize per deployment operation to a best-effort - remove the busy handler, try it and restore. I suspect I need something more involved but I'm still weighing my options. The big Q is understanding PRAGMA optimize (and ANALYZE) in relation to busy|locked conditions.
>
>
> What do you think of an option to only analyze tables that have grown from 0 records to >0 e.g. PRAGMA optimize(0x10002) so 2=ANALYZE if beneficial and 0x10000=change the criteria from "increased by 25 times or more since the last" to "increased from 0 records to 1+ record since the last? I've seen problems when we have data w/o analyze but not if we grow from N to >N, thought that could be coincidental where we analyze often enough stats never get too out of proportion to the data. So far the only definitive statement I can make is ANALYZE on a table that went from 0 records to 1+ notably improves queries by orders of magnitude.
>  - Howard
>
>
> -----Original Message-----
> From: sqlite-users
> [mailto:[hidden email]] On Behalf Of
> Simon Slavin
> Sent: Tuesday, October 3, 2017 5:31 PM
> To: SQLite mailing list <> [hidden email]>
> Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?
>
> On 4 Oct 2017, at 12:54am, Howard Kapustein <[hidden email]> wrote:
>
> > We use the default busy-handler w/timeout=250ms and call sqlite_exec("PRAGMA optimize;") in a loop until success or non-busy/locked error or some ungodly amount of time elapses (~5min). The logs indicate PRAGMA optimize; fails due to SQLITE_BUSY w/o ever going through the busy-handler, and then we spin trying a few thousand times.
>
> Don’t do that.  It defeats the purpose of the timeout you set.  If you want a longer timeout than 250ms, set that as your timeout.  SQLite’s own retry process is clever.  It uses exponential backoff and can get at the locks at a lower level than your own code.
>
> I’ve used timeouts in my own code of 5 minutes.  And I never had a user report the error message they’d see if that failed.
>
> > Docs for PRAGMA optimize; says nothing about busy|locked scenarios,
> > nor any need to call this in a transaction (nor even if that's
> > legal). Ditto ANALYZE docs say nothing about busy|locked scenarios
> >
> > What's expected if PRAGMA optimize hits a busy|locked scenario?
> > What am I the caller expected to do?
>
> The only locking in SQLite is to lock the entire database.  So the initial question is whether you do have some other thread/process accessing the database.
>
> I would ask you to reconsider why you’re doing "PRAGMA optimize" while other parts of your program are accessing the database. Right now it tells you only whether ANALYZE is needed, and there’s no harm in doing ANALYZE when it’s not needed.
>
> It makes more sense just to put ANALYZE into some sort of end-of-month procedure or a manual maintenance procedure your users run only when nobody else is using the system.  That way the routine takes a more predictable amount of time, which users like.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]://na01.safelinks.protection.
> outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmail
> man%2Flistinfo%2Fsqlite-users&data=02%7C01%7Choward.kapustein%40micros
> oft.com%7Cb1fea679d7db4ab4a6bd08d50abf426e%7C72f988bf86f141af91ab2d7cd
> 011db47%7C1%7C0%7C636426738940629164&sdata=NKUh3nPjsvQ25UMpcEy%2FEfhwW
> zB%2BuQ02a%2B1NrBR7cAY%3D&reserved=0
> _______________________________________________
> sqlite-users mailing list
> [hidden email]://mailinglists.sqlite.org/cgi
> -bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=02%7C01%7Choward.kapustein%40microsoft.com%7Cfc1032e85e9c43618e0308d50ae97db0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636426920328758792&sdata=pcG1kwyWbh4lfLj%2B3CQP3Qk8u0pVW9fEgWjQflW6L%2Fw%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Jens Alfke-2
In reply to this post by Howard Kapustein

Simon Slavin wrote:
>
> It is not expected that you’ll try to run ANALYZE while a database is in use.  It’s intended for offline-maintenance.

“Offline maintenance” is for servers 😝 I suspect it’s not relevant to the majority* of use cases for SQLite, like user-facing applications and embedded systems. Personally, I’ve never used SQLite in circumstances where I could predict when the database would be idle, since it could be triggered by either user or network activity.

> So why not work around the whole process ?  On your development system, create a database with plausible data in.  Run ANALYZE.  Then copy all the tables named "sqlite_stat*" into a new database.

That’s an interesting idea. I’ve been assuming that ANALYZE was dependent on the exact contents of the database, but if its results can be canned and applied to new databases, that’s a good optimization.

I’m just a bit wary because this isn’t documented anywhere, and generally one is warned against writing to `sqlite_*` tables. I’d rather have some official blessing of this technique before I consider using it.

—Jens

* I have no idea what the breakdown of SQLite use cases is. (Does anyone?) But I know it’s ubiquitous in desktop apps (at least on MacOS), mobile apps, etc. which is where I’ve used it. And servers are probably using MySQL, Postgres, Couchbase, MongoDB, etc.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Simon Slavin-3


On 4 Oct 2017, at 5:06pm, Jens Alfke <[hidden email]> wrote:

> That’s an interesting idea. I’ve been assuming that ANALYZE was dependent on the exact contents of the database, but if its results can be canned and applied to new databases, that’s a good optimization.

It works and I’ve seen it suggested elsewhere though I’ve never used it.  ANALYZE does not expect the stats tables to match the data.  It’s used to record tables lengths and 'chunkiness' to guide the optimizer.  And even if the stats tables are out of date, the optimizer is so good anyway that the difference it makes may not be great.

Users could do ANALYZE in a maintenance procedure perhaps once a year, or even just during a user-triggered "disk-check" routine, as is done in my PVR.  'chunkiness' of data doesn’t change much over the life of a database.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

David Raymond
In reply to this post by Jens Alfke-2
http://www.sqlite.org/optoverview.html section 6.2 "Manual Control Of Query Plans Using SQLITE_STAT Tables"

SQLite provides the ability for advanced programmers to exercise control over the query plan chosen by the optimizer. One method for doing this is to fudge the ANALYZE results in the sqlite_stat1, sqlite_stat3, and/or sqlite_stat4 tables. That approach is not recommended except for the one scenario described in the next paragraph.

For a program that uses an SQLite database as its application file-format, when a new database instance is first created the ANALYZE command is ineffective because the database contain no data from which to gather statistics. In that case, one could construct a large prototype database containing typical data during development and run the ANALYZE command on this prototype database to gather statistics, then save the prototype statistics as part of the application. After deployment, when the application goes to create a new database file, it can run the ANALYZE command in order to create the statistics tables, then copy the precomputed statistics obtained from the prototype database into these new statistics tables. In that way, statistics from large working data sets can be preloaded into newly created application files.



-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Jens Alfke
Sent: Wednesday, October 04, 2017 12:06 PM
To: SQLite mailing list
Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?


> So why not work around the whole process ?  On your development system, create a database with plausible data in.  Run ANALYZE.  Then copy all the tables named "sqlite_stat*" into a new database.

That’s an interesting idea. I’ve been assuming that ANALYZE was dependent on the exact contents of the database, but if its results can be canned and applied to new databases, that’s a good optimization.

I’m just a bit wary because this isn’t documented anywhere, and generally one is warned against writing to `sqlite_*` tables. I’d rather have some official blessing of this technique before I consider using it.

—Jens

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Richard Hipp-3
That ability to manually pre-populate the sqlite_stat1 table has been
around for ages.  But not many developers use it.  Probably because it
is tedious to do and developers have more important problems to work
on.

A few weeks ago, we got a report of a performance problem from a
support customer.  Looking at their database, we found that they had
created all of the tables, and most but not all of the indexes, added
exactly 1 row to each table, then run ANALYZE.  After that they filled
the database with content and started trying to use it.
Unfortunately, the degenerate sqlite_stat1 content created from the
one-row-per-table sample data seriously confused the query planner.
Rerunning ANALYZE fixed the problem.  They would have been better off
to have never run ANALYZE at all, I think.

That episode demonstrated an important point:  It is hard to know when
to run ANALYZE.  Furthermore, application developers ought not need to
become SQLite experts in order to get SQLite to function well in their
application.  SQLite should just figure out the right thing to do for
itself.  It should "just work".

The PRAGMA optimize command is our effort to move further in the
direction of a "smart" SQLite that always automatically "does the
right thing" with respect to gathering and using database statistics.
Rather than try to teach application developers a bunch of arcane
rules (which are really only heuristics) for when and how to run
ANALYZE, we just ask them to run "PRAGMA optimize" right before
closing the database connection.  The application developer has much
less to worry about.  They don't need to understand ANALYZE, how it
works or why it is important.  The "PRAGMA optimize" command will take
care of running ANALYZE for them, and do so in a way that is minimally
invasive (PRAGMA optimize only rarely runs ANALYZE in practice, and
when it does, it will only ANALYZE individual tables, not the entire
database).

We have been dog-fooding this technique as we are able.  The Fossil
system has been modified
(https://www.fossil-scm.org/fossil/artifact/616c0d?ln=1706) to run
"PRAGMA optimize" after every command and/or web-page visit, and that
has worked out quite well.

But, I'd like to get more real-world feedback on the use of "PRAGMA
optimize" and how well it works (or doesn't work) for a wider spectrum
of developers before I start pushing it as a recommended solution.

On 10/4/17, David Raymond <[hidden email]> wrote:

> http://www.sqlite.org/optoverview.html section 6.2 "Manual Control Of Query
> Plans Using SQLITE_STAT Tables"
>
> SQLite provides the ability for advanced programmers to exercise control
> over the query plan chosen by the optimizer. One method for doing this is to
> fudge the ANALYZE results in the sqlite_stat1, sqlite_stat3, and/or
> sqlite_stat4 tables. That approach is not recommended except for the one
> scenario described in the next paragraph.
>
> For a program that uses an SQLite database as its application file-format,
> when a new database instance is first created the ANALYZE command is
> ineffective because the database contain no data from which to gather
> statistics. In that case, one could construct a large prototype database
> containing typical data during development and run the ANALYZE command on
> this prototype database to gather statistics, then save the prototype
> statistics as part of the application. After deployment, when the
> application goes to create a new database file, it can run the ANALYZE
> command in order to create the statistics tables, then copy the precomputed
> statistics obtained from the prototype database into these new statistics
> tables. In that way, statistics from large working data sets can be
> preloaded into newly created application files.
>
>
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]] On
> Behalf Of Jens Alfke
> Sent: Wednesday, October 04, 2017 12:06 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?
>
>
>> So why not work around the whole process ?  On your development system,
>> create a database with plausible data in.  Run ANALYZE.  Then copy all the
>> tables named "sqlite_stat*" into a new database.
>
> That’s an interesting idea. I’ve been assuming that ANALYZE was dependent on
> the exact contents of the database, but if its results can be canned and
> applied to new databases, that’s a good optimization.
>
> I’m just a bit wary because this isn’t documented anywhere, and generally
> one is warned against writing to `sqlite_*` tables. I’d rather have some
> official blessing of this technique before I consider using it.
>
> —Jens
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Stephen Chrzanowski
The databases I make are pretty darned small and insignificant that even
full table scans wouldn't show that much of a hit in performance.  However,
if there were an API or SQL code or something we can add to our code that
would give you meaningful results (other than "Yes, it works") we could
email in, I'd be glad to add it in for debug executions.

On Wed, Oct 4, 2017 at 1:30 PM, Richard Hipp <[hidden email]> wrote:

>
> But, I'd like to get more real-world feedback on the use of "PRAGMA
> optimize" and how well it works (or doesn't work) for a wider spectrum
> of developers before I start pushing it as a recommended solution.
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Jens Alfke-2
In reply to this post by Richard Hipp-3


> On Oct 4, 2017, at 10:30 AM, Richard Hipp <[hidden email]> wrote:
>
> The PRAGMA optimize command is our effort to move further in the
> direction of a "smart" SQLite that always automatically "does the
> right thing" with respect to gathering and using database statistics.

That’s a great move. Along the same lines, it would be nice if SQLite could vacuum once in a while without being asked (like a good housemate ;) What I’m doing now is based on some advice I read in a blog post*:

        // If this fraction of the database is composed of free pages, vacuum it
        static const float kVacuumFractionThreshold = 0.25;
        // If the database has many bytes of free space, vacuum it
        static const int64_t kVacuumSizeThreshold = 50 * MB;

        // After creating a new database:
        exec("PRAGMA auto_vacuum=incremental”);

        // Just before closing a database:
        exec("PRAGMA optimize");
        int64_t pageCount = intQuery("PRAGMA page_count");
        int64_t freePages = intQuery("PRAGMA freelist_count");
        if ((pageCount > 0 && (float)freePages / pageCount >= kVacuumFractionThreshold)
                        || (freePages * kPageSize >= kVacuumSizeThreshold)) {
                exec("PRAGMA incremental_vacuum");
        }

(To forestall any retorts that “you don’t need to vacuum because SQLite will reuse the free space later”: Yes, you do need to, on a space-constrained device like a phone. Otherwise your app never reclaims any storage back to the OS for use by other apps, and you get customer issues like “I deleted all my old junk from the app but it’s still using 10GB of storage, please help my phone is out of space”…)

—Jens

* https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Toby Dickenson
How does PRAGMA optimize or ANALYSE affect concurrency? Is there any
risk that it might leave the database locked for the full duration of
the scan?

On 4 October 2017 at 23:29, Jens Alfke <[hidden email]> wrote:

>
>
>> On Oct 4, 2017, at 10:30 AM, Richard Hipp <[hidden email]> wrote:
>>
>> The PRAGMA optimize command is our effort to move further in the
>> direction of a "smart" SQLite that always automatically "does the
>> right thing" with respect to gathering and using database statistics.
>
> That’s a great move. Along the same lines, it would be nice if SQLite could vacuum once in a while without being asked (like a good housemate ;) What I’m doing now is based on some advice I read in a blog post*:
>
>         // If this fraction of the database is composed of free pages, vacuum it
>         static const float kVacuumFractionThreshold = 0.25;
>         // If the database has many bytes of free space, vacuum it
>         static const int64_t kVacuumSizeThreshold = 50 * MB;
>
>         // After creating a new database:
>         exec("PRAGMA auto_vacuum=incremental”);
>
>         // Just before closing a database:
>         exec("PRAGMA optimize");
>         int64_t pageCount = intQuery("PRAGMA page_count");
>         int64_t freePages = intQuery("PRAGMA freelist_count");
>         if ((pageCount > 0 && (float)freePages / pageCount >= kVacuumFractionThreshold)
>                         || (freePages * kPageSize >= kVacuumSizeThreshold)) {
>                 exec("PRAGMA incremental_vacuum");
>         }
>
> (To forestall any retorts that “you don’t need to vacuum because SQLite will reuse the free space later”: Yes, you do need to, on a space-constrained device like a phone. Otherwise your app never reclaims any storage back to the OS for use by other apps, and you get customer issues like “I deleted all my old junk from the app but it’s still using 10GB of storage, please help my phone is out of space”…)
>
> —Jens
>
> * https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Simon Slavin-3


On 5 Oct 2017, at 12:07am, Toby Dickenson <[hidden email]> wrote:

> How does PRAGMA optimize or ANALYSE affect concurrency? Is there any
> risk that it might leave the database locked for the full duration of
> the scan?

ANALYZE does this every time.  It’s not possible to analyze a database while it’s being changed.

I’d expect "PRAGMA optimize" to do the same but I’m not sure.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Keith Medcalf
In reply to this post by Howard Kapustein
Android sucks badly.  Dont knpw who it is designed for but it sure aint me ...

Anyway, "offline maintenance" is for IT Shops to annoy their customers.  I think everywhere I worked for the past 40 years has been a 25 by 8 by 366 operation.

So scheduling something for "idle hours" meant knowing that evrryone would be at a toga party at the same time for 3 hours next weekend and could I please have exclusive use for that time?

And of course it only works till the first time you exceed the window cuz aftet that you will never get another.





“Offline maintenance” is for servers 😝 I suspect it’s not relevant to the majority* of use cases for SQLite, like user-facing applications and embedded systems. Personally, I’ve never used SQLite in circumstances where I could predict when the database would be idle, since it could be triggered by either user or network activity.

> So why not work around the whole process ?  On your development system, create a database with plausible data in.  Run ANALYZE.  Then copy all the tables named "sqlite_stat*" into a new database.

That’s an interesting idea. I’ve been assuming that ANALYZE was dependent on the exact contents of the database, but if its results can be canned and applied to new databases, that’s a good optimization.

I’m just a bit wary because this isn’t documented anywhere, and generally one is warned against writing to `sqlite_*` tables. I’d rather have some official blessing of this technique before I consider using it.

—Jens

* I have no idea what the breakdown of SQLite use cases is. (Does anyone?) But I know it’s ubiquitous in desktop apps (at least on MacOS), mobile apps, etc. which is where I’ve used it. And servers are probably using MySQL, Postgres, Couchbase, MongoDB, etc.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Toby Dickenson
In reply to this post by Simon Slavin-3
Thats problematic. It means I have to guess how long my analyze might take,
and guess at a suitable time when that scan wont block other clients. Its a
good reason to never call analyze.

Does it have to be so? It seems to me (possibly naively) that the scan
could be performed in one long read transaction, then a separate quick
write transaction could be used to update the stat tables. Theres no need
for the whole operation to be atomic, right?

On 5 October 2017 at 00:37, Simon Slavin <[hidden email]> wrote:

>
>
> On 5 Oct 2017, at 12:07am, Toby Dickenson <[hidden email]> wrote:
>
> > How does PRAGMA optimize or ANALYSE affect concurrency? Is there any
> > risk that it might leave the database locked for the full duration of
> > the scan?
>
> ANALYZE does this every time.  It’s not possible to analyze a database
> while it’s being changed.
>
> I’d expect "PRAGMA optimize" to do the same but I’m not sure.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Darko Volaric
In the case of a database that ran hot all the time a better strategy would be to have an online backup of the database running at all time and then run analyze on that periodically, then update the stats table in the live database when that was finished. I'm not sure what the rules are for updating the stats table for an open database are, but that would be a very useful enhancement if it wasn't very complicated to implement.



> On Oct 6, 2017, at 8:26 PM, Toby Dickenson <[hidden email]> wrote:
>
> Thats problematic. It means I have to guess how long my analyze might take,
> and guess at a suitable time when that scan wont block other clients. Its a
> good reason to never call analyze.
>
> Does it have to be so? It seems to me (possibly naively) that the scan
> could be performed in one long read transaction, then a separate quick
> write transaction could be used to update the stat tables. Theres no need
> for the whole operation to be atomic, right?
>
> On 5 October 2017 at 00:37, Simon Slavin <[hidden email]> wrote:
>
>>
>>
>> On 5 Oct 2017, at 12:07am, Toby Dickenson <[hidden email]> wrote:
>>
>>> How does PRAGMA optimize or ANALYSE affect concurrency? Is there any
>>> risk that it might leave the database locked for the full duration of
>>> the scan?
>>
>> ANALYZE does this every time.  It’s not possible to analyze a database
>> while it’s being changed.
>>
>> I’d expect "PRAGMA optimize" to do the same but I’m not sure.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Roman Fleysher
In reply to this post by Jens Alfke-2
Dear SQLiters,

Vacuuming seems to belong to a different thread, but let me say that it is not always warranted. Vacuuming may change/reassign ROWIDs. If you have two databases (backup and production?) that used to be linked via such a key, it will break.

Roman

________________________________________
From: sqlite-users [[hidden email]] on behalf of Jens Alfke [[hidden email]]
Sent: Wednesday, October 04, 2017 6:29 PM
To: SQLite mailing list
Subject: Re: [sqlite] PRAGMA optimize; == no busy handler?

> On Oct 4, 2017, at 10:30 AM, Richard Hipp <[hidden email]> wrote:
>
> The PRAGMA optimize command is our effort to move further in the
> direction of a "smart" SQLite that always automatically "does the
> right thing" with respect to gathering and using database statistics.

That’s a great move. Along the same lines, it would be nice if SQLite could vacuum once in a while without being asked (like a good housemate ;) What I’m doing now is based on some advice I read in a blog post*:

        // If this fraction of the database is composed of free pages, vacuum it
        static const float kVacuumFractionThreshold = 0.25;
        // If the database has many bytes of free space, vacuum it
        static const int64_t kVacuumSizeThreshold = 50 * MB;

        // After creating a new database:
        exec("PRAGMA auto_vacuum=incremental”);

        // Just before closing a database:
        exec("PRAGMA optimize");
        int64_t pageCount = intQuery("PRAGMA page_count");
        int64_t freePages = intQuery("PRAGMA freelist_count");
        if ((pageCount > 0 && (float)freePages / pageCount >= kVacuumFractionThreshold)
                        || (freePages * kPageSize >= kVacuumSizeThreshold)) {
                exec("PRAGMA incremental_vacuum");
        }

(To forestall any retorts that “you don’t need to vacuum because SQLite will reuse the free space later”: Yes, you do need to, on a space-constrained device like a phone. Otherwise your app never reclaims any storage back to the OS for use by other apps, and you get customer issues like “I deleted all my old junk from the app but it’s still using 10GB of storage, please help my phone is out of space”…)

—Jens

* https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: PRAGMA optimize; == no busy handler?

Jens Alfke-2


> On Oct 6, 2017, at 3:13 PM, Roman Fleysher <[hidden email]> wrote:
>
> Vacuuming seems to belong to a different thread, but let me say that it is not always warranted. Vacuuming may change/reassign ROWIDs.

The docs say "The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY.”

If you don’t declare an explicit primary key, relying on the automatic ‘rowid’ column to provide one sounds like a bad idea to begin with. And it’s easy to work around by declaring a primary key.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12