Pre-allocating disk space to avoid db file fragments


Pre-allocating disk space to avoid db file fragments

GreatNews
Hi D. Richard Hipp,
 
I'm developing a desktop RSS reader using your excellent SQLite engine. One
issue my users found is that the SQLite database file can get heavily
fragmented over time. I'm wondering if it's viable for SQLite to
pre-allocate disk space when creating a database, and to grow the db file
in bigger chunks (e.g. by 20% or so in size each time)?
 
Thanks,
 
Jack
 
 

Re: Pre-allocating disk space to avoid db file fragments

Jay Sprenkle
On 9/13/05, GreatNews <[hidden email]> wrote:

> I'm wondering if it's viable for SQLite to pre-allocate disk space
> when creating a database, and to grow the db file in bigger chunks?

Why not run a VACUUM every tenth time you exit the program (or something
similar)?
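
For instance, something like this (a rough, untested sketch; it abuses
PRAGMA user_version as a persistent run counter so no extra table is
needed):

#include <sqlite3.h>
#include <stdio.h>

static int get_user_version(sqlite3 *db)
{
    sqlite3_stmt *st;
    int v = 0;
    if (sqlite3_prepare(db, "PRAGMA user_version;", -1, &st, 0) == SQLITE_OK) {
        if (sqlite3_step(st) == SQLITE_ROW)
            v = sqlite3_column_int(st, 0);
        sqlite3_finalize(st);
    }
    return v;
}

/* Call this on shutdown: bump the run counter, vacuum every 10th run. */
void maybe_vacuum_on_exit(sqlite3 *db)
{
    int runs = get_user_version(db) + 1;
    char sql[64];
    snprintf(sql, sizeof sql, "PRAGMA user_version=%d;", runs);
    sqlite3_exec(db, sql, 0, 0, 0);   /* pragmas can't take bind parameters */
    if (runs % 10 == 0)
        sqlite3_exec(db, "VACUUM;", 0, 0, 0);
}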
 
---
The Castles of Dereth Calendar: a tour of the art and architecture of
Asheron's Call
http://www.lulu.com/content/77264

Re: Pre-allocating disk space to avoid db file fragments

Cory Nelson
I think his issue is that the database file changes size too often. He
wants it to grow automatically in larger chunks so there is less
fragmentation on the disk.

Good idea, assuming it's settable via a pragma.

On 9/13/05, Jay Sprenkle <[hidden email]> wrote:

> Why not run a VACUUM every tenth time you exit the program (or
> something similar)?

--
Cory Nelson
http://www.int64.org

Re: Pre-allocating disk space to avoid db file fragments

Dennis Jenkins
Jay Sprenkle wrote:

> Why not run a VACUUM every tenth time you exit the program (or
> something similar)?

Even vacuuming won't defrag the file.  Disk space is allocated by the OS
and the OS makes no guarantees.

If the program is running on Windows, then you can defrag the file
yourself (if you have admin rights). You can read about the Win32 defrag
APIs here: http://www.sysinternals.com/Information/DiskDefragmenting.html.
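
For example, just counting a file's fragments takes only a handful of
lines. A rough, untested sketch (each extent that comes back is one
contiguous run of clusters; a real defragger would go on to use
FSCTL_GET_VOLUME_BITMAP and FSCTL_MOVE_FILE, which is where the admin
rights come in):

#include <windows.h>
#include <winioctl.h>

DWORD CountFragments(const char *path)
{
    HANDLE h = CreateFileA(path, FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    BYTE buf[8192];
    RETRIEVAL_POINTERS_BUFFER *rp = (RETRIEVAL_POINTERS_BUFFER *)buf;
    STARTING_VCN_INPUT_BUFFER in;
    DWORD bytes, frags = 0;

    if (h == INVALID_HANDLE_VALUE) return 0;
    in.StartingVcn.QuadPart = 0;
    for (;;) {
        BOOL ok = DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS,
                                  &in, sizeof in, rp, sizeof buf,
                                  &bytes, NULL);
        if (!ok && GetLastError() != ERROR_MORE_DATA)
            break;                     /* real error, or resident file */
        frags += rp->ExtentCount;
        if (ok) break;                 /* that was the last batch */
        in.StartingVcn = rp->Extents[rp->ExtentCount - 1].NextVcn;
    }
    CloseHandle(h);
    return frags;
}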

We don't defrag the database file, but we do vacuum it whenever the slack
space exceeds 25% of the total space used by the database file. We check
for this when our application starts up.

I ported a hack from sqlite2 to sqlite3 that calculates the amount of
slack space. I submitted it (probably improperly) for inclusion into
sqlite a few weeks ago. I can provide it to you if you wish; just email me.
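
If your SQLite build happens to support PRAGMA freelist_count and
PRAGMA page_count, which not every version does, you can get the same
figure without any patch. A rough, untested sketch of the startup check:

#include <sqlite3.h>

static int pragma_int(sqlite3 *db, const char *sql)
{
    sqlite3_stmt *st;
    int v = 0;
    if (sqlite3_prepare(db, sql, -1, &st, 0) == SQLITE_OK) {
        if (sqlite3_step(st) == SQLITE_ROW)
            v = sqlite3_column_int(st, 0);
        sqlite3_finalize(st);
    }
    return v;
}

/* Vacuum when the free (slack) pages exceed 25% of the file. */
void vacuum_if_slack(sqlite3 *db)
{
    int total = pragma_int(db, "PRAGMA page_count;");
    int freep = pragma_int(db, "PRAGMA freelist_count;");
    if (total > 0 && freep * 4 > total)
        sqlite3_exec(db, "VACUUM;", 0, 0, 0);
}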

You could do the following:

1) Create a dummy table and fill it with a gazillion [1] rows of junk
(sketched below).
2) Defrag the database file.
3) Drop the dummy table.
4) You have just created lots of slack space that sqlite will reuse
before it extends the disk file (I think).

[1] for suitable values of "a gazillion".
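
Steps 1 and 3 in code, as a rough, untested sketch (the table name and
the sizes are made up; note it assumes auto_vacuum is off, since
auto-vacuum would shrink the file again at commit instead of leaving
the slack):

#include <stdlib.h>
#include <sqlite3.h>

void preallocate_slack(sqlite3 *db, int n_rows, int row_bytes)
{
    sqlite3_stmt *st;
    char *junk = calloc(1, row_bytes);            /* zero-filled filler */
    if (!junk) return;
    sqlite3_exec(db, "CREATE TABLE _filler(junk BLOB);", 0, 0, 0);
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    if (sqlite3_prepare(db, "INSERT INTO _filler VALUES(?);",
                        -1, &st, 0) == SQLITE_OK) {
        sqlite3_bind_blob(st, 1, junk, row_bytes, SQLITE_STATIC);
        for (int i = 0; i < n_rows; i++) {        /* "a gazillion" rows */
            sqlite3_step(st);
            sqlite3_reset(st);                    /* bindings survive reset */
        }
        sqlite3_finalize(st);
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    /* Dropping the table puts the pages on the freelist for reuse. */
    sqlite3_exec(db, "DROP TABLE _filler;", 0, 0, 0);
    free(junk);
}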


Re: Pre-allocating disk space to avoid db file fragments

Ben Clewett
An old COBOL system we had did this: it never allocated less than 64
blocks of disk space. It did work.

A lot of modern file systems (e.g. ext2 and ext3) do this anyway by
reserving space after your file for later use. So if you are using a
file system with plenty of free space, file expansion will mostly be a
contiguous extension of existing data.

Apart from file fragmentation, there is also table space fragmentation:
a sequential read through an index on a table may not be a sequential
read along a disk cylinder, resulting in low performance. I don't know
whether VACUUM helps or hinders this effect.

From experience I know that dumping an entire DB as SQL, destroying the
database, then parsing it back in can result in significant read
performance gains where the database is not cached by the OS file cache.
I would *guess* that where the database is cached, none of this will
make much difference. :)

Just my two pence worth...


Cory Nelson wrote:

> I think his issue is that the database file changes size too often.
> He wants it to grow automatically in larger chunks so there is less
> fragmentation on the disk.
>
> Good idea, assuming it's settable via a pragma.

--
Ben Clewett
+44(0)1923 460000
Project Manager
Road Tech Computer Systems Ltd
http://www.roadrunner.uk.com

Re: Pre-allocating disk space to avoid db file fragments

Jay Sprenkle
On 9/13/05, Dennis Jenkins <[hidden email]> wrote:

> Even vacuuming won't defrag the file. Disk space is allocated by the
> OS and the OS makes no guarantees.

Won't Dr. Hipp's method of making a backup copy also defrag the file?
I.e.:

execute BEGIN EXCLUSIVE to lock the database
copy the file
COMMIT
rename the files and use the backup copy as the new current database

That assumes your disk free space isn't heavily fragmented. If it is
fragmented, I believe this will still tend to reduce the fragmentation
over time, depending on what else is going on on the machine at the
same time.
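
In code, roughly (an untested sketch; error handling is mostly skipped,
and every connection to the database must be closed before the final
swap):

#include <stdio.h>
#include <sqlite3.h>

static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    FILE *out;
    char buf[65536];
    size_t n;
    if (!in) return -1;
    out = fopen(dst, "wb");
    if (!out) { fclose(in); return -1; }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return 0;
}

int compact_by_copy(sqlite3 *db, const char *dbpath, const char *tmppath)
{
    int rc;
    sqlite3_exec(db, "BEGIN EXCLUSIVE;", 0, 0, 0);  /* block all writers */
    rc = copy_file(dbpath, tmppath);
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    return rc;
    /* ...then, once db is closed:
       remove(dbpath); rename(tmppath, dbpath); */
}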

Re: Pre-allocating disk space to avoid db file fragments

Ray Mosley
Is it overkill to VACUUM every time the last user terminates?

On 9/13/05, Ben Clewett <[hidden email]> wrote:

> Apart from file fragmentation, there is also table space
> fragmentation: a sequential read through an index on a table may not
> be a sequential read along a disk cylinder, resulting in low
> performance. I don't know whether VACUUM helps or hinders this effect.

--
Ray Mosley

Re: Pre-allocating disk space to avoid db file fragments

Dennis Jenkins
Jay Sprenkle wrote:

> Won't Dr. Hipp's method of making a backup copy also defrag the file?

It depends on lots of things: the OS, the filesystem, the % free space
on the filesystem, and other processes that are causing the OS to
allocate disk blocks. I have noticed that Windows XP is terrible at
keeping files fragment-free when copying them: even if there is enough
free space to hold the destination file contiguously, the OS won't do
it. I have rarely bothered to check file fragmentation on Linux and
FreeBSD systems, so I don't know how those handle it (but I would
assume they are much more intelligent about it than NTFS).

To Ben's point, I neglected to consider table space fragmentation; he
has a very good point. I have read the source code to the VACUUM
function, and my understanding is that the resulting file won't have
any table space fragmentation, but I could be wrong.


Re: Pre-allocating disk space to avoid db file fragments

Jay Sprenkle
On 9/13/05, Ray Mosley <[hidden email]> wrote:

> Is it overkill to VACUUM every time the last user terminates?

It depends. If your program heavily rearranges database records every
time it runs, probably not.

Re: Pre-allocating disk space to avoid db file fragments

Jay Sprenkle
On 9/13/05, Dennis Jenkins <[hidden email]> wrote:

> I have noticed that Windows XP is terrible at keeping files
> fragment-free when copying them: even if there is enough free space
> to hold the destination file contiguously, the OS won't do it.

Ugh! Thanks for letting us know about that.

There's no way I know of to control fragmentation. I've been assuming
that if you copy a complete file to a new location within a short time
period, it will likely be less fragmented than the original. That's not
always true, but in my experience it's simple and generally tends to be
true over the long run. If a user won't defrag their disk, there's not
a lot you can do to correct for it.

Re: Pre-allocating disk space to avoid db file fragments

Ben Clewett
A small warning about running VACUUM too often: any prepared statements
will fail if they are prepared before the VACUUM and used afterwards. I
had a daemon which ran VACUUM autonomously, and it occasionally
coincided with a user request and broke it :)
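
The defensive pattern looks roughly like this (an untested sketch,
assuming the classic sqlite3_prepare() interface, where the real error
code only shows up when the statement is reset or finalized):

#include <sqlite3.h>

/* Step a long-lived statement, re-preparing it if an intervening
   VACUUM (or any other schema change) invalidated it. */
int step_surviving_vacuum(sqlite3 *db, sqlite3_stmt **pst, const char *sql)
{
    int rc = sqlite3_step(*pst);
    if (rc == SQLITE_ERROR && sqlite3_reset(*pst) == SQLITE_SCHEMA) {
        sqlite3_finalize(*pst);
        *pst = 0;
        if (sqlite3_prepare(db, sql, -1, pst, 0) != SQLITE_OK)
            return SQLITE_SCHEMA;
        rc = sqlite3_step(*pst);     /* NB: any bindings were lost */
    }
    return rc;
}

If your SQLite is new enough to have sqlite3_prepare_v2(), it does this
re-prepare for you behind the scenes.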

Jay Sprenkle wrote:

> On 9/13/05, Ray Mosley <[hidden email]> wrote:
>
>> Is it overkill to VACUUM every time the last user terminates?
>
> It depends. If your program heavily rearranges database records every
> time it runs, probably not.

--
Ben Clewett
+44(0)1923 460000
Project Manager
Road Tech Computer Systems Ltd
http://www.roadrunner.uk.com

Re: Pre-allocating disk space to avoid db file fragments

Dennis Jenkins
Jay Sprenkle wrote:

> There's no way I know of to control fragmentation.

Actually, you can defrag the database file yourself, if you have admin
rights (because you need to open a handle to the physical device).


Re: Pre-allocating disk space to avoid db file fragments

Jay Sprenkle
On 9/13/05, Dennis Jenkins <[hidden email]> wrote:

> Actually, you can defrag the database file yourself, if you have admin
> rights (because you need to open a handle to the physical device).

I thought he needed an automated solution to include in the code he
releases to users.

---
The Castles of Dereth Calendar: a tour of the art and architecture of
Asheron's Call
http://www.lulu.com/content/77264

Re: Pre-allocating disk space to avoid db file fragments

Dennis Jenkins
Jay Sprenkle wrote:

> I thought he needed an automated solution to include in the code he
> releases to users.

Yeah.  His code can defrag the file if it has the correct permissions
and is running on Windows NT 4 or better.  He can do this whenever he
has the need to.  Am I missing something?  Maybe I missed the original
goal and only focused on the "need to defrag" angle.


Re: Pre-allocating disk space to avoid db file fragments

Jay Sprenkle
On 9/13/05, Dennis Jenkins <[hidden email]> wrote:

> Yeah. His code can defrag the file if it has the correct permissions
> and is running on Windows NT 4 or better. He can do this whenever he
> has the need to. Am I missing something? Maybe I missed the original
> goal and only focused on the "need to defrag" angle.

Oh, sorry. My mistake. I thought you were suggesting he manually defrag
it and he wanted an automated solution. I didn't know they'd put that
functionality into an API. Learn something every day!


---
The Castles of Dereth Calendar: a tour of the art and architecture of
Asheron's Call
http://www.lulu.com/content/77264

RE: Pre-allocating disk space to avoid db file fragments

GreatNews
This problem can be remedied by defragging the hard disk from time to
time, and copying the whole db file might also do it. But I'm wondering
whether it's possible to prevent this problem from happening, or at
least reduce the chance of fragmentation. SQLite can already reuse free
pages that were originally occupied by deleted records, so perhaps it
could implement a command to pre-allocate some pages in one big chunk?
Requesting a big chunk of disk space usually produces fewer fragments.
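
As a sketch of the kind of call I mean (purely illustrative; as far as
I know no current SQLite defines such a file control, so the #ifdef
below keeps it compiling either way):

#include <sqlite3.h>

/* Hypothetical: ask the engine to grow the main db file in big chunks. */
int set_growth_chunk(sqlite3 *db, int chunk_bytes)
{
#ifdef SQLITE_FCNTL_CHUNK_SIZE
    return sqlite3_file_control(db, "main",
                                SQLITE_FCNTL_CHUNK_SIZE, &chunk_bytes);
#else
    (void)db; (void)chunk_bytes;
    return SQLITE_ERROR;             /* no such feature in this version */
#endif
}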

Thanks,

Jack


-----Original Message-----
From: Dennis Jenkins [mailto:[hidden email]]
Sent: Tuesday, September 13, 2005 11:39 AM
To: [hidden email]
Subject: Re: [sqlite] Pre-allocating disk space to avoid db file fragments

Jay Sprenkle wrote:

> I thought he needed an automated solution to include in the code he
> releases to users.

Yeah. His code can defrag the file if it has the correct permissions
and is running on Windows NT 4 or better. He can do this whenever he
has the need to.