Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM


Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

Ligesh

 I am running a very simple SQL query from a single instance. The query is something like:

 select * from table where parent_name = 'parent' order by name limit 10

 The query takes around 0.3 seconds when the total number of matches is around 7,000, but jumps to 11 seconds at 13,000 matches. There is heavy swap activity at that point, and the whole machine's responsiveness drops once the query runs.

 The table is indexed on parent_name, so I would think it shouldn't have to load the whole table into memory. The row size isn't huge either, at only around 1KB per row.

 Is this a known problem? Can anyone shed some light on it?

 Thanks in advance.


- Ligesh

Re: Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

Ulrik Sandborg-Petersen
Ligesh,

Ligesh wrote:

> I am running a very simple sql query with only one instance. The query is something like:
>
> select * from table where parent_name = 'parent' order by name limit 10
>  
>

My guess (and it is only a guess) is that, because of the ORDER BY
clause, SQLite is loading all rows into memory, then sorting them
in-memory.

Here's a tip which might actually help you: if your rows are around
1KB in size, it might be worth increasing the page size to, say,
4096, so that rows do not spill over onto overflow pages.

You don't say what platform you are running on. For me, at least, a
page size of 4096 improves performance on Win32, probably because it
matches the Win32 memory-manager page size. You will find the
relevant #define in pager.h; a comment there says that increasing the
page size will not really give you a performance edge. In my
experience that is true on Linux but not on Windows. Increasing the
page size beyond 4096, however, gains me nothing.
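
For reference, here is the knob in question, sketched from memory of
the stock 2.x sources (check your own copy of pager.h for the exact
macro name and default):

   /* src/pager.h (SQLite 2.x): the page size is fixed at compile
   ** time, so changing it means rebuilding the library and
   ** recreating the database file.  1024 is the usual 2.x default;
   ** 4096 is the value suggested above. */
   #define SQLITE_PAGE_SIZE 4096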

Cheers,

Ulrik P.

--
Ulrik Petersen, Denmark



Re: Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

D. Richard Hipp
In reply to this post by Ligesh
On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:

>  I am running a very simple SQL query from a single instance. The
> query is something like:
>
>  select * from table where parent_name = 'parent' order by name limit
> 10
>
>  The query takes around 0.3 seconds when the total number of matches
> is around 7,000, but jumps to 11 seconds at 13,000 matches. There is
> heavy swap activity at that point, and the whole machine's
> responsiveness drops once the query runs.
>
>  The table is indexed on parent_name, so I would think it shouldn't
> have to load the whole table into memory. The row size isn't huge
> either, at only around 1KB per row.
>

In order to implement the ORDER BY clause, SQLite reads the
entire result set into memory and sorts it there.  When your
result set gets very large (13,000 rows) and each row uses in
excess of 1KB of memory, that is apparently enough to make your
machine thrash.
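
One way to see this from the 2.x C API is a small timing harness (a
sketch only; the database file, table, and column names below are
hypothetical). The time to fetch the first 10 rows tracks the number
of matching rows, not the LIMIT, because the full sort happens before
the limit is applied:

   #include <stdio.h>
   #include <time.h>
   #include "sqlite.h"   /* SQLite 2.x header */

   int main(void){
     char *err = 0;
     sqlite *db = sqlite_open("test.db", 0, &err);  /* hypothetical file */
     if( db==0 ) return 1;
     clock_t t0 = clock();
     /* Even with LIMIT 10, every matching row is gathered and sorted
     ** before the first row comes back. */
     sqlite_exec(db,
         "SELECT * FROM tbl WHERE parent_name = 'parent' "
         "ORDER BY name LIMIT 10;",
         0, 0, &err);
     printf("query took %.3f s\n", (double)(clock()-t0)/CLOCKS_PER_SEC);
     sqlite_close(db);
     return 0;
   }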
--
D. Richard Hipp <[hidden email]>


Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

Ligesh
In reply to this post by Ulrik Sandborg-Petersen
On Sun, Aug 21, 2005 at 12:30:25AM +0200, Ulrik Petersen wrote:

> Ligesh,
>
> Ligesh wrote:
>
> >I am running a very simple SQL query from a single instance. The query is
> >something like:
> >
> >select * from table where parent_name = 'parent' order by name limit 10
> >
> >
>
> My guess (and it is only a guess) is that, because of the ORDER BY
> clause, SQLite is loading all rows into memory, then sorting them
> in-memory.
>

 Yup, your guess is right, as Richard has already explained.

 I am running on Linux, and I will need this to scale to at least 200,000 rows (RAM will be higher, naturally, but performance should scale logarithmically). As usual, I expect the database to do the dirty work and return only the 10-20 rows that will be shown to the user. My own processing is quite inefficient, since I am abstracting the database away almost entirely: database rows are automatically converted to classes with the same name as the table, and some of the columns are kept serialized, to be unserialized automatically when that particular variable is requested (I am doing this via __get overloading ... all in all, a very complex and inefficient setup at my end).

 But the overhead at my end shouldn't be an issue, since I will only ever be working on at most 20 objects at any one time, so I went for transparent handling of the database rather than efficiency.

 I guess I will have to move to another database. Since the database code is only around 30 lines out of 80,000 lines of total code (though the application is 100% database-based), I think that is the easiest route here.

 Thanks.



Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

Ligesh
In reply to this post by D. Richard Hipp
On Sat, Aug 20, 2005 at 06:36:19PM -0400, D. Richard Hipp wrote:

> On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:
> >  I am running a very simple SQL query from a single instance. The
> > query is something like:
> >
> >
>
> In order to implement the ORDER BY clause, SQLite reads the
> entire result set into memory and sorts it there.  When your
> result set gets very large (13,000 rows) and each row uses in
> excess of 1KB of memory, that is apparently enough to make your
> machine thrash.
> --

 Thanks. So I would like to know whether all SQL databases do this. I am not a database expert per se, which is one of the reasons the row size is a bit large: two of the columns are kept serialized (they are unserialized only at the exact point they are needed, so they don't introduce any overhead other than increasing the row size).

 Does SQLite 3 do it the proper way? That is, load just the sort column plus a pointer to the row into RAM, sort there, and return the result?

  Actually the database structure is a full hierarchical tree, and this is the ONLY query I execute: just get the children of a particular parent.

 Thanks a lot.

 - Ligesh




Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

Ligesh
In reply to this post by D. Richard Hipp
On Sat, Aug 20, 2005 at 06:36:19PM -0400, D. Richard Hipp wrote:

> On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:
> >  I am running a very simple SQL query from a single instance. The
> > query is something like:
> >
> >  select * from table where parent_name = 'parent' order by name limit
> > 10
> >
> >  The query takes around 0.3 seconds when the total number of matches
> > is around 7,000, but jumps to 11 seconds at 13,000 matches. There is
> > heavy swap activity at that point, and the whole machine's
> > responsiveness drops once the query runs.
> >
> >  The table is indexed on parent_name, so I would think it shouldn't
> > have to load the whole table into memory. The row size isn't huge
> > either, at only around 1KB per row.
> >
>
> In order to implement the ORDER BY clause, SQLite reads the
> entire result set into memory and sorts it there.  When your
> result set gets very large (13,000 rows) and each row uses in
> excess of 1KB of memory, that is apparently enough to make your
> machine thrash.
> --

 As an aside, I had thought that row size would not be very relevant to database speed, especially the size of columns that are used neither in the 'where' clause nor in sorting. Is that true for a standard SQL database? Or should I start moving the serialized columns to another table?

 
 The logic would be to store the serialized values in a separate table and, when the main rows are loaded, transparently fill in the serialized values at the database-driver level itself. It could then be done without touching the top-level logic, just by adding more intelligence to my database layer.
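
 A sketch of that split in the 2.x C API (every table and column name here is hypothetical): the bulky serialized columns move to a side table keyed by the main table's id, so the listing query only ever reads small rows, and the driver fetches the payload by id when a serialized variable is actually requested.

   char *err = 0;
   /* Hypothetical layout: hot columns stay in the main table,
   ** serialized payloads live in a side table keyed by its id. */
   sqlite_exec(db,
       "CREATE TABLE item(id INTEGER PRIMARY KEY,"
       "                  parent_name TEXT, name TEXT);"
       "CREATE TABLE item_blob(item_id INTEGER PRIMARY KEY,"
       "                       payload TEXT);",
       0, 0, &err);
   /* The listing query now touches only the small rows: */
   sqlite_exec(db,
       "SELECT id, name FROM item WHERE parent_name = 'parent' "
       "ORDER BY name LIMIT 10;",
       0, 0, &err);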

 Thanks.

--
:: Ligesh :: http://ligesh.com 



Re: Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

D. Richard Hipp
In reply to this post by Ligesh
On Sun, 2005-08-21 at 04:24 +0530, Ligesh wrote:
> On Sun, Aug 21, 2005 at 12:30:25AM +0200, Ulrik Petersen wrote:
> > >
> > >select * from table where parent_name = 'parent' order by name limit 10
> > >

>  I am running on Linux, and I will need this to scale to at least
> 200,000 rows

If you upgrade to the very latest code in CVS (version 3.2.2+)
and you create an index like this:

   CREATE INDEX idx ON table(parent_name, name);

Then the query above should be very efficient.

You can try creating the index above with version 2.x.  It might
work.  I do not remember how smart the optimizer was about using
indexes to optimize sorting in version 2.x (that was so long ago).

If you really need to use 2.8, you could just create the index
above, make sure that it is the *only* index on the table, then
omit the ORDER BY clause altogether.  When that index is used,
rows will come out sorted by name simply because of the way index
traversal works in SQLite.  SQL does not guarantee that behavior,
so it probably will not work that way on other database engines,
but it should work fine in SQLite.
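
A sketch of that 2.8 workaround in the 2.x C API (the table name is
hypothetical, and, as cautioned above, the unordered SELECT leans on
SQLite-specific index traversal):

   char *err = 0;
   /* idx must be the ONLY index on the table for this trick. */
   sqlite_exec(db, "CREATE INDEX idx ON tbl(parent_name, name);",
               0, 0, &err);
   /* No ORDER BY: rows emerge in (parent_name, name) order because
   ** 2.8 walks idx to satisfy the WHERE clause. */
   sqlite_exec(db,
       "SELECT * FROM tbl WHERE parent_name = 'parent' LIMIT 10;",
       0, 0, &err);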
--
D. Richard Hipp <[hidden email]>


Re: Re: sqlite 2.8 performance snag when filesize grows beyond 70MB on 256MB RAM

Khamis Abuelkomboz

D. Richard Hipp wrote:

>
>> I am running on Linux, and I will need this to scale to at least
>>200,000 rows
>>    
>>
>
>If you upgrade to the very latest code in CVS (version 3.2.2+)
>and you create an index like this:
>
>   CREATE INDEX idx ON table(parent_name, name);
>
>Then the query above should be very efficient.
>
>You can try creating the index above with version 2.x.  It might
>work.  I do not remember how smart the optimizer was about using
>indexes to optimize sorting in version 2.x (that was so long ago).
>
>If you really need to use 2.8, you could just create the index
>above, make sure that it is the *only* index on the table, then
>omit the ORDER BY clause altogether.  When that index is used,
>rows will come out sorted by name simply because of the way index
>traversal works in SQLite.  SQL does not guarantee that behavior,
>so it probably will not work that way on other database engines,
>but it should work fine in SQLite.
>  
>
I'm actually experiencing a similar problem with SQLite 3.0: when the
tables are empty, select and modify operations are very fast, but as
a table accumulates data the speed suddenly breaks down. In my
experience it doesn't matter which page size I use. Most recently I
tried a page count of 32,000 with a page size of 8192, but got the
same results. Temp space is set to memory.

To clarify what I mean: in my application the parsers scan files and
produce cross-reference entries via inserts into two tables. In a
project with about 3,000 files, the first 500 files are done in about
15 seconds; the last 500 files take about 90 seconds.

Is this issue fixed in 3.2.2+ too? Does anyone have any idea what
could be wrong here?

thanks
khamis

--
Try Code-Navigator on http://www.codenav.com
a source code navigating, analysis and developing tool.
It supports the following languages:
    * C/C++
    * Java
    * .NET (including CSharp, VB.Net and other .NET components)
    * Classic Visual Basic
    * PHP, HTML, XML, ASP, CSS
    * Tcl/Tk
    * Perl
    * Python
    * SQL
    * m4 Preprocessor
    * Cobol