Optimising Large Tables by Splitting 1:1


Optimising Large Tables by Splitting 1:1

James Austin-4
I've been doing a fair bit of research into optimising larger tables and
just want a bit of feedback (most of my lecturers are giving conflicting
answers).

Say I have a table with 200+ fields, all relating uniquely to the primary
key. When querying the first 10 fields of the table I get just as fast a
response as if I were querying a table with the same number of rows but
only 10 fields. However, as I get further into the table (i.e. selecting
fields 180..190) there is a noticeable decrease in speed (by noticeable I
mean around 750 benchmark queries per second slower).

As a measure to increase speed, I was thinking a viable option would be
to break up the table based on how it is queried, so that sets of fields
are divided into new tables containing the fields each query touches. Any
fields that may be used in multiple queries (and would require rejoining
later) would not be separated, as rejoining is much slower than keeping
all fields in one table to begin with.

The main question is: is the method listed above the best way to improve
the speed of a large table, or should all the fields remain in the same
table, since splitting may cause other problems later on?

One method suggested by one of my lecturers was to leave the table as one
and use views. However, after testing this out I found views slower than
querying the large table directly (and seeing that they don't increase the
database size, this leads me to believe that a view is simply an alias for
a query).
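
For reference, this is roughly what I tested (a minimal sketch; the table
and column names are invented for illustration):

-- hypothetical wide table standing in for the real 200-column one
CREATE TABLE wide (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT /* ... f200 */);

-- a view stores only the SELECT, not a copy of the data, so reading
-- through it still walks the same rows as the base table
CREATE VIEW wide_head AS
SELECT id, f1, f2 FROM wide;

SELECT f1 FROM wide_head WHERE id = 42;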

Any feedback is appreciated,
James Austin.



Re: Optimising Large Tables by Splitting 1:1

Roger Binns
> (by noticeable I mean around 750 benchmark queries per second slower).

Since you only give the change, not the total, that is hard to put into
context. E.g. if the total is 750,000 then it is within the margin of error.

> The main question is: is the method listed above the best way to improve
> the speed of a large table, or should all the fields remain in the same
> table, since splitting may cause other problems later on?

My guess is a far simpler cause.  Each row is stored with its fields
laid out sequentially in the database.  So the primary key and the second
field will be stored in the same page, but the primary key and the 190th
field are likely to be on different pages (depending on field sizes).
That means twice as much I/O.

I'd suggest looking into the page size pragmas as well as the cache
size pragmas.  If I am right, increasing them should restore your
performance (mostly).
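
For example (a sketch; the values here are illustrative, not tuned
recommendations):

-- page_size must be set before the database file is created
-- (or before a VACUUM rebuilds it); larger pages mean fewer
-- overflow pages per wide row
PRAGMA page_size = 8192;

-- cache_size is measured in pages; a bigger cache keeps more
-- of each row's overflow chain in memory between queries
PRAGMA cache_size = 10000;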

Roger

Re: Optimising Large Tables by Splitting 1:1

Jarl Friis
"Roger Binns" <[hidden email]> writes:

> My guess is a far simpler cause.  Each row is stored with its fields
> laid out sequentially in the database.  So the primary key and the second
> field will be stored in the same page, but the primary key and the 190th
> field are likely to be on different pages (depending on field sizes).
> That means twice as much I/O.

Could it be because of this (taken from http://www.sqlite.org/faq.html#q10):
----------------------------------------------------------------------
Only the first 31 columns of a table are candidates for certain
optimizations. You can put as many columns in an index as you like but
indexes with more than 30 columns will not be used to optimize
queries.
----------------------------------------------------------------------

Jarl

--
Jarl Friis
Softace ApS
Omøgade 8, 2.sal
2100 København Ø.
Denmark
Phone:  +45 26 13 20 90
E-mail: [hidden email]
LinkedIn: https://www.linkedin.com/in/jarlfriis


Re: Optimising Large Tables by Splitting 1:1

James Austin-4
In reply to this post by James Austin-4
The question is more to do with theoretical optimisation than with
improving the speed of that specific database (which was used only for
benchmarks). Also note that the 700-query difference was based on around
5000 queries per second, definitely not a margin of error (this is a
problem with all tables that have a large number of fields, regardless of
the database or software).

In theory, let's say I have hundreds of databases running on the same
server, hence I want to optimise the methods by which the database is
created and queries are performed (keep in mind that a 14% increase in
processing does eventually add up).

Consider the following as the steps taken to optimise an old database:

1. Start with a database that contains 1:1 relationships, some of which
require joining.

The problem here is that a join is one of the slowest operations, hence
these 1:1s must be combined.

2. Merge all 1:1 relationships into one table.

Result: one large table with n fields.
Problem: fields in the tail end of the table are slow to query.

3. Break up the table according to how it is used, so long as the new
tables do not need to be joined (see the sketch below).

Result: small tables that do not require joining.
Problems: ??
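
As a concrete sketch of step 3 (the table and column names are invented
for illustration):

-- step 2 produced one wide table, e.g.
--   customer(id, name, ..., pref1, ..., pref200)

-- step 3 splits it along query boundaries, sharing the same key:
CREATE TABLE customer_core (
  id   INTEGER PRIMARY KEY,
  name TEXT                    -- fields hit by the common lookup query
);

CREATE TABLE customer_prefs (
  id    INTEGER PRIMARY KEY,   -- same key value as customer_core.id
  pref1 TEXT,
  pref2 TEXT                   -- fields hit only by the preferences query
);

-- each query now touches exactly one small table, no join needed:
SELECT name  FROM customer_core  WHERE id = 42;
SELECT pref1 FROM customer_prefs WHERE id = 42;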

As we all know, seeking through any file takes time. By combining large
1:1 relationships you remove joins but create tables with a large number
of fields, which itself has the problem of seeking to fields in the tail
end.

What I am asking is more to do with design procedure than with the
database system itself: is the solution I have shown for 1:1 the correct
method for solving the seek problem, or does the solution itself present
unseen problems?

James Austin



terrible behavior

emilia12
In reply to this post by Roger Binns
hi guys,
I need help :-)

What is the problem:
(SQLite version 3.3.4, Win XP)

I have a table:

CREATE TABLE plan(personid INTEGER, curseid INTEGER, statid
INTEGER);
INSERT INTO plan VALUES(1,1,0);
INSERT INTO plan VALUES(2,1,0);
INSERT INTO plan VALUES(3,1,0);
INSERT INTO plan VALUES(4,1,NULL);
INSERT INTO plan VALUES(5,1,NULL);
INSERT INTO plan VALUES(6,1,NULL);
INSERT INTO plan VALUES(7,1,NULL);
INSERT INTO plan VALUES(8,1,NULL);
-- ...

and I need to select all rows with "statid" = 0,
so the query is:
SELECT statid
FROM plan P1
WHERE P1.curseid = 0;

BUT, what a surprise, I got an empty result (I expected to get
rows No 1 to 3)!

any suggestions?

regards
Emily






Re: terrible behavior

Guillaume MAISON
[hidden email] wrote:

> and I need to select all rows with "statid" = 0,
> so the query is:
> SELECT statid
> FROM plan P1
> WHERE P1.curseid = 0;
>
> BUT, what a surprise, I got an empty result (I expected to get
> rows No 1 to 3)!
>
> any suggestions?

Yes:
SELECT P1.statid
FROM plan P1
WHERE P1.statid = 0;

instead of your query...

but you'll get only what you asked for: only 0 ;)


--

Guillaume MAISON - N@uteus
83, Cours Victor Hugo
47000 AGEN
Tél : 05 53 87 91 48 - Fax : 05 53 68 73 50
e-mail : [hidden email] - Web : http://nauteus.com


Re: terrible behavior

emilia12
thanks

actually the problem was in SQLite Explorer: because the
column was defined as integer, any text is interpreted
as zero :-)





Re: terrible behavior

jreidthompson
[hidden email] wrote:
> thanks
>
> actually the problem was in SQLite Explorer: because the
> column was defined as integer, any text is interpreted
> as zero :-)
>
No, actually I think Guillaume was correct.
This is the query you posted:

SELECT statid
FROM plan P1
WHERE P1.curseid = 0;

Note in your data that NO curseid is equal to 0.  They are all set to 1.




Re: Optimising Large Tables by Splitting 1:1

Jay Sprenkle
In reply to this post by James Austin-4
On 3/10/06, James Austin <[hidden email]> wrote:
> I've been doing a fair bit of research into optimising larger tables and
> just want a bit of feedback (most of my lecturers are giving conflicting
> answers).

Each database has its own strengths and weaknesses, and they may all
perform differently, so the correct answer may vary depending on which
database is used.

Re: Optimising Large Tables by Splitting 1:1

Christian Smith
In reply to this post by James Austin-4
On Fri, 10 Mar 2006, James Austin wrote:

>I've been doing a fair bit of research into optimising larger tables and
>just want a bit of feedback (most of my lecturers are giving conflicting
>answers).
>
>Say I have a table with 200+ fields, all relating uniquely to the primary
>key. When querying the first 10 fields of the table I get just as fast a
>response as if I were querying a table with the same number of rows but
>only 10 fields. However, as I get further into the table (i.e. selecting
>fields 180..190) there is a noticeable decrease in speed (by noticeable I
>mean around 750 benchmark queries per second slower).


Have a read of:
http://www.sqlite.org/php2004/page-001.html
http://www.sqlite.org/php2004/slides-all.html

Tables are implemented as a B-tree, described from slide 34 onwards.

The first portion of a row is stored along with other rows in a B-tree
leaf page. If the row doesn't fit, it spills into overflow pages, which
are chained using a singly linked list of page pointers (overflow pages
store data from a single row). Thus, to find columns 180..190 in the
above example, first the row must be located, then the linked list
traversed to find the pages holding the desired columns.
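
One practical consequence (a sketch, with invented column names): the
columns you query most often benefit from being declared first, so they
stay in the leaf page rather than several overflow pages down the chain:

CREATE TABLE wide (
  id   INTEGER PRIMARY KEY,
  hot1 TEXT,    -- frequently queried: stored early in the record,
  hot2 TEXT,    -- so reads are satisfied from the leaf page
  /* ... */
  cold200 BLOB  -- rarely queried: acceptable if it spills to overflow
);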


>
>As a measure to increase speed, I was thinking a viable option would be
>to break up the table based on how it is queried, so that sets of fields
>are divided into new tables containing the fields each query touches. Any
>fields that may be used in multiple queries (and would require rejoining
>later) would not be separated, as rejoining is much slower than keeping
>all fields in one table to begin with.
>
>The main question is: is the method listed above the best way to improve
>the speed of a large table, or should all the fields remain in the same
>table, since splitting may cause other problems later on?


My question is, why do you need so many columns in a row? Is this a
hypothetical example, or a real world application?


>
>One method suggested by one of my lecturers was to leave the table as one
>and use views. However, after testing this out I found views slower than
>querying the large table directly (and seeing that they don't increase the
>database size, this leads me to believe that a view is simply an alias for
>a query).


Views won't help if your data is laid out so inefficiently. Views are
indeed an alias for a query, and very useful for common queries and for
transforming data.

You'd be better off breaking up your rows into more manageable tables,
and using views to present a compatible view of your old data. If this
database is already normalised, however, God help you, I say! If this is
purely research, though, it may be better directed elsewhere, as 200+
column rows are rare even in the most badly run government agencies.
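
For instance (a sketch; the table and column names are invented), after
splitting you can keep old queries working through a view:

CREATE TABLE person_core  (id INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE person_extra (id INTEGER PRIMARY KEY, notes TEXT);

-- presents the old wide-table shape over the new narrow tables
CREATE VIEW person AS
SELECT c.id, c.name, e.notes
FROM person_core c JOIN person_extra e ON e.id = c.id;

-- existing code keeps working, at the cost of the join:
SELECT name, notes FROM person WHERE id = 42;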


>
>Any feedback is appreciated,


My 2c (hmm, I'm spending a lot recently)


>James Austin.
>

Christian


--
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \

Re: terrible behavior

emilia12
In reply to this post by jreidthompson
yes, you are correct,
but even when I fixed the typo, there was no result (I got
an empty one).
I mean with the original table, not the one shown in the letter
(which I got with the SQLite Explorer export function).
But when I selected everything from the db with sqlite.exe I saw
that in this column the data was... NULLs and some strings
(garbage). Now I have put correct data (numbers) in and all is OK,
so thanks!

regards
e.
