Virtual table vs real table query performance

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Virtual table vs real table query performance

Bob Friesenhahn
We are trying to improve the query performance of our virtual table
implementation (which is implemented in C).  Due to requirements of
external code, a specified column of a specified row (by rowid) is
queried at a time (the least efficient means of access).  Our virtual
table is accessing entries in a memory-based array.

I have implemented a benchmark script written in Python using the APSW
wrapper.  The benchmark script reveals that access to a native
database table is 5 times faster than access to our virtual table.

Intuitively, I would think that access to a memory-based virtual table
could be faster than native tables.  Our developer has implemented
xBestIndex and xFilter support which is intended to result in direct
access to the requested row rather than scanning the whole table.

What is the expected performance of a properly implemented virtual
table (assuming little additional overhead) vs a native table?

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Hick Gunter
Having imlemented a memory-based virtual table complete with indices, full table scan and direct access via rowid (which happens to be the memory address of the row) I can do a batch delete of 100.000 rows (in a table with 1 composite index) in about 2 seconds (3.7 seconds with the condition) while running linux (RH 5.6 x86_64 VM) on a virtual machine. Deleting all rows of a native SQLite table (while checking for the value of a non-indexed field to avoid SQLite just dropping an re-creating the table) takes about 1 second.



Note that both operations require a full table scan to fill a „rowset“ (= SQLite internal temporary table) and that the virtual table function VUpdate expects the virtual table code to handle index deletetion which is explicitly coded in the native table case.



asql> explain delete from <virtual>;

addr  opcode         p1    p2    p3    p4             p5  comment

----  -------------  ----  ----  ----  -------------  --  -------------

0     Trace          0     0     0                    00  NULL

1     Goto           0     18    0                    00  NULL

2     Integer        0     1     0                    00  NULL

3     Null           0     2     0                    00  NULL

4     VOpen          0     0     0     vtab:187BE588:2ACC1FDC4990  00  NULL

5     Integer        1     4     0                    00  NULL

6     Integer        0     5     0                    00  NULL

7     VFilter        0     12    4                    00  NULL

8     Rowid          0     3     0                    00  NULL

9     RowSetAdd      2     3     0                    00  NULL

10    AddImm         1     1     0                    00  NULL

11    VNext          0     8     0                    00  NULL

12    Close          0     0     0                    00  NULL

13    RowSetRead     2     16    3                    00  NULL

14    VUpdate        0     1     3     vtab:187BE588:2ACC1FDC4990  02  NULL

15    Goto           0     13    0                    00  NULL

16    ResultRow      1     1     0                    00  NULL

17    Halt           0     0     0                    00  NULL

18    VBegin         0     0     0     vtab:187BE588:2ACC1FDC4990  00  NULL

19    Goto           0     2     0                    00  NULL



asql> explain delete from <native> where <fx>=4;

addr  opcode         p1    p2    p3    p4             p5  comment

----  -------------  ----  ----  ----  -------------  --  -------------

0     Trace          0     0     0                    00  NULL

1     Goto           0     31    0                    00  NULL

2     Integer        0     1     0                    00  NULL

3     Null           0     2     0                    00  NULL

4     OpenRead       0     215   0     7              00  <native>

5     Rewind         0     13    0                    00  NULL

6     Column         0     6     4                    00  <native>.<fx>

7     Integer        4     5     0                    00  NULL

8     Ne             5     12    4   collseq(BINARY)  6c  NULL

9     Rowid          0     3     0                    00  NULL

10    RowSetAdd      2     3     0                    00  NULL

11    AddImm         1     1     0                    00  NULL

12    Next           0     6     0                    01  NULL

13    Close          0     0     0                    00  NULL

14    OpenWrite      0     215   0     8              00  <native>

15    OpenWrite      1     1362  0  Keyinfo(5,BINARY,BINARY)  00  <index>

16    RowSetRead     2     27    3                    00  NULL

17    NotExists      0     26    3                    00  NULL

18    Rowid          0     11    0                    00  NULL

19    Column         0     1     6                    00  <native>.<kf1>

20    Column         0     2     7                    00  <native>.<kf2>

21    Column         0     3     8                    00  <native>.<kf3>

22    Column         0     4     9                    00  <native>.<kf4>

23    Column         0     5     10                   00  <native>.<kf5>

24    IdxDelete      1     6     6                    00  NULL

25    Delete         0     1     0       <native>     00  NULL

26    Goto           0     16    0                    00  NULL

27    Close          1     1362  0                    00  NULL

28    Close          0     0     0                    00  NULL

29    ResultRow      1     1     0                    00  NULL

30    Halt           0     0     0                    00  NULL

31    Transaction    0     1     0                    00  NULL

32    VerifyCookie   0     1191  0                    00  NULL

33    TableLock      0     215   1      <native>      00  NULL

34    Goto           0     2     0                    00  NULL



-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Bob Friesenhahn
Gesendet: Dienstag, 07. Februar 2017 22:06
An: SQLite mailing list <[hidden email]>
Betreff: [sqlite] Virtual table vs real table query performance



We are trying to improve the query performance of our virtual table implementation (which is implemented in C).  Due to requirements of external code, a specified column of a specified row (by rowid) is queried at a time (the least efficient means of access).  Our virtual table is accessing entries in a memory-based array.



I have implemented a benchmark script written in Python using the APSW wrapper.  The benchmark script reveals that access to a native database table is 5 times faster than access to our virtual table.



Intuitively, I would think that access to a memory-based virtual table could be faster than native tables.  Our developer has implemented xBestIndex and xFilter support which is intended to result in direct access to the requested row rather than scanning the whole table.



What is the expected performance of a properly implemented virtual table (assuming little additional overhead) vs a native table?



Bob

--

Bob Friesenhahn

[hidden email]<mailto:[hidden email]>, http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

_______________________________________________

sqlite-users mailing list

[hidden email]<mailto:[hidden email]>

http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___________________________________________
Gunter Hick | Software Engineer | Scientific Games International GmbH<http://www.scigames.at> | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Bob Friesenhahn
On Wed, 8 Feb 2017, Hick Gunter wrote:

> Having imlemented a memory-based virtual table complete with
> indices, full table scan and direct access via rowid (which happens
> to be the memory address of the row) I can do a batch delete of
> 100.000 rows (in a table with 1 composite index) in about 2 seconds

The case I am interested is pure read performance of a single column
element at a time given properly implemented xBestIndex and xFilter
support.  Rows are not being added/removed using sqlite.

It is possible that native tables can be faster since the
implementation is not limited to the rigid set of callback functions
provided for virtual tables to use and of course the amalgamation is
optimized by the compiler as one source module.

By tracing the callbacks, we do see that our implementation is not
invoking the callbacks more times than necessary (which was not the
case before xBestIndex and xFilter support was added).  Due to the
requirements of the implementation, POSIX reader/writer locks are used
so there is some low-contention locking overhead.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Hick Gunter
Values are for retrieving 100.000 rows with a where clause not satisfiable from the index but true for alle rows

asql> select count() from <virtual>;
CPU Time: user 0.092986 sys 0.000000

asql> select count() from <virtual> where <fx>=4;
CPU Time: user 0.189971 sys 0.000000
CPU Time: user 0.199969 sys 0.000000
CPU Time: user 0.199970 sys 0.000000

asql> select count() from <native> where <fx>=4;
CPU Time: user 0.086987 sys 0.010998
CPU Time: user 0.085987 sys 0.009999
CPU Time: user 0.076988 sys 0.002000

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Bob Friesenhahn
Gesendet: Mittwoch, 08. Februar 2017 15:39
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Virtual table vs real table query performance

On Wed, 8 Feb 2017, Hick Gunter wrote:

> Having imlemented a memory-based virtual table complete with indices,
> full table scan and direct access via rowid (which happens to be the
> memory address of the row) I can do a batch delete of
> 100.000 rows (in a table with 1 composite index) in about 2 seconds

The case I am interested is pure read performance of a single column element at a time given properly implemented xBestIndex and xFilter support.  Rows are not being added/removed using sqlite.

It is possible that native tables can be faster since the implementation is not limited to the rigid set of callback functions provided for virtual tables to use and of course the amalgamation is optimized by the compiler as one source module.

By tracing the callbacks, we do see that our implementation is not invoking the callbacks more times than necessary (which was not the case before xBestIndex and xFilter support was added).  Due to the requirements of the implementation, POSIX reader/writer locks are used so there is some low-contention locking overhead.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

dbilid
In reply to this post by Bob Friesenhahn
Hello,


Do you perform the benchmark on the native database table using cold cache or warm cache? Also, can you briefly describe what the benchmark does and give the schema for the native database table?


thanks


________________________________
From: sqlite-users <[hidden email]> on behalf of Bob Friesenhahn <[hidden email]>
Sent: Wednesday, February 8, 2017 2:39 PM
To: SQLite mailing list
Subject: Re: [sqlite] Virtual table vs real table query performance



The case I am interested is pure read performance of a single column
element at a time given properly implemented xBestIndex and xFilter
support.  Rows are not being added/removed using sqlite.

It is possible that native tables can be faster since the
implementation is not limited to the rigid set of callback functions
provided for virtual tables to use and of course the amalgamation is
optimized by the compiler as one source module.

By tracing the callbacks, we do see that our implementation is not
invoking the callbacks more times than necessary (which was not the
case before xBestIndex and xFilter support was added).  Due to the
requirements of the implementation, POSIX reader/writer locks are used
so there is some low-contention locking overhead.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
GraphicsMagick Image Processing System<http://www.graphicsmagick.org/>
www.graphicsmagick.org
GraphicsMagick is a robust collection of tools and libraries to read, write, and manipulate an image in any of the more popular image formats including GIF, JPEG, PNG ...


<http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Dominique Devienne
In reply to this post by Hick Gunter
On Wed, Feb 8, 2017 at 4:30 PM, Hick Gunter <[hidden email]> wrote:

> Values are for retrieving 100.000 rows with a where clause not satisfiable
> from the index but true for alle rows
>
> asql> select count() from <virtual>;
> CPU Time: user 0.092986 sys 0.000000
>
> asql> select count() from <virtual> where <fx>=4;
> CPU Time: user 0.189971 sys 0.000000
> CPU Time: user 0.199969 sys 0.000000
> CPU Time: user 0.199970 sys 0.000000
>
> asql> select count() from <native> where <fx>=4;
> CPU Time: user 0.086987 sys 0.010998
> CPU Time: user 0.085987 sys 0.009999
> CPU Time: user 0.076988 sys 0.002000
>


Frankly I'm surprised it's slower than "native" SQLite.

In bulk-insert, random lookup, and table-delete timings we did in 2009
between native in-memory SQLite, and pure-C++ virtual tables accessing
pure C++ data structures (i.e. vm/reflection/introspection/dynamic lookup
as in Python for example, but direct addressing of statically typed data),
the virtual tables was always faster, and not by a small margin.

Admittedly it was a long time ago, and SQLite is getting faster all the time
for sure, but you can't beat static typing of memory addressable structures,
vs scanning pages of table data and dynamically/serially decoding variable
sizes
rows within those pages.

So something like "non-native" code or something "dynamic" is hiding in the
virtual table impl, no? --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Hick Gunter
This is with a generic, user definable table and index structure capable virtual table implementation; it is not a "one module per table" statically typed heavily optimized implementation.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Dominique Devienne
Gesendet: Mittwoch, 08. Februar 2017 16:42
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Virtual table vs real table query performance

On Wed, Feb 8, 2017 at 4:30 PM, Hick Gunter <[hidden email]> wrote:

> Values are for retrieving 100.000 rows with a where clause not
> satisfiable from the index but true for alle rows
>
> asql> select count() from <virtual>;
> CPU Time: user 0.092986 sys 0.000000
>
> asql> select count() from <virtual> where <fx>=4;
> CPU Time: user 0.189971 sys 0.000000
> CPU Time: user 0.199969 sys 0.000000
> CPU Time: user 0.199970 sys 0.000000
>
> asql> select count() from <native> where <fx>=4;
> CPU Time: user 0.086987 sys 0.010998
> CPU Time: user 0.085987 sys 0.009999
> CPU Time: user 0.076988 sys 0.002000
>


Frankly I'm surprised it's slower than "native" SQLite.

In bulk-insert, random lookup, and table-delete timings we did in 2009 between native in-memory SQLite, and pure-C++ virtual tables accessing pure C++ data structures (i.e. vm/reflection/introspection/dynamic lookup as in Python for example, but direct addressing of statically typed data), the virtual tables was always faster, and not by a small margin.

Admittedly it was a long time ago, and SQLite is getting faster all the time for sure, but you can't beat static typing of memory addressable structures, vs scanning pages of table data and dynamically/serially decoding variable sizes rows within those pages.

So something like "non-native" code or something "dynamic" is hiding in the virtual table impl, no? --DD _______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Dominique Devienne
On Wed, Feb 8, 2017 at 4:50 PM, Hick Gunter <[hidden email]> wrote:

> This is with a generic, user definable table and index structure capable
> virtual table implementation; it is not a "one module per table" statically
> typed heavily optimized implementation.
>

Ah, that makes complete sense then. I didn't want the OP to think virtual
tables were slower
than native tables in the general case, especially since he mentioned
memory arrays in C code.

And indeed the virtual-table advantage I mentioned is with a different
statically-typed vtable impl/module per vtable,
with statically defined indexes, where the table structure is hard-coded in
the vtable impl itself, and corresponds
to a native "row" data structure. In that config one leverages the
"front-end" of SQLite (parser and VDBE engine)
and very little of the "back-end" (pager and btree), except when SQLite
decides to make temporary tables for
query processing I guess. FWIW. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

Bob Friesenhahn
In reply to this post by dbilid
On Wed, 8 Feb 2017, Dimitris Bil wrote:

> Do you perform the benchmark on the native database table using cold
> cache or warm cache? Also, can you briefly describe what the
> benchmark does and give the schema for the native database table?

My benchmark repeatedly reads all of the columns one by one given row
id and column name.  The table is read many (e.g. 100) times so this
is a warm cache test.

The schema is not terribly important but the table we are trying to
optimize (with 1800 or less rows) contains a 64-bit rowid, five
integer values, and two short text string values.

   int64, uint32, uint32, text[16], uint8, text[16], text[18], uint8,
   uint32

What I am looking for is expected average virtual table performance vs
native table performance for repeated column reads.

Due to being a generic implementation (supporting many virtual
tables), our virtual implementation uses programmed/dynamic
marshalling rather that compiled marshalling.  The schema definition
is also dynamically generated.

There are implementation overheads and it is useful to know what
performance is possible (e.g. compared to native table performance)
in order to know when the implementation is about as good as it can
be.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Virtual table vs real table query performance

dbilid
I see, so in the native implementation you already have the whole table in memory and only use the clustered b-tree index to search for tuples. So I would not  expect a large improvement from the virtual table implementation, but the virtual table being 5 times slower is strange. Maybe not the correct data structure used?


By the way, I had tried adding a virtual table in the sqlite amalgamation and I did not see observable difference. On the other hand, I have seen improvement in the virtual table utilization using the latest version of sqlite (in comparison to a release about a year ago).


________________________________
From: sqlite-users <[hidden email]> on behalf of Bob Friesenhahn <[hidden email]>
Sent: Wednesday, February 8, 2017 4:09 PM
To: SQLite mailing list
Subject: Re: [sqlite] Virtual table vs real table query performance

On Wed, 8 Feb 2017, Dimitris Bil wrote:

> Do you perform the benchmark on the native database table using cold
> cache or warm cache? Also, can you briefly describe what the
> benchmark does and give the schema for the native database table?

My benchmark repeatedly reads all of the columns one by one given row
id and column name.  The table is read many (e.g. 100) times so this
is a warm cache test.

The schema is not terribly important but the table we are trying to
optimize (with 1800 or less rows) contains a 64-bit rowid, five
integer values, and two short text string values.

   int64, uint32, uint32, text[16], uint8, text[16], text[18], uint8,
   uint32

What I am looking for is expected average virtual table performance vs
native table performance for repeated column reads.

Due to being a generic implementation (supporting many virtual
tables), our virtual implementation uses programmed/dynamic
marshalling rather that compiled marshalling.  The schema definition
is also dynamically generated.

There are implementation overheads and it is useful to know what
performance is possible (e.g. compared to native table performance)
in order to know when the implementation is about as good as it can
be.

Bob
--

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users