Seeking advice on improving batch sql update performance on zipvfs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Yue
Reply | Threaded
Open this post in threaded view
|

Seeking advice on improving batch sql update performance on zipvfs

Yue
Hello,
As mentioned in the subject, our goal is to improve performance regarding
to batch sql updates.
The update sql as follow,

> UPDATE ITEM SET FIELD4 =? WHERE DS=?

We run 100,000 updates in a single transaction. The zipvfs version takes
about 20 min while uncompressed version takes about 7 min.
I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
= 15;".

Any suggestions is very helpful.

Some background:
We have an app running on Android 4.2 using zipvfs.
The page size is 4096
The cache size is - 2000

Table schema for ITEM

> CREATE TABLE ITEM (
> FIELD0 NUMERIC,
> FIELD1 NUMERIC,
> DS TEXT,
> FIELD2 TEXT,
> FIELD3 TEXT,
> FIELD4 NUMERIC,
> FIELD5 NUMERIC,
> FIELD6 NUMERIC,
> FIELD7 NUMERIC,
> FIELD8 NUMERIC,
> FIELD9 NUMERIC,
> FIELD10 NUMERIC,
> FIELD11 TEXT);

The third column: "DS" is what we query by almost all the time. We also
created index:

> CREATE INDEX DS_INDEX ON ITEM(DS);


There are 30 million records. Zipvfs size of 471mb vs uncompressed db (the
one shipped with Android 4.2) size of 1.39gb.
Zipvfs db using zlib and aes128, which are default.

Thanks
--

Yue Wu |  Android Developer

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

explain.txt (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Dominique Pellé
Yue Wu <[hidden email]> wrote:

> Hello,
> As mentioned in the subject, our goal is to improve performance regarding
> to batch sql updates.
> The update sql as follow,
>
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while uncompressed version takes about 7 min.

Which compression algorithm do you use with zipvfs?
Try LZ4, it's is times faster than zlib at compression and
decompression, at the cost of compressing a bit less:

https://github.com/lz4/lz4

Or try zstd, which can compress more than zlib and faster (especially
on 64-bit architectures), but it's not as fast as LZ4:

http://facebook.github.io/zstd/

What is also your SQLite page size?
Reducing the SQLite page size probably helps to speed up
updates, since zipvfs compresses by pages.

Dominique
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Yue
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Yue
Thanks for the advice. I'll try them now.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Barry Smith
In reply to this post by Dominique Pellé
Are your updates sorted by DS? If your queries are sorted then sequential queries are more likely to hit the same db pages while searching the index, resulting in higher cache usage and fewer decompression operations. This would have less benefit if your 100k DS values of the updates are randomly distributed through the 30m available, and more of an effect if they're tightly clustered in some range.

If you mainly query by DS, is that the same as saying it is your primary key? If so, have you tried to declare it so and try the table WITHOUT ROWID, and don't bother with the index? It may help since you wouldn't have to decompress both the index pages and the table pages.

> On 8 Sep 2017, at 12:33 pm, Dominique Pellé <[hidden email]> wrote:
>
> Yue Wu <[hidden email]> wrote:
>
>> Hello,
>> As mentioned in the subject, our goal is to improve performance regarding
>> to batch sql updates.
>> The update sql as follow,
>>
>>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>>
>> We run 100,000 updates in a single transaction. The zipvfs version takes
>> about 20 min while uncompressed version takes about 7 min.
>
> Which compression algorithm do you use with zipvfs?
> Try LZ4, it's is times faster than zlib at compression and
> decompression, at the cost of compressing a bit less:
>
> https://github.com/lz4/lz4
>
> Or try zstd, which can compress more than zlib and faster (especially
> on 64-bit architectures), but it's not as fast as LZ4:
>
> http://facebook.github.io/zstd/
>
> What is also your SQLite page size?
> Reducing the SQLite page size probably helps to speed up
> updates, since zipvfs compresses by pages.
>
> Dominique
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Teg-3
In reply to this post by Yue
Hello Yue,

Is  it a phone or tablet?  I'd wonder if the CPU is going into thermal
throttle  mode.  As I understand it most phones and tablets can't keep
the CPU's running full speed without eventually throttling the CPU.

To me the fact uncompressed, while moving much more data, is more than 2
times faster suggests the bottleneck isn't the fileIO.

Thursday, September 7, 2017, 3:13:04 PM, you wrote:

YW> Hello,
YW> As mentioned in the subject, our goal is to improve performance regarding
YW> to batch sql updates.
YW> The update sql as follow,

>> UPDATE ITEM SET FIELD4 =? WHERE DS=?

YW> We run 100,000 updates in a single transaction. The zipvfs version takes
YW> about 20 min while uncompressed version takes about 7 min.
YW> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
YW> = 15;".

YW> Any suggestions is very helpful.

YW> Some background:
YW> We have an app running on Android 4.2 using zipvfs.
YW> The page size is 4096
YW> The cache size is - 2000

YW> Table schema for ITEM

>> CREATE TABLE ITEM (
>> FIELD0 NUMERIC,
>> FIELD1 NUMERIC,
>> DS TEXT,
>> FIELD2 TEXT,
>> FIELD3 TEXT,
>> FIELD4 NUMERIC,
>> FIELD5 NUMERIC,
>> FIELD6 NUMERIC,
>> FIELD7 NUMERIC,
>> FIELD8 NUMERIC,
>> FIELD9 NUMERIC,
>> FIELD10 NUMERIC,
>> FIELD11 TEXT);


YW> The third column: "DS" is what we query by almost all the time. We also
YW> created index:

>> CREATE INDEX DS_INDEX ON ITEM(DS);


YW> There are 30 million records. Zipvfs size of 471mb vs uncompressed db (the
YW> one shipped with Android 4.2) size of 1.39gb.
YW> Zipvfs db using zlib and aes128, which are default.

YW> Thanks



--
 Teg                            mailto:[hidden email]

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Simon Slavin-3


On 9 Sep 2017, at 9:11pm, Teg <[hidden email]> wrote:

> Is  it a phone or tablet?  I'd wonder if the CPU is going into thermal
> throttle  mode.  As I understand it most phones and tablets can't keep
> the CPU's running full speed without eventually throttling the CPU.

I would never have thought of that unless I had the device in my hands.  Well done.

Yes.  You’re trying to process 100,000 updates, and your device is working flat out for 7 minutes, or for 20 minutes.  How hot is it ?  If you do it unplugged, how much does that drain the battery ?

I don’t have experience with Android but I can tell you that iOS will have to throttle any process that attempts to do process data continuously for 7 minutes.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Eric Grange
In reply to this post by Yue
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE
DS = 15;".

Not zipvfs specific, but that kind of update can be quite inefficient if
the record is large (in terms of bytes, or records per database page) as
your table declaration hints to.

This will be especially prominent with compression, as a single byte
changed early in the uncompressed stream can lead to changing pretty much
every byte of the compressed data, and so is bound to defeat any
optimizations the storage could have to handle unmodified pages and data.

Depending on your update patterns and frequencies, it could thus be
beneficial to split your table into two tables (or more) with a common
primary key:
- one table for the rarely changed fields (even better if they are
immutable)
- one table (or more) for the fields that change often

This will increase the database size and reduce insert performance, and it
can complicate querying when you need everything (extra inner joins), but
it can help reduce the amount of raw data that is affected by updates quite
drastically.

I have been able to use the above strategy sucessfully, however it was
under Windows compressed folders, so YMMV.


On Thu, Sep 7, 2017 at 9:13 PM, Yue Wu <[hidden email]> wrote:

> Hello,
> As mentioned in the subject, our goal is to improve performance regarding
> to batch sql updates.
> The update sql as follow,
>
> > UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while uncompressed version takes about 7 min.
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
> = 15;".
>
> Any suggestions is very helpful.
>
> Some background:
> We have an app running on Android 4.2 using zipvfs.
> The page size is 4096
> The cache size is - 2000
>
> Table schema for ITEM
>
> > CREATE TABLE ITEM (
> > FIELD0 NUMERIC,
> > FIELD1 NUMERIC,
> > DS TEXT,
> > FIELD2 TEXT,
> > FIELD3 TEXT,
> > FIELD4 NUMERIC,
> > FIELD5 NUMERIC,
> > FIELD6 NUMERIC,
> > FIELD7 NUMERIC,
> > FIELD8 NUMERIC,
> > FIELD9 NUMERIC,
> > FIELD10 NUMERIC,
> > FIELD11 TEXT);
>
>
> The third column: "DS" is what we query by almost all the time. We also
> created index:
>
> > CREATE INDEX DS_INDEX ON ITEM(DS);
>
>
> There are 30 million records. Zipvfs size of 471mb vs uncompressed db (the
> one shipped with Android 4.2) size of 1.39gb.
> Zipvfs db using zlib and aes128, which are default.
>
> Thanks
> --
>
> Yue Wu |  Android Developer
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Yue
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Yue
In reply to this post by Barry Smith
Hello,
Thanks for your advice!
It works pretty well when I dropped the rowid and set the other field as
primary key.
I also enable wal mode at the cost of a little bit read performance.
Overall, it works really well now.

Really appreciate your help.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Yue
Reply | Threaded
Open this post in threaded view
|

Re: Seeking advice on improving batch sql update performance on zipvfs

Yue
In reply to this post by Simon Slavin-3
Thanks for the advice!

I could never thought about this. Really learnt a lot.

It's a customized Android phone type device.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users