DB To DB Transfer Time

classic Classic list List threaded Threaded
8 messages Options
dmp
Reply | Threaded
Open this post in threaded view
|

DB To DB Transfer Time

dmp
Hello,

I have created a database to database transfer plugin for
my application. The plugin is functional and I have
collected a log of testing results stored in a SQLite db.
The links below are of the database and definition of
parameters fields. The main exception, dbTypeMemory,
meaning memory or file db.

https://github.com/danap/db_to_filememorydb/blob/master/docs/dblog/db_To_db_Parameters-log.db
https://github.com/danap/db_to_filememorydb/blob/master/lib/plugins/DB_To_FileMemoryDB/transfer-param.config

Testing was done with a 50K entry MariaDB table on a low
end local network machine. Four fields in the table, Integer,
Double, Text, and Blob. Text and Blob fields of arbitrary
length to 10K.

Resulting file database sizes for all 50K rows were in the
range of 390MB to 1.83GB.

The application is Java based and uses JDBCs.

The result for the 50K file db test of SQLite was 370.184
seconds. Is this a reasonable transfer speed, given the
conditions noted?

danap.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

Stephen Chrzanowski
You're kind of asking if a 4 cylinder engine can outrun an 8 cylinder
engine by just physically looking at the engines.  The answer is yes.  It
can.  Given the right circumstances.

Your transfer rates are going to be subjective based on your hardware, network
speeds at the source and destination (I know, LAN, but still relevant),
plus what kind of transactions you're doing per-insert on your SQLite
database, indexing on the database, and such.

If you DON'T wrap your insert/update/delete statements in a transaction,
then by SQLite will open a transaction, write your one row, then close the
transaction, and do that for every single row you're
adding/deleting/updating.

Just writing this particular email, I've been bouncing back and forth
between "Yeah, that's fine" and "Well, maybe not..." because there's a LOT
of things in the way that we're not clear on.  Especially the transactions
aspect.



On Tue, Sep 25, 2018 at 2:14 PM, dmp <[hidden email]> wrote:

> Hello,
>
> I have created a database to database transfer plugin for
> my application. The plugin is functional and I have
> collected a log of testing results stored in a SQLite db.
> The links below are of the database and definition of
> parameters fields. The main exception, dbTypeMemory,
> meaning memory or file db.
>
> https://github.com/danap/db_to_filememorydb/blob/master/
> docs/dblog/db_To_db_Parameters-log.db
> https://github.com/danap/db_to_filememorydb/blob/master/lib/plugins/DB_To_
> FileMemoryDB/transfer-param.config
>
> Testing was done with a 50K entry MariaDB table on a low
> end local network machine. Four fields in the table, Integer,
> Double, Text, and Blob. Text and Blob fields of arbitrary
> length to 10K.
>
> Resulting file database sizes for all 50K rows were in the
> range of 390MB to 1.83GB.
>
> The application is Java based and uses JDBCs.
>
> The result for the 50K file db test of SQLite was 370.184
> seconds. Is this a reasonable transfer speed, given the
> conditions noted?
>
> danap.
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

Dominique Devienne
In reply to this post by dmp
On Tue, Sep 25, 2018 at 8:41 PM dmp <[hidden email]> wrote:

> The result for the 50K file db test of SQLite was 370.184
> seconds. Is this a reasonable transfer speed, given the conditions noted?
>

As Stephen already wrote, impossible to say.
Start by contrasting this DB-copy you wrote to other methods:
1) filesystem copy of the DB file (not open by any connection) on the same
hardward/disks
2) using sqlite3[.exe]'s .dump piped into another sqlite3 instance to
recreate the DB from DB at the destination
3) using sqlite3[.exe]'s .backup, which uses the Backup API you could use
in your own code too (assuming available from Java).

Now you'd have good reference point to compare against your own Java-based
implementation.
Completely guessing, but 370s seems to be "slow". Do #1 and/or #2 and/or #3
above, and you'll know for sure.

FWIW, --DD

PS: There are many ways to implement such a copy, and you're not saying how
you implemented it.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
dmp
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

dmp
In reply to this post by dmp
Hello,

Thank you for the responses, Stephen and Dominique.

I was generally looking for an idea if the transaction
time seemed reasonable, given a low end MariaDB server,
standard 100Mb LAN, and mid range user processing desktop
machine.

I realize that the question as answered is; Hard to say!
My initial reaction was slow, not good enough.

The log db, was created because of the parameters involved
in evaluating the transfer had many factors. This allowed
a way to track changes in those parameters with results.
It also gives additional insight to the testing hardware.
SQLite was one of four local file/memory databases evaluated.

https://github.com/danap/db_to_filememorydb/blob/master/docs/dblog/db_To_db_Parameters-log.db

Again thanks for responses. The idea of using transfers
with the SQLite built in tools, .dump, .backup if setup
with a similar scenario can provide some reference point
to answer my question.

danap.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

Jens Alfke-2
In reply to this post by dmp


> On Sep 25, 2018, at 11:14 AM, dmp <[hidden email]> wrote:
>
> The result for the 50K file db test of SQLite was 370.184
> seconds. Is this a reasonable transfer speed, given the
> conditions noted?

You haven’t specified how much of that time was spent in SQLite. For all we know, 370 seconds was spent in MariaDB and your own code, and it only took .184 sec for SQLite to insert the rows :)

Try running your code with a profiler. Or at least wrap some quick & dirty timing code around your functions that call SQLite.

Also, make sure to insert as many rows as possible within a transaction. Committing a transaction in SQLite is quite expensive due to filesystem flushing.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
dmp
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

dmp
In reply to this post by dmp
>> On Sep 25, 2018, at 11:14 AM, dmp <[hidden email]> wrote:
>>
>> The result for the 50K file db test of SQLite was 370.184
>> seconds. Is this a reasonable transfer speed, given the
>> conditions noted?

> You haven't specified how much of that time was spent in SQLite. For all
we know,
> 370 seconds was spent in MariaDB and your own code, and it only took
.184 sec for
> SQLite to insert the rows :)

> Try running your code with a profiler. Or at least wrap some quick &
dirty timing
> code around your functions that call SQLite.

> Also, make sure to insert as many rows as possible within a transaction.
Committing
> a transaction in SQLite is quite expensive due to filesystem flushing.

> —Jens


Hello,

The DB to DB code is a Java plugin for my application. It takes
a query to the current database, converts that into a schema,
creates a new table in the selected sink database of a comparable
schema, then begins transferring data.

The data transfer is setup with two threads, the source database
thread loading data and the sink database thread inserting via
prepared statements in a batch mode. The threads are connected
via an ArrayBlockingQueue pipe. I log the pipe capacity as the
process takes place.

The timing given takes place at the beginning of the transfer
process and stops when completed, threads join.

Now from the users of the forum, I did some testing to establish
a reference transfer time. After some thought I realized I could
get that number by just performing a direct file transfer of
the created SQLite database file from the MariaDB server to the
client machine. Results 54 seconds.

Upon further testing, increasing the pipe size, 250 to 1000.
I was able to get the time for the SQLite test to 164.721 sec.
Average pipe capacity, 85 objects. That tells me that I could
do better perhaps using two loading threads. I already suspected
the loading thread filler to be slower and therefore had coded
it with an increased priority of one.

The testing so far shows SQLite comparable to the other tested
memory databases. SQLite beat out all but one with a file
database. One of the databases did miserable with the pipe
capacity close to saturated and a time of 1374.614 sec.

Testing:
Transfer 50,000 rows, (Integer, Real, 0-10k Text, 0-10k blob)
Resulting SQLite file database size: 404Mb.

Updated SQLite Log Database:
https://github.com/danap/db_to_filememorydb/blob/master/docs/dblog/db_To_db_Parameters-log.db

danap.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

Dominique Devienne
On Tue, Oct 2, 2018 at 6:56 PM dmp <[hidden email]> wrote:

> >> On Sep 25, 2018, at 11:14 AM, dmp <[hidden email]>
> wrote:
> The timing given takes place at the beginning of the transfer
> process and stops when completed, threads join.
>

Why aren't you measuring the time spent only in the SQLite writing thread?

That would eliminate the overhead from the read-thread(s) and the queue in
the middle,
measuring only work by SQLite, instead of including the *waiting* for work
in the queue.

I wrote a similarly "piped" ETL not too long ago, and I output overall
"wall" time of course,
but also time spent reading, time spent writing, but important times when
reader/writer threads
were "stalled", because the pipe is full/empty (respectively). --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
dmp
Reply | Threaded
Open this post in threaded view
|

Re: DB To DB Transfer Time

dmp
In reply to this post by dmp
> Why aren't you measuring the time spent only in the SQLite writing thread?
>
> That would eliminate the overhead from the read-thread(s) and the queue in
> the middle,
> measuring only work by SQLite, instead of including the *waiting* for work
> in the queue.
>
> I wrote a similarly "piped" ETL not too long ago, and I output overall
> "wall" time of course,
> but also time spent reading, time spent writing, but important times when
> reader/writer threads
> were "stalled", because the pipe is full/empty (respectively). --DD

Hello,

Too the question, because it is unnecessary coding, time wasted. If I have
an idea already of the goal and with the timing overall can determine where
to focus the effort it is more efficient use of my time. The monitoring of
the pipe, one coding action, already gives me an idea of when read/writer
are stalled. So no need in to have timing for those.

Yesterday I put the threads on equal footing and derived the following
result which are close to my goal.

50,000 rows queried from a networked MariaDB, fields (integer, real, text,
blob).

SQLite - 114.838 seconds
H2 - 115.868 seconds
Derby - 136.984 seconds
HSQL - 1291.808 seconds

Mind you that the plugin needs to use any query to any of the supported
databases, MariaDB, Oracle, PostgreSQL, etc. and derive a comparable
schema table from the query to create a file/memory database.

Looks like SQLite or H2 are the most likely candidates at this time.

All of this is a lot of variables that effect timing. Focusing on only
the data transfer timing, above, or writes to the sink db is only part
of the timing, though probably the place to derive the most benefit.

danap.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users