Replicate SQLite and keep sync (every half hour) from PostgreSQL table

Replicate SQLite and keep sync (every half hour) from PostgreSQL table

Michael Nielsen
I'm able to access a remote PostgreSQL table from my server.
However, the PostgreSQL table contains around 50 million records, and I only
need a certain ID column.

I would like to replicate the PostgreSQL table (filtered by a WHERE clause)
into an in-memory SQLite database, which would sync/update every 30 minutes
(or so).

On reboot etc. the replication may simply start over.

What would be the best approach to such a challenge?

I've actually never worked with SQLite before.

Re: Replicate SQLite and keep sync (every half hour) from PostgreSQL table

Jens Alfke-2

> On Feb 2, 2017, at 1:15 AM, Michael Nielsen <[hidden email]> wrote:
>
> I would like to replicate the PostgreSQL table (filtered by a WHERE clause)
> into an in-memory SQLite database, which would sync/update every 30 minutes
> (or so).

There isn’t any built-in or standard way to do this between arbitrary databases. You’ll have to implement your own solution. (I speak as someone who’s specialized in database replication for the past five years; but I work on a database engine that was designed for replication.)

In order to do an efficient (incremental) sync, you have to be able to query the remote [Postgres] database and determine:
(a) what rows are new since the last time you synced;
(b) what rows have changed since the last time you synced;
(c) what rows have been deleted since the last time you synced.

If the table has an auto-incrementing primary key, (a) is pretty easy to do; you just query for rows where the primary key is greater than the maximum value you’ve seen before.
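
For example, a minimal Python sketch of that step (assuming the psycopg2 driver, a hypothetical table "items" with an auto-incrementing "id" column, and placeholder connection details) might look roughly like this:

import sqlite3
import psycopg2  # assumed PostgreSQL driver

# Placeholder connection details; "items", "id" and "updated_at" are
# hypothetical names. "updated_at" is only used by the timestamp-based
# sync sketched further down.
pg = psycopg2.connect("host=remote.example.com dbname=mydb user=me")
lite = sqlite3.connect(":memory:")
lite.execute("CREATE TABLE IF NOT EXISTS items ("
             "id INTEGER PRIMARY KEY, updated_at TEXT)")

# (a) New rows: fetch only ids greater than the highest id held locally.
last_id = lite.execute("SELECT COALESCE(MAX(id), 0) FROM items").fetchone()[0]
with pg.cursor() as cur:
    cur.execute("SELECT id FROM items WHERE id > %s ORDER BY id", (last_id,))
    lite.executemany("INSERT INTO items (id) VALUES (?)", cur.fetchall())
lite.commit()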

Doing (b) requires that the rows have something like a modification timestamp or Lamport clock.
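
Continuing the same sketch, and assuming the Postgres table carried a hypothetical "updated_at" timestamp column, each pass could pull the changed rows and upsert them locally, roughly like this:

from datetime import datetime, timezone

# (b) Changed rows: pull anything modified since the previous pass and
# upsert it locally. "updated_at" is a hypothetical modification-timestamp
# column; last_sync would normally be carried over from the previous pass.
last_sync = datetime(1970, 1, 1, tzinfo=timezone.utc)
with pg.cursor() as cur:
    cur.execute(
        "SELECT id, updated_at::text FROM items WHERE updated_at > %s",
        (last_sync,),
    )
    lite.executemany(
        "INSERT OR REPLACE INTO items (id, updated_at) VALUES (?, ?)",
        cur.fetchall(),
    )
lite.commit()
last_sync = datetime.now(timezone.utc)  # remember for the next pass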

Generally (c) is the hardest, because a deletion usually doesn’t leave any trace behind. The best solution would be to attach a trigger to the table that adds a ‘tombstone’ to an auxiliary table to record the deletion. (If you can do that, you can use a similar approach to log inserts and updates, which makes the whole problem a lot easier.) If you can’t do that, the best you can do, I think, is to query the primary key of every existing row, and match them all against the records in your local database.
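
Sketching that trigger idea (all of the names here — "items", "items_tombstones", "id" — are assumptions), the Postgres side might record deletions once at setup time, and each sync pass would then replay them locally:

# (c) Deleted rows: one-time setup of a tombstone table and trigger on the
# Postgres side (hypothetical names throughout).
with pg.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS items_tombstones (
            id         BIGINT,
            deleted_at TIMESTAMPTZ NOT NULL DEFAULT now()
        );
        CREATE OR REPLACE FUNCTION record_item_delete() RETURNS trigger AS $$
        BEGIN
            INSERT INTO items_tombstones (id) VALUES (OLD.id);
            RETURN OLD;
        END;
        $$ LANGUAGE plpgsql;
        DROP TRIGGER IF EXISTS items_tombstone ON items;
        CREATE TRIGGER items_tombstone AFTER DELETE ON items
            FOR EACH ROW EXECUTE PROCEDURE record_item_delete();
    """)
pg.commit()

# Each sync pass: replay the tombstones locally, then clear them.
with pg.cursor() as cur:
    cur.execute("SELECT id FROM items_tombstones")
    lite.executemany("DELETE FROM items WHERE id = ?", cur.fetchall())
    cur.execute("DELETE FROM items_tombstones")
pg.commit()
lite.commit()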

A completely different approach is to run a program on the remote machine that dumps the entire table to a file in some simple format like CSV or JSON; then use rsync or something similar to copy that file to your local machine; then read the file and rebuild the local database from it. (This might not be too inefficient, because rsync is pretty good about transferring minimal data to update the file.)
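
For what it’s worth, the rebuild half of that can be very small. The dump itself might be produced on the remote host with psql’s \copy and shipped over with rsync; the file and column names below are assumptions:

import csv
import sqlite3

# Rebuild an in-memory SQLite database from a CSV dump of the remote table,
# e.g. one produced with:
#   \copy (SELECT id FROM items WHERE ...) TO 'items.csv' CSV HEADER
def rebuild_from_csv(path="items.csv"):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    with open(path, newline="") as f:
        rows = ((row["id"],) for row in csv.DictReader(f))
        db.executemany("INSERT INTO items (id) VALUES (?)", rows)
    db.commit()
    return db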

—Jens