UNION ALL bug in Multi-threading

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

UNION ALL bug in Multi-threading

sanhua.zh
I find a bug that exists in `UNION ALL`.


tldr: `UNION ALL` will not merge the data in different schemas in the moment that one of schema is committed but the another not. BUT, `UNION` will.


Here are the reproduce steps:


Preparation:
1. Prepare a database named "OLD"
1.1 Create a table for "OLD": `CREATE TABLE old(i INTEGER)`
1.2 Insert values into "OLD" to make it large enough: `INSERT INTO old VALUES(?1)`
2. Prepare a database named "NEW"
2.1 Create a table for "NEW": `CREATE TABLE new(i INTEGER)`


Migration:
For thread 1:
1. Attach "OLD" to "NEW" as "oldSchema": `ATTACH OLD AS oldSchema`
2. Migrate data from "OLD" to "NEW" in same transaction. Note that they should be executed with same handle using ATTACH mentioned in 1.
2.1 `BEGIN IMMEDIATE`
2.1 Select one of the row from "OLD": `SELECT i FROM oldSchema.old`
2.2 Insert the row into "NEW": `INSERT INTO main.new VALUES(?1)`
2.3 Delete the row from "OLD": `DELETE FROM oldSchema.old WHERE i == ?1`
2.4 `COMMIT`


For thread 2-N:
1. Create a view that union two tables: `CREATE TEMP VIEW v AS SELECT i FROM oldSchema.old UNION ALL SELECT i FROM main.new`
2. Select one of the value from view: `SELECT i FROM temp.v ORDER BY i LIMIT 1 OFFSET ?1`.
Here is the strange result:
As an example, if the values of 0-999 is inserted into "OLD", then value N should be selected as expected at offset N.
But in these kind of steps, it will not.


It can be a little bit hard to reproduce due to the multi-threading. BUT if it sleeps for a while when committing, it will be much easier to reproduce:
// vdbeCommit method of vdbeaux.c
for(i=0; rc==SQLITE_OK  idb-nDb; i++){
 Btree *pBt = db-aDb[i].pBt;
 sqlite3_sleep(10); // additional sleep here
 if( pBt ){
  rc = sqlite3BtreeCommitPhaseOne(pBt, 0);
 }
}


It seems that the bug happens when one of the schema is committed but the another one is not.
On the other handle, if `UNION ALL` is changed to `UNION` while creating view, the bug will not happen too.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] UNION ALL bug in Multi-threading

Hick Gunter
Are you sure it is not the special case documented in the ATTACH command? (see https://sqlite.org/lang_attach.html)

" Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not."

-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von sanhua.zh
Gesendet: Mittwoch, 01. August 2018 09:57
An: sqlite-users <[hidden email]>; drh <[hidden email]>
Betreff: [EXTERNAL] [sqlite] UNION ALL bug in Multi-threading

I find a bug that exists in `UNION ALL`.


tldr: `UNION ALL` will not merge the data in different schemas in the moment that one of schema is committed but the another not. BUT, `UNION` will.


Here are the reproduce steps:


Preparation:
1. Prepare a database named "OLD"
1.1 Create a table for "OLD": `CREATE TABLE old(i INTEGER)`
1.2 Insert values into "OLD" to make it large enough: `INSERT INTO old VALUES(?1)` 2. Prepare a database named "NEW"
2.1 Create a table for "NEW": `CREATE TABLE new(i INTEGER)`


Migration:
For thread 1:
1. Attach "OLD" to "NEW" as "oldSchema": `ATTACH OLD AS oldSchema` 2. Migrate data from "OLD" to "NEW" in same transaction. Note that they should be executed with same handle using ATTACH mentioned in 1.
2.1 `BEGIN IMMEDIATE`
2.1 Select one of the row from "OLD": `SELECT i FROM oldSchema.old`
2.2 Insert the row into "NEW": `INSERT INTO main.new VALUES(?1)`
2.3 Delete the row from "OLD": `DELETE FROM oldSchema.old WHERE i == ?1`
2.4 `COMMIT`


For thread 2-N:
1. Create a view that union two tables: `CREATE TEMP VIEW v AS SELECT i FROM oldSchema.old UNION ALL SELECT i FROM main.new` 2. Select one of the value from view: `SELECT i FROM temp.v ORDER BY i LIMIT 1 OFFSET ?1`.
Here is the strange result:
As an example, if the values of 0-999 is inserted into "OLD", then value N should be selected as expected at offset N.
But in these kind of steps, it will not.


It can be a little bit hard to reproduce due to the multi-threading. BUT if it sleeps for a while when committing, it will be much easier to reproduce:
// vdbeCommit method of vdbeaux.c
for(i=0; rc==SQLITE_OK  idb-nDb; i++){
 Btree *pBt = db-aDb[i].pBt;
 sqlite3_sleep(10); // additional sleep here  if( pBt ){
  rc = sqlite3BtreeCommitPhaseOne(pBt, 0);  } }


It seems that the bug happens when one of the schema is committed but the another one is not.
On the other handle, if `UNION ALL` is changed to `UNION` while creating view, the bug will not happen too.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] UNION ALL bug in Multi-threading

Dominique Devienne
On Wed, Aug 1, 2018 at 10:39 AM Hick Gunter <[hidden email]> wrote:

> Are you sure it is not the special case documented in the ATTACH command?
> (see https://sqlite.org/lang_attach.html)
>

Good point. OP will tell us if it applies to his/her case.


> " Transactions involving multiple attached databases are atomic, assuming
> that the main database is not ":memory:" and the journal_mode is not WAL.
> If the main database is ":memory:" or if the journal_mode is WAL, then
> transactions continue to be atomic within each individual database file.
> But if the host computer crashes in the middle of a COMMIT where two or
> more database files are updated, some of those files might get the changes
> where others might not."
>

Still, why would UNION behave differently from UNION ALL? That's the
puzzling part IMHO. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] UNION ALL bug in Multi-threading

Hick Gunter
UNION ALL will just return whatever both sides produce, irrespective of duplicates, whereas
UNION will return only 1 copy of duplicated records.

asql> select 1 as x union select 1 as x;
x
-
1
asql> select 1 as x union all select 1 as x;
x
-
1
1attacho

So depending on the relative positions of the read and write transactions and the order of commits, the query will see either one (before or after both commits), two (NEW commits before OLD) or even zero (OLD commits before NEW) copies of any given record.

My guess is that NEW always commits before OLD and so UNION ALL is sometimes returning 2 copies of one record. I also expect that changing the order of the databases (i.e main database is OLD and NEW gets attached) will have 1 record missing both in UNION and UNION ALL.

Additionally, the query has an ORDER BY clause that requires sorting, which drastically changes the query plan.

asql> create table t1 (i integer);
asql> create table t2 (i integer);
asql> create view  un as select * from t1 union select * from t2;
asql> create view  ua as select * from t1 union all select * from t2;
asql> .explain
asql> explain query plan select * from un order by 1;
sele  order          from  deta
----  -------------  ----  ----
2     0              0     SCAN TABLE t1 (~1000000 rows)
3     0              0     SCAN TABLE t2 (~1000000 rows)
1     0              0     COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (UNION)
0     0              0     SCAN SUBQUERY 1 (~2000000 rows)
0     0              0     USE TEMP B-TREE FOR ORDER BY
asql> explain query plan select * from ua order by 1;
sele  order          from  deta
----  -------------  ----  ----
1     0              0     SCAN TABLE t1 (~1000000 rows)
1     0              0     USE TEMP B-TREE FOR ORDER BY
2     0              0     SCAN TABLE t2 (~1000000 rows)
2     0              0     USE TEMP B-TREE FOR ORDER BY
0     0              0     COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Dominique Devienne
Gesendet: Mittwoch, 01. August 2018 11:34
An: General Discussion of SQLite Database <[hidden email]>
Betreff: Re: [sqlite] [EXTERNAL] UNION ALL bug in Multi-threading

On Wed, Aug 1, 2018 at 10:39 AM Hick Gunter <[hidden email]> wrote:

> Are you sure it is not the special case documented in the ATTACH command?
> (see https://sqlite.org/lang_attach.html)
>

Good point. OP will tell us if it applies to his/her case.


> " Transactions involving multiple attached databases are atomic,
> assuming that the main database is not ":memory:" and the journal_mode is not WAL.
> If the main database is ":memory:" or if the journal_mode is WAL, then
> transactions continue to be atomic within each individual database file.
> But if the host computer crashes in the middle of a COMMIT where two
> or more database files are updated, some of those files might get the
> changes where others might not."
>

Still, why would UNION behave differently from UNION ALL? That's the puzzling part IMHO. --DD _______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users