JSON_EACH + recursive query = unexpected performance degradation

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

JSON_EACH + recursive query = unexpected performance degradation

Хусаинов Динар
I have a dataset of about 300 rows which have parent-child relations.
Due to factors unrelated to the issue I build the rows by zipping JSON
arrays with values from each column.
Then I run a simplest recursive query on it to get the whole tree(ends
up being ~4 levels).

Problem: the query takes 3000 ms (3 seconds) on my machine to complete.
If I create a real table with the SAME structure, insert the SAME data
into it, and run the SAME query, get the SAME result back, it takes
10-15 ms (200-300 TIMES faster).


I attached both queries(don't require schema to run)


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

query1.sql (27K) Download Attachment
query2.sql (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: JSON_EACH + recursive query = unexpected performance degradation

David Raymond
Not necessarily related to the question itself, but how did the attachments actually come through with this mail? Every single other person to try and attach something to this list has had it stripped off.


-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of ???????? ?????
Sent: Tuesday, February 11, 2020 1:28 AM
To: [hidden email]
Subject: [sqlite] JSON_EACH + recursive query = unexpected performance degradation

I have a dataset of about 300 rows which have parent-child relations.
Due to factors unrelated to the issue I build the rows by zipping JSON
arrays with values from each column.
Then I run a simplest recursive query on it to get the whole tree(ends
up being ~4 levels).

Problem: the query takes 3000 ms (3 seconds) on my machine to complete.
If I create a real table with the SAME structure, insert the SAME data
into it, and run the SAME query, get the SAME result back, it takes
10-15 ms (200-300 TIMES faster).


I attached both queries(don't require schema to run)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JSON_EACH + recursive query = unexpected performance degradation

Richard Hipp-3
On 2/12/20, David Raymond <[hidden email]> wrote:
> Not necessarily related to the question itself, but how did the attachments
> actually come through with this mail? Every single other person to try and
> attach something to this list has had it stripped off.

Sometimes Mailman asks me for approval for messages containing
attachments.  I think this depends on the size and mimetype of the
attachments.  In this case, I was asked and I approved.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JSON_EACH + recursive query = unexpected performance degradation

Jens Alfke-2
In reply to this post by Хусаинов Динар


> On Feb 10, 2020, at 10:27 PM, Хусаинов Динар <[hidden email]> wrote:
>
> Problem: the query takes 3000 ms (3 seconds) on my machine to complete. If I create a real table with the SAME structure, insert the SAME data into it, and run the SAME query, get the SAME result back, it takes 10-15 ms (200-300 TIMES faster).

It sounds like SQLite is much faster at traversing real tables than it is at traversing JSON arrays. Which I would expect. This is exacerbated by the fact that the recursive CTE must be doing a lot of traversals of the JSON.

You should be able to speed this up by creating temporary tables from the JSON first, and then changing the CTE to use those tables.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JSON_EACH + recursive query = unexpected performance degradation

Keith Medcalf
In reply to this post by Хусаинов Динар

The easiest way is to phrase the query such that the table is internally materialized.  In query1.sql the easiest way to do that is to change the:

WITH
------------------------build a in memory table with parent-child relations from 3 json arrays _which have the same size_
"objects_in_memory" AS (
   SELECT "id"."value"     AS "id",

to read like this:

WITH
------------------------build a in memory table with parent-child relations from 3 json arrays _which have the same size_
"objects_in_memory" AS (
   SELECT DISTINCT "id"."value"     AS "id",

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of ???????? ?????
>Sent: Monday, 10 February, 2020 23:28
>To: [hidden email]
>Subject: [sqlite] JSON_EACH + recursive query = unexpected performance
>degradation
>
>I have a dataset of about 300 rows which have parent-child relations.
>Due to factors unrelated to the issue I build the rows by zipping JSON
>arrays with values from each column.
>Then I run a simplest recursive query on it to get the whole tree(ends
>up being ~4 levels).
>
>Problem: the query takes 3000 ms (3 seconds) on my machine to complete.
>If I create a real table with the SAME structure, insert the SAME data
>into it, and run the SAME query, get the SAME result back, it takes
>10-15 ms (200-300 TIMES faster).
>
>
>I attached both queries(don't require schema to run)




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: JSON_EACH + recursive query = unexpected performance degradation

Warren Young
In reply to this post by Jens Alfke-2
On Feb 12, 2020, at 10:53 AM, Jens Alfke <[hidden email]> wrote:
>
> You should be able to speed this up by creating temporary tables from the JSON first, and then changing the CTE to use those tables.

Do you not get the same effect by using the new generated columns feature, only without the manual work of maintaining the temporary table?

    https://www.sqlite.org/gencol.html

sqlite> create table a (
  json text,
  b text generated always as (json_extract(json, '$.field')) stored
);
sqlite> insert into a values('{"field": "hello"}');                             sqlite> select b from a;
hello


It’s probably critical to the success of this that you use the STORED attribute rather than VIRTUAL, which means you can’t ALTER TABLE your way to success, but you’d be looking at table copies with the temporary table idea anyway.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users