Selecting data from WITH clause is very slow

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Selecting data from WITH clause is very slow

Frank Millman
Hi all

I am writing an accounting system, and want to select ‘aged balances’ from the debtors table.

To do this, I need to assign each outstanding invoice to an ageing ‘bucket’, where each bucket contains the sum of invoices where date > start_date and date <= end_date. There will be five buckets altogether. In my test database I have 12 debtors and a few thousand invoices.

I could structure it like this (pseudo code) -

    SELECT acc_no, acc_name,
        (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date <= date_5) AS 120_days,
        (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_5 AND invoice_date <= date_4) AS 90_days,
        (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_4 AND invoice_date <= date_3) AS 60_days,
        (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_3 AND invoice_date <= date_2) AS 30_days,
        (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_2 AND invoice_date <= date_1) AS current
    FROM debtors_table

This works, but it requires 5 separate scans of the invoice table, which is inefficient.

I changed it to use a WITH clause to make one scan of the invoice table and store the results.

I won’t show the WITH clause here, as it is not the reason for the question, and would just add clutter. If I run the WITH clause separately, it executes in about 0.15 seconds, and generates 60 rows with 3 columns – account number, bucket number, and accumulated balance.

In the main clause I now have this -

    WITH ageing AS (...)
    SELECT acc_no, acc_name,
        (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 5) AS 120_days,
        (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 4) AS 90_days,
        (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 3) AS 60_days,
        (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 2) AS 30_days,
        (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 1) AS current
    FROM debtors_table

It works, but it takes about 3.5 seconds to execute.

Exactly the same exercise on Sql Server takes 0.17 seconds, and on PostgreSQL 0.22 seconds.

I appreciate that the temporary table created in the WITH clause is not indexed, but 3.5 seconds seems a long time to extract the data from 60 rows.

Is there any way to speed this up?

Frank Millman
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Selecting data from WITH clause is very slow

Clemens Ladisch
Frank Millman wrote:
>     SELECT acc_no, acc_name,
>         (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date <= date_5) AS 120_days,
>         (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_5 AND invoice_date <= date_4) AS 90_days,
>         (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_4 AND invoice_date <= date_3) AS 60_days,
>         (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_3 AND invoice_date <= date_2) AS 30_days,
>         (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_2 AND invoice_date <= date_1) AS current
>     FROM debtors_table

This query does not make sense without a GROUP BY and a correlation,
which would affect how to optimize it.

> This works, but it requires 5 separate scans of the invoice table,
> which is inefficient.

If there is an index on the invoice_date, it's efficient.

Assuming that there is an account_numer filter you did not show, the
index should look like this:

  CREATE INDEX invoices_accno_date_idx ON invoices(account_number, invoice_date);

Alternatively, to create a covering index (http://www.sqlite.org/queryplanner.html#covidx)
for this query, add the invoice_bal column.

> I changed it to use a WITH clause to make one scan of the invoice
> table and store the results.

The WITH clause itself does not store anything; it's syntactic
sugar.

>     WITH ageing AS (...)
>     SELECT acc_no, acc_name,
>         (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 5) AS 120_days,
>         (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 4) AS 90_days,
>         (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 3) AS 60_days,
>         (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 2) AS 30_days,
>         (SELECT balance FROM ageing WHERE account_number = acc_no AND bucket_number = 1) AS current
>     FROM debtors_table
>
> It works, but it takes about 3.5 seconds to execute.

The ageing query is likely to be executed five times for each row in the
outer query.  (This might be efficient with the above the index.)

To force the ageing results to be stored, you'd have to create
a temporary table.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Selecting data from WITH clause is very slow

Frank Millman
Clemens Ladisch wrote:

> Frank Millman wrote:

> > I changed it to use a WITH clause to make one scan of the invoice table and store the results.

> The WITH clause itself does not store anything; it's syntactic sugar [...] To force the ageing results to be stored, you'd have to create a temporary table.

Thanks – I did not know that. It explains everything!
Frank

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Selecting data from WITH clause is very slow

nomad
In reply to this post by Frank Millman
On Mon Sep 11, 2017 at 08:40:41AM +0200, Frank Millman wrote:

> I could structure it like this (pseudo code) -
>
>     SELECT acc_no, acc_name,
>         (SELECT SUM(invoice_bal) FROM invoices
>           WHERE invoice_date <= date_5) AS 120_days,
>         (SELECT SUM(invoice_bal) FROM invoices
>           WHERE invoice_date > date_5 AND invoice_date <= date_4) AS 90_days,
>         (SELECT SUM(invoice_bal) FROM invoices
>           WHERE invoice_date > date_4 AND invoice_date <= date_3) AS 60_days,
>         (SELECT SUM(invoice_bal) FROM invoices
>           WHERE invoice_date > date_3 AND invoice_date <= date_2) AS 30_days,
>         (SELECT SUM(invoice_bal) FROM invoices
>           WHERE invoice_date > date_2 AND invoice_date <= date_1) AS current
>     FROM debtors_table
>
> This works, but it requires 5 separate scans of the invoice table,
> which is inefficient.

You can achieve the same result with a single table scan using CASE
statements:

    SELECT
        acc_no,
        acc_name,
        SUM(
            CASE WHEN
               invoice_date <= date_5
            THEN
                invoice_bal
            ELSE
                0
            END
        ) AS 120_days,
        SUM(
            CASE WHEN
                invoice_date > date_5 AND invoice_date <= date_4
            THEN
                invoice_bal
            ELSE
                0
            END
        ) AS 90_days,
 
        -- .......
 
        SUM(
            CASE WHEN
                invoice_date > date_2 AND invoice_date <= date_1
            THEN
                invoice_bal
            ELSE
                0
            END
        ) AS current
    FROM
       debtors_table
    LEFT JOIN
        invoices
    ON
        invoices.debtor_id = debtors_table.id

If your time periods are not exhaustive (covering all time) then you
can perhaps be more efficient again by moving the filtering into the
join clause:

    SELECT
        acc_no,
        acc_name,
        SUM(i_90.invoice_bal) AS 90_days,
        SUM(i_current.invoice_bal) AS current
    FROM
       debtors_table
    LEFT JOIN
        invoices i_90
    ON
        i_90.debtor_id = debtors_table.id AND
        i_90.invoice_date > date_5 AND
        i_90.invoice_date <= date_4
    LEFT JOIN
        invoices i_current
    ON
        i_current.debtor_id = debtors_table.id AND
        i_current.invoice_date > date_2 AND
        i_current.invoice_date <= date_1
    ORDER BY
        ...

For the above an index on invoices(debtor_id,invoice_date) would be
appropriate.

Also in case you weren't aware, the SUM() function in my second example
could produce NULL values. You may be better served in that instance by
CAST(TOTAL(i_current.invoice_bal) AS INTEGER).

--
Mark Lawrence
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Selecting data from WITH clause is very slow

nomad
On Mon Sep 11, 2017 at 11:15:22AM +0200, [hidden email] wrote:

>     SELECT
>         acc_no,
>         acc_name,
>         SUM(i_90.invoice_bal) AS 90_days,
>         SUM(i_current.invoice_bal) AS current
>     FROM
>        debtors_table
>     LEFT JOIN
>         invoices i_90
>     ON
>         i_90.debtor_id = debtors_table.id AND
>         i_90.invoice_date > date_5 AND
>         i_90.invoice_date <= date_4
>     LEFT JOIN
>         invoices i_current
>     ON
>         i_current.debtor_id = debtors_table.id AND
>         i_current.invoice_date > date_2 AND
>         i_current.invoice_date <= date_1

I should also mention for completeness the need for:

    GROUP BY
        acc_no,
        acc_name

>     ORDER BY
>         ...

--
Mark Lawrence
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Selecting data from WITH clause is very slow

Frank Millman

On Mon Sep 11, 2017 at 11:15:22AM +0200, [hidden email] wrote:

>     SELECT
>         acc_no,
>         acc_name,
>         SUM(i_90.invoice_bal) AS 90_days,
>         SUM(i_current.invoice_bal) AS current
>     FROM
>        debtors_table
>     LEFT JOIN
>         invoices i_90
>     ON
>         i_90.debtor_id = debtors_table.id AND
>         i_90.invoice_date > date_5 AND
>         i_90.invoice_date <= date_4
>     LEFT JOIN
>         invoices i_current
>     ON
>         i_current.debtor_id = debtors_table.id AND
>         i_current.invoice_date > date_2 AND
>         i_current.invoice_date <= date_1
>
> I should also mention for completeness the need for:
>
>     GROUP BY
>         acc_no,
>         acc_name
>
>     ORDER BY
>         ...


Very interesting ideas. I have a lot of experimenting to do!

Thanks very much, Mark.

Frank
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users