Optmize queries on ranges

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Optmize queries on ranges

siscia
Hi all,

I am facing an interesting optimization problem.

I have a table like this:

CREATE TABLE ranges (
    start int,
    end int,
    value int,
);

The query that I am interested in optimizing is "select value from ranges
where (? between start and end)"

The max performance that I was able to get is 250 results/second with a
covering index on all three columns.

Now, if I do a more classic "select value from ranges where start = ?" this
provides 140000 results/second

So I am pretty sure that I am doing something quite wrong.

Do you guys have any idea of what it could be? How can I obtain better
results?

Cheers,

Simone



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Dan Kennedy-4
On 10/25/2018 11:13 PM, siscia wrote:

> Hi all,
>
> I am facing an interesting optimization problem.
>
> I have a table like this:
>
> CREATE TABLE ranges (
>     start int,
>     end int,
>     value int,
> );
>
> The query that I am interested in optimizing is "select value from ranges
> where (? between start and end)"
>
> The max performance that I was able to get is 250 results/second with a
> covering index on all three columns.
>
> Now, if I do a more classic "select value from ranges where start = ?" this
> provides 140000 results/second
>
> So I am pretty sure that I am doing something quite wrong.
>
> Do you guys have any idea of what it could be? How can I obtain better
> results?

Your query is the same as "start <= ? AND end >= ?". The trouble is that
SQlite can only use the index to optimize one of "start <= ?" or "end >=
?". And so you might be iterating through a very large set of records to
extract the ones you want.

R-tree might work for you:

   https://sqlite.org/rtree.html

Dan.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Simon Slavin-3
In reply to this post by siscia
On 25 Oct 2018, at 5:13pm, siscia <[hidden email]> wrote:

> CREATE TABLE ranges (
>    start int,
>    end int,
>    value int,
> );
>
> The query that I am interested in optimizing is
> "select value from ranges
>     where (? between start and end)"

First, "END" is a reserved keyword in SQLite.  Your use of it might work right now but you may find yourself in trouble later when you introduce a trigger or some other construction.  I suggest you replace it as a column name with "finish" or perhaps both ends with "low" and "high".  See

<https://www.sqlite.org/lang_keywords.html>

As an experiment to figure out a good optimization for your search problem, try the following:

1. Create two indexes on that table, one on (low,high,value), the other on (high,low,value).
2. Ensure that your 'ranges' table has plausible data in, both the number of rows and the contents of those rows must be similar to what you expect the table to contain in normal use.
3. Run the SQL command "ANALYZE".  This tells SQLite to look at the table and figure out good ways to run future searches and sorts.  The results of this are stored in the database.  You will not need to run the command again even if you change the content of the database.

Now run your query again and see whether the timing has changed.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Keith Medcalf
In reply to this post by Dan Kennedy-4

On Thursday, 25 October, 2018 10:48, Dan Kennedy <[hidden email]> wrote:

>On 10/25/2018 11:13 PM, siscia wrote:

>> Hi all,

>> CREATE TABLE ranges (
>>     start int,
>>     end int,
>>     value int,
>> );

>> The query that I am interested in optimizing is "select value from
>> ranges where (? between start and end)"

>Your query is the same as "start <= ? AND end >= ?". The trouble is
>that SQlite can only use the index to optimize one of "start <= ?"
>or "end >= ?".

select value from ranges where ? between start and end;
is *almost* the same as
select value from ranges where start <= ?1 and ?1 <= end;

There is an extra column load and compare when using the between version of the query (this is because although the optimization of the index use is the same, the use of x BETWEEN y AND z adds both the y <= x and x <= z checks as where clause tests that are executed within the loop, whereas when using the devolved query (the later form) one of the constraints is used against the index and only the other one is tested.  

CREATE TABLE ranges
(
    start integer,
    end integer,
    value integer
);
create index ranges_index on ranges (start, end, value);
insert into ranges (start, end, value)
 select x.value - y.value, x.value + y.value, x.value
   from generate_series as x
   join generate_series as y
  where x.start = 1 and x.stop = 10000
    and y.start = 1 and y.stop = 10000;
-- count of ranges records is  10000000
select count(*) from ranges where 25 between start and end;
-- 50254399
-- Run Time: real 4.860 user 4.859375 sys 0.000000
select count(*) from ranges where start <= 25 and 25 <= end;
-- 50254399
-- Run Time: real 3.982 user 3.984375 sys 0.000000

Thus the extra column load and compare comprises 18% of the time used to execute the query.

It is slightly faster in if the table is a without rowid table, but not significantly (though the space used is halved).  Note however the overhead of using x BETWEEN y AND z -vs- y <= x AND x <= z is now 21% of the query execution time (which is probably not a significant difference) ...

create table ranges
(
     start integer,
     end integer,
     value integer,
     primary key (start, end)
) without rowid;
insert into ranges (start, end, value)
     select x.value - y.value, x.value + y.value, x.value
       from generate_series as x
       join generate_series as y
      where x.start = 1 and x.stop = 10000
        and y.start = 1 and y.stop = 10000;
select count(*) from ranges where 25 between start and end;
-- 50254399
-- Run Time: real 4.854 user 4.843750 sys 0.000000
select count(*) from ranges where start <= 25 and 25 <= end;
-- 50254399
-- Run Time: real 3.814 user 3.812500 sys 0.000000

>And so you might be iterating through a very large set of records
>to extract the ones you want.

>R-tree might work for you:
>
>   https://sqlite.org/rtree.html

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Keith Medcalf
In reply to this post by Dan Kennedy-4




---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


On Thursday, 25 October, 2018 10:48, Dan Kennedy <[hidden email]> wrote:

>On 10/25/2018 11:13 PM, siscia wrote:
>>
>> I am facing an interesting optimization problem.
>>
>> I have a table like this:
>>
>> CREATE TABLE ranges (
>>     start int,
>>     end int,
>>     value int,
>> );
>>
>> The query that I am interested in optimizing is "select value from
>ranges
>> where (? between start and end)"
>>
>> The max performance that I was able to get is 250 results/second
>with a
>> covering index on all three columns.

>And so you might be iterating through a very large set of records
>to extract the ones you want.

>R-tree might work for you:
>   https://sqlite.org/rtree.html

create virtual table ranges using rtree(id, minX, maxX, +value);
insert into ranges (minX, maxX, value)
select x.value - y.value, x.value + y.value, x.value
  from generate_series as x
  join generate_series as y
 where x.start = 1 and x.stop = 10000
   and y.start = 1 and y.stop = 10000;
-- data insertion is about 10 times slower
select count(*) from ranges where 25 between minX and maxX;
-- 50254399
-- Run Time: real 3.473 user 3.468750 sys 0.000000
select count(*) from ranges where minX <= 25 and 25 <= MaxX;
-- 50254399
-- Run Time: real 3.533 user 3.546875 sys 0.000000

Execution time is not significantly quicker, at least for this data ... but the "BETWEEN" version is a tad quicker than the devolved version ...





_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

siscia
In reply to this post by siscia
Hi all,

thanks for your suggestions, unfortunately, I already tried all of them,
except for the rtrees.

Actually, my request for help wasn't complete.

The ranges I am storing in the table are not overlapping.

To make an example in SQL.

The following can be in the dataset:
INSERT INTO ranges(1, 10, 5);
INSERT INTO ranges(15, 29, 8);
INSERT INTO ranges(30, 32, 9);

However, there will never be something like:
INSERT INTO ranges(1, 10, 5);
INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first one

So all the queries are actually:

`SELECT value FROM ranges WHERE (? BETWEEN start AND end) LIMIT 1`

Now suppose there is an index on start and so we are looking for (start < ?)

What happen could be that we begin from (start = 0) and move up to (start <=
?) which is basically a full scan.
Or we could begin from (start <= ?) and move down towards (start = 0) which
would be optimal.

I am afraid that we are hitting the first case, which really is a pity.

Is there a way to suggest to the index how to work on these cases?

Cheers,

Simone



 



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Dan Kennedy-4
On 10/26/2018 02:27 PM, siscia wrote:

> Hi all,
>
> thanks for your suggestions, unfortunately, I already tried all of them,
> except for the rtrees.
>
> Actually, my request for help wasn't complete.
>
> The ranges I am storing in the table are not overlapping.
>
> To make an example in SQL.
>
> The following can be in the dataset:
> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(15, 29, 8);
> INSERT INTO ranges(30, 32, 9);
>
> However, there will never be something like:
> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first one
>
> So all the queries are actually:
>
> `SELECT value FROM ranges WHERE (? BETWEEN start AND end) LIMIT 1`
>
> Now suppose there is an index on start and so we are looking for (start < ?)
>
> What happen could be that we begin from (start = 0) and move up to (start <=
> ?) which is basically a full scan.
> Or we could begin from (start <= ?) and move down towards (start = 0) which
> would be optimal.


In SQL, I guess that is:

   SELECT value FROM ranges WHERE (? BETWEEN start AND end)
   ORDER BY start DESC LIMIT 1

Or, perhaps more efficient for the cases where there is no such range:

   SELECT value FROM (
     SELECT value, start, end FROM ranges
     WHERE start <= ?
     ORDER BY start DESC LIMIT 1
   ) WHERE end >= ?

Dan.


>
> I am afraid that we are hitting the first case, which really is a pity.
>
> Is there a way to suggest to the index how to work on these cases?
>
> Cheers,
>
> Simone
>
>
>
>
>
>
>
> --
> Sent from: http://sqlite.1065341.n5.nabble.com/
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

siscia
In reply to this post by siscia
Ok, after the message I thought a little bit more.

And it turns out that in the database the `start`s are not unique how they
should.
Making them unique, seems to solve the performance problem completely.

However, still, I am not sure why the `LIMIT 1` does not help at all.

Can you guys shed some light on this?

Cheers,
Simone



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

siscia
Sorry,

I was a little too optimistic.

Making the starts unique does help only for some queries, not for all.

Why?

Cheers,
Simone



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Olivier Mascia
In reply to this post by siscia
> Le 26 oct. 2018 à 09:27, siscia <[hidden email]> a écrit :
>
> thanks for your suggestions, unfortunately, I already tried all of them,
> except for the rtrees.
>
> Actually, my request for help wasn't complete.
>
> The ranges I am storing in the table are not overlapping.
>
> To make an example in SQL.
>
> The following can be in the dataset:
> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(15, 29, 8);
> INSERT INTO ranges(30, 32, 9);
>
> However, there will never be something like:
> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first one

What if the data was structured differently?

> CREATE TABLE ranges (
>    start int,
>    end int,
>    value int,
> );

becomes:

CREATE TABLE ranges (
   start int,
   range int, -- on the basis that start + range = end
   value int,
);

> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(15, 29, 8);
> INSERT INTO ranges(30, 32, 9);

becomes:

INSERT INTO ranges(1, 9, 5);
INSERT INTO ranges(15, 14, 8);
INSERT INTO ranges(30, 2, 9);

and you have:

CREATE INDEX idx_ranges on ranges(start);

> select value from ranges
> where (? between start and end)

becomes:

SELECT value FROM ranges where (? between start AND start+range);

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Keith Medcalf
In reply to this post by siscia

Based on your assumptions being correct
 (a) start is unique
 (b) start end ranges do not overlap

create table ranges
(
  start integer primary key,
  stop  integer not null,
  value integer not null
);

INSERT INTO ranges values (1, 10, 5);
INSERT INTO ranges values (15, 29, 8);
INSERT INTO ranges values (30, 32, 9);

select value
  from (select stop, value
          from ranges
         where start <= ?1
      order by start desc
         limit 1)
 where ?1 <= stop;

If your data does not meet the constraints you have specified then the query will not work properly.  The resulting value (if there is one) will be returned with a single index lookup and a single comparison.  (Note that you can create a covering index on your existing table if you do not want to remake it).

This works as it does because the answer, if there is one, can only be located on the row where start <= ?1 (for the biggest numerical value of start) and then only if the correspondingly found row also meets the requirement that ?1 <= stop

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import apsw
>>> db = apsw.Connection(':memory:')
>>>
>>> create_sql = """create table ranges
... (
...   start integer primary key,
...   stop  integer not null,
...   value integer not null
... );
...
... INSERT INTO ranges values (1, 10, 5);
... INSERT INTO ranges values (15, 29, 8);
... INSERT INTO ranges values (30, 32, 9);
... """
>>>
>>> sql = """select value
...   from (select stop, value
...           from ranges
...          where start <= ?1
...       order by start desc
...          limit 1)
...  where ?1 <= stop;
... """
>>>
>>> db.execute(create_sql)
<apsw.Cursor object at 0x0000020F7EB583F0>
>>> for row in db.execute('select * from ranges;'):
...  print(row)
...
Row(start=1, stop=10, value=5)
Row(start=15, stop=29, value=8)
Row(start=30, stop=32, value=9)
>>> for i in range(35):
...  for row in db.execute(sql, (i, )):
...   print(i, row)
...
1 Row(value=5)
2 Row(value=5)
3 Row(value=5)
4 Row(value=5)
5 Row(value=5)
6 Row(value=5)
7 Row(value=5)
8 Row(value=5)
9 Row(value=5)
10 Row(value=5)
15 Row(value=8)
16 Row(value=8)
17 Row(value=8)
18 Row(value=8)
19 Row(value=8)
20 Row(value=8)
21 Row(value=8)
22 Row(value=8)
23 Row(value=8)
24 Row(value=8)
25 Row(value=8)
26 Row(value=8)
27 Row(value=8)
28 Row(value=8)
29 Row(value=8)
30 Row(value=9)
31 Row(value=9)
32 Row(value=9)
>>>

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of siscia
>Sent: Friday, 26 October, 2018 01:27
>To: [hidden email]
>Subject: Re: [sqlite] Optmize queries on ranges
>
>Hi all,
>
>thanks for your suggestions, unfortunately, I already tried all of
>them,
>except for the rtrees.
>
>Actually, my request for help wasn't complete.
>
>The ranges I am storing in the table are not overlapping.
>
>To make an example in SQL.
>
>The following can be in the dataset:
>INSERT INTO ranges(1, 10, 5);
>INSERT INTO ranges(15, 29, 8);
>INSERT INTO ranges(30, 32, 9);
>
>However, there will never be something like:
>INSERT INTO ranges(1, 10, 5);
>INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first
>one
>
>So all the queries are actually:
>
>`SELECT value FROM ranges WHERE (? BETWEEN start AND end) LIMIT 1`
>
>Now suppose there is an index on start and so we are looking for
>(start < ?)
>
>What happen could be that we begin from (start = 0) and move up to
>(start <=
>?) which is basically a full scan.
>Or we could begin from (start <= ?) and move down towards (start = 0)
>which
>would be optimal.
>
>I am afraid that we are hitting the first case, which really is a
>pity.
>
>Is there a way to suggest to the index how to work on these cases?
>
>Cheers,
>
>Simone
>
>
>
>
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Keith Medcalf
In reply to this post by siscia

Limit 1 says to stop after returning 1 row.  If the "first row" being searched is not the one containing "the answer" then the search will continue until the row that does not match the index constraint is hit, after which it is known that no answer is possible (without returning a row).


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of siscia
>Sent: Friday, 26 October, 2018 01:49
>To: [hidden email]
>Subject: Re: [sqlite] Optmize queries on ranges
>
>Ok, after the message I thought a little bit more.
>
>And it turns out that in the database the `start`s are not unique how
>they
>should.
>Making them unique, seems to solve the performance problem
>completely.
>
>However, still, I am not sure why the `LIMIT 1` does not help at all.
>
>Can you guys shed some light on this?
>
>Cheers,
>Simone
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

E.Pasma
About the rtree extension, which was the first idea.

The extension appears available without any special installation option. This is easier than what is mentioned in https://sqlite.org/rtree.html <https://sqlite.org/rtree.html> chapter 2: "Compiling The R*Tree Module".
This chapter may as well be left out?

With test data where the ranges are mostly non-overlapping, the query now runs faster than without rtree. Even though both run within a millisecond rtree is ten times faster.
With order by and limit the timing remains superior. But this relies on strictly non-overlapping ranges.
Below my test script


/* query 1: using rtree built-in extension */
;
create virtual table ranges using rtree(id, minX, maxX, +value);
with r as (select 0 as r union all select r+1 from r where r<1000000)
insert into ranges (minX, maxX, value)
select r*10+1,r*10+10,r*10+5 from r
;
select value from ranges where 123456 between minx and maxx
;
123455
Run Time: real 0.000 user 0.000135 sys 0.000018

/* query 2: using index on minx+maxx */
drop table ranges
;
create table ranges (minx int, maxx int, value int)
;
with r as (select 0 as r union all select r+1 from r where r<1000000)
insert into ranges (minX, maxX, value)
select r*10+1,r*10+10,r*10+5 from r
;
create unique index ranges_minx_maxx on ranges(minx,maxx)
;
select value from ranges where 123456 between minx and maxx
;
123455
Run Time: real 0.002 user 0.001415 sys 0.000016

/* query 3: same, assuming non-overlapping ranges */
select value from ranges where 123456 between minx and maxx
order by minx desc limit 1
;
123455
Run Time: real 0.000 user 0.000057 sys 0.000000

 

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Keith Medcalf

You also need to make sure the "no hit" does not degenerate into a table scan.  RTree works well for this but is overall significantly slower than not using RTree since the purpose of RTree is to find the "small number of candidate records" that could possibly satisfy the query out of a haystack of records (that is, find the candidate needles in the haystack, so that you only need to closely examine that small number of candidates to find the needle rather than test the whole haystack).  

However, if you know that there can only be one possible record which can satisfy the query (ie, there is only one possible needle in the haystack, and only one possible candidate, and you can find this candidate directly for testing), then the overhead of using RTree where it is not needed exceeds the benefits of using it.

I see that the performance of the RTree is significantly slower than the equivalent "direct" method.  Am I doing something wrong here or is that overhead simply because of the data structures that the RTRee implementation must maintain (which are not required in this case).

Without RTree:
>py -3 test.py
Created 100000 random ranges in 00:00:00.681118 Creation Rate = 146817 Ranges/Second
Looked up 1102019 random range values in 00:00:04.598245 Lookup Rate = 239660 Values/Second
Failure Rate = 257270 Values/Second
Success Rate = 228828 Values/Second

With RTree:
>py -3 test.py --rtree
Created 100000 random ranges in 00:00:02.139742 Creation Rate = 46734 Ranges/Second
Looked up 1100681 random range values in 00:00:13.662556 Lookup Rate = 80561 Values/Second
Failure Rate = 119874 Values/Second
Success Rate = 65627 Values/Second

And that came from the following test program (in python) where the only difference is the SQL statements being used.  Because the ranges are random and the lookups are random, the timings given are subject to differences on every run, however, the averaged rates are relatively stable given a large number of random ranges and query values.

--- test.py ---
from __future__ import absolute_import, division, print_function, unicode_literals

import datetime
import random
import sys
import time

import sqlite3

# Convert a value in seconds to HMS format
HMS = lambda t: (datetime.datetime.min + datetime.timedelta(seconds=t)).time().isoformat()

# Create constants for the SQL statements we will use

if '--rtree' in sys.argv:
    create_sql = 'create virtual table ranges using rtree(id, start, stop, +value);'
    query_sql = 'select value from ranges where ? between start and stop;'
else:
    create_sql = 'create table ranges (start integer primary key, stop integer not null, value integer not null);'
    query_sql = 'select value from (select stop, value from ranges where start <= ?1 order by start desc limit 1) where ?1 <= stop;'

insert_sql = 'insert into ranges (start, stop, value) values (?, ?, ?);'


# Open our database and do not use automagical transactions
db = sqlite3.connect(':memory:', isolation_level=None)

# Create our table
db.execute(create_sql)

# Create our random range data
recs = 100000
start = 0
st = time.time()
for cnt in range(recs):
    start += random.randint(1, 10)
    stop = start + random.randint(1, 10)
    value = int((start + stop) / 2)
    db.execute(insert_sql, (start, stop, value))
    start = stop
stop = stop + random.randint(1, 10)
et = time.time() - st
print('Created', recs, 'random ranges in', HMS(et), 'Creation Rate =', int(recs / et), 'Ranges/Second')

db.execute('analyze;')

# Generate a bunch of random values and perform the range query
eta = 0.0
ets = 0.0
etf = 0.0
fcnt = 0
scnt = 0
tcnt = 0
for i in range(stop):
    x = random.randint(0, stop)
    lst = time.time()
    row = db.execute(query_sql, (x, )).fetchone()
    let = time.time() - lst
    if row:
        value = row[0]
        ets += let
        scnt += 1
    else:
        value = None
        etf += let
        fcnt += 1
    eta += let
    tcnt += 1
print('Looked up', stop, 'random range values in', HMS(eta), 'Lookup Rate =', int(tcnt / eta), 'Values/Second')
print('Failure Rate =', int(fcnt / etf), 'Values/Second')
print('Success Rate =', int(scnt / ets), 'Values/Second')

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of E.Pasma
>Sent: Friday, 26 October, 2018 16:28
>To: SQLite mailing list
>Subject: Re: [sqlite] Optmize queries on ranges
>
>About the rtree extension, which was the first idea.
>
>The extension appears available without any special installation
>option. This is easier than what is mentioned in
>https://sqlite.org/rtree.html <https://sqlite.org/rtree.html> chapter
>2: "Compiling The R*Tree Module".
>This chapter may as well be left out?
>
>With test data where the ranges are mostly non-overlapping, the query
>now runs faster than without rtree. Even though both run within a
>millisecond rtree is ten times faster.
>With order by and limit the timing remains superior. But this relies
>on strictly non-overlapping ranges.
>Below my test script
>
>
>/* query 1: using rtree built-in extension */
>;
>create virtual table ranges using rtree(id, minX, maxX, +value);
>with r as (select 0 as r union all select r+1 from r where r<1000000)
>insert into ranges (minX, maxX, value)
>select r*10+1,r*10+10,r*10+5 from r
>;
>select value from ranges where 123456 between minx and maxx
>;
>123455
>Run Time: real 0.000 user 0.000135 sys 0.000018
>
>/* query 2: using index on minx+maxx */
>drop table ranges
>;
>create table ranges (minx int, maxx int, value int)
>;
>with r as (select 0 as r union all select r+1 from r where r<1000000)
>insert into ranges (minX, maxX, value)
>select r*10+1,r*10+10,r*10+5 from r
>;
>create unique index ranges_minx_maxx on ranges(minx,maxx)
>;
>select value from ranges where 123456 between minx and maxx
>;
>123455
>Run Time: real 0.002 user 0.001415 sys 0.000016
>
>/* query 3: same, assuming non-overlapping ranges */
>select value from ranges where 123456 between minx and maxx
>order by minx desc limit 1
>;
>123455
>Run Time: real 0.000 user 0.000057 sys 0.000000
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

E.Pasma

> Keith Medcalf wrote:
>  .. Am I doing something wrong here ..

No! The query with order by + limit 1 is superior, also in my test.

Still I am surprised that the rtree extension is available by default
(at least in the sqlite version 3.25 command line)


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optmize queries on ranges

Jens Alfke-2
In reply to this post by Keith Medcalf


> On Oct 25, 2018, at 10:45 AM, Keith Medcalf <[hidden email]> wrote:
>
> There is an extra column load and compare when using the between version of the query (this is because although the optimization of the index use is the same, the use of x BETWEEN y AND z adds both the y <= x and x <= z checks as where clause tests that are executed within the loop, whereas when using the devolved query (the later form) one of the constraints is used against the index and only the other one is tested.  

This seems like an optimization opportunity … is it already a known issue, to be addressed in the query optimizer at some point?

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users