help with a complicated join of two tables

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

help with a complicated join of two tables

Puneet Kishor-2
Apologies in advance for a terrible subject line -- I didn't know quite how to phrase it better.

I have the following two tables (with sample data)

CREATE TABLE geo (
        id INTEGER PRIMARY KEY,
        max_age TEXT,
        min_age TEXT
);

geo table: 39K rows
id max_age min_age
--- ------- --------
1 Holocene Holocene
5 Cambrian Silurian
12 Cambrian Ordovician
229 Cretaceous Quaternary

CREATE TABLE intervals (
        id INTEGER PRIMARY KEY,
        age_bottom REAL,
        age_top REAL,
        name TEXT,
        color TEXT
);

intervals table: ~450 rows
id age_bottom age_top name color
--- ----------  ------- ----------- -------
3 0.0117 0.0000 Holocene #FEF2E0
105 443.7000 416.0000 Silurian #B3E1B6
112 488.3000 443.7000 Ordovician #009270
421 2.5880 0.0000 Quaternary #F9F97F
122 542.0000 488.3000 Cambrian #7FA056
33 145.5000 65.5000 Cretaceous #7FC64E

Keep in mind, max_age is older than min_age, and age_bottom is older than age_top.

The table geo can also have rows with min_age = max_age. I want a result set with geo.id, min_age, max_age, age_bottom, age_top, name, color like so:

- every row should be for one and only one geo record. I have 39K rows in "geo" table, so the result set should have 39K rows.

- when min_age = max_age, list the corresponding intervals.name and color

- when min_age != max_age, find the interval with the smallest different between age_bottom and age_top that would span min_age and max_age of geo. In other words, the interval whose age_bottom is bigger than the age_bottom of the max_age and whose age_top is smaller than the age_top of the min_age.

I've gotten so far

SELECT g.id, max_age, min_age,
        i.age_bottom, i.age_top, i.interval_name, i.interval_color
FROM geo g
  LEFT JOIN intervals i_max ON g.max_age = i_max.interval_name
  LEFT JOIN intervals i_min ON g.min_age = i_min.interval_name
  JOIN intervals i ON
  i.age_bottom >= i_max.age_bottom AND
  i.age_top <= i_min.age_top
WHERE g.id = ?
ORDER BY i.age_bottom - i.age_top
LIMIT 1;

The above query finds the correct values for a single g.id, but I want a result set with all the rows.

Suggestions?

Puneet.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Igor Tandetnik
Mr. Puneet Kishor <[hidden email]> wrote:

> geo table: 39K rows
> id max_age min_age
> --- ------- --------
> 1 Holocene Holocene
> 5 Cambrian Silurian
> 12 Cambrian Ordovician
> 229 Cretaceous Quaternary
>
> intervals table: ~450 rows
> id age_bottom age_top name color
> --- ----------  ------- ----------- -------
> 3 0.0117 0.0000 Holocene #FEF2E0
> 105 443.7000 416.0000 Silurian #B3E1B6
> 112 488.3000 443.7000 Ordovician #009270
> 421 2.5880 0.0000 Quaternary #F9F97F
> 122 542.0000 488.3000 Cambrian #7FA056
> 33 145.5000 65.5000 Cretaceous #7FC64E
>
> Keep in mind, max_age is older than min_age, and age_bottom is older than age_top.
>
> The table geo can also have rows with min_age = max_age. I want a result set with geo.id, min_age, max_age, age_bottom, age_top,
> name, color like so:
>
> - every row should be for one and only one geo record. I have 39K rows in "geo" table, so the result set should have 39K rows.
>
> - when min_age = max_age, list the corresponding intervals.name and color
>
> - when min_age != max_age, find the interval with the smallest different between age_bottom and age_top that would span min_age
> and max_age of geo. In other words, the interval whose age_bottom is bigger than the age_bottom of the max_age and whose age_top
> is smaller than the age_top of the min_age.  

I'm not sure I understand. Let's take geo.id = 5, max_age=Cambrian, min_age=Silurian. You say you want a record whose age_bottom is greater than that corresponding to Cambrian, that is 542.0000; and whose age_top is smaller than that corresponding to Silurian, or 416.0000. I don't seem to see any such record in your example.
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Puneet Kishor-2

On Sep 11, 2011, at 9:58 PM, Igor Tandetnik wrote:

> Mr. Puneet Kishor <[hidden email]> wrote:
>> geo table: 39K rows
>> id max_age min_age
>> --- ------- --------
>> 1 Holocene Holocene
>> 5 Cambrian Silurian
>> 12 Cambrian Ordovician
>> 229 Cretaceous Quaternary
>>
>> intervals table: ~450 rows
>> id age_bottom age_top name color
>> --- ----------  ------- ----------- -------
>> 3 0.0117 0.0000 Holocene #FEF2E0
>> 105 443.7000 416.0000 Silurian #B3E1B6
>> 112 488.3000 443.7000 Ordovician #009270
>> 421 2.5880 0.0000 Quaternary #F9F97F
>> 122 542.0000 488.3000 Cambrian #7FA056
>> 33 145.5000 65.5000 Cretaceous #7FC64E
>>
>> Keep in mind, max_age is older than min_age, and age_bottom is older than age_top.
>>
>> The table geo can also have rows with min_age = max_age. I want a result set with geo.id, min_age, max_age, age_bottom, age_top,
>> name, color like so:
>>
>> - every row should be for one and only one geo record. I have 39K rows in "geo" table, so the result set should have 39K rows.
>>
>> - when min_age = max_age, list the corresponding intervals.name and color
>>
>> - when min_age != max_age, find the interval with the smallest different between age_bottom and age_top that would span min_age
>> and max_age of geo. In other words, the interval whose age_bottom is bigger than the age_bottom of the max_age and whose age_top
>> is smaller than the age_top of the min_age.  
>
> I'm not sure I understand. Let's take geo.id = 5, max_age=Cambrian, min_age=Silurian. You say you want a record whose age_bottom is greater than that corresponding to Cambrian, that is 542.0000; and whose age_top is smaller than that corresponding to Silurian, or 416.0000. I don't seem to see any such record in your example.


Hi Igor, I muddied the issue by providing limited sample data that doesn't have an answer that fits my needs. Needless to say, in the complete tables there would be answers. My (slightly simplified) query below shows the kind of result I want

SELECT g.id, max_age, min_age, Min(i.age_bottom - i.age_top) age_range, i.interval_name, i.interval_color
FROM geo g
 LEFT JOIN intervals i_max ON g.max_age = i_max.interval_name
 LEFT JOIN intervals i_min ON g.min_age = i_min.interval_name
 JOIN intervals i ON
  i.age_bottom >= i_max.age_bottom AND
  i.age_top <= i_min.age_top
GROUP BY g.gid, g.max_age, g.min_age, i.interval_name, i.interval_color
ORDER BY g.gid, age_range;

produces the following ---

g.id    max_age         min_age         age_range       interval_name   interval_color
------- --------------- --------------- --------------- --------------- --------------
1 Paleozoic Paleozoic 291.0000 Paleozoic #99C08D
1 Paleozoic Paleozoic 542.0000 Phanerozoic #9AD9DD
4 Precambrian Precambrian 3458.0000 Precambrian #F04370
5 Cambrian Silurian 291.0000 Paleozoic #99C08D
5 Cambrian Silurian 542.0000 Phanerozoic #9AD9DD
6 Silurian Silurian 27.7000 Silurian #B3E1B6
6 Silurian Silurian 291.0000 Paleozoic #99C08D
..
94K+ rows

What I want from the above table is only the first row of each g.id group because that has the interval that has the smallest "age_range" that spans the "max_age" and "min_age"

Sorry, I am not able to articulate this more clearly, but I hope the above example makes things clearer.

Puneet.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Igor Tandetnik
Mr. Puneet Kishor <[hidden email]> wrote:

>>> The table geo can also have rows with min_age = max_age. I want a result set with geo.id, min_age, max_age, age_bottom, age_top,
>>> name, color like so:
>>>
>>> - every row should be for one and only one geo record. I have 39K rows in "geo" table, so the result set should have 39K rows.
>>>
>>> - when min_age = max_age, list the corresponding intervals.name and color
>>>
>>> - when min_age != max_age, find the interval with the smallest different between age_bottom and age_top that would span min_age
>>> and max_age of geo. In other words, the interval whose age_bottom is bigger than the age_bottom of the max_age and whose age_top
>>> is smaller than the age_top of the min_age.

Something like this:

select geo.id, min_age, max_age, age_bottom, age_top, name, color
from geo left join intervals i on i.id = (
    select id from intervals
    where age_bottom >=
        (select age_bottom from intervals where name = geo.max_age)
    and age_top <=
        (select age_top from intervals where name = geo.min_age)
    order by (age_bottom - age_top) limit 1
);

--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Puneet Kishor-2

On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:

> Mr. Puneet Kishor <[hidden email]> wrote:
>>>> The table geo can also have rows with min_age = max_age. I want a result set with geo.id, min_age, max_age, age_bottom, age_top,
>>>> name, color like so:
>>>>
>>>> - every row should be for one and only one geo record. I have 39K rows in "geo" table, so the result set should have 39K rows.
>>>>
>>>> - when min_age = max_age, list the corresponding intervals.name and color
>>>>
>>>> - when min_age != max_age, find the interval with the smallest different between age_bottom and age_top that would span min_age
>>>> and max_age of geo. In other words, the interval whose age_bottom is bigger than the age_bottom of the max_age and whose age_top
>>>> is smaller than the age_top of the min_age.
>
> Something like this:
>
> select geo.id, min_age, max_age, age_bottom, age_top, name, color
> from geo left join intervals i on i.id = (
>    select id from intervals
>    where age_bottom >=
>        (select age_bottom from intervals where name = geo.max_age)
>    and age_top <=
>        (select age_top from intervals where name = geo.min_age)
>    order by (age_bottom - age_top) limit 1
> );
>


Thanks Igor. The above does work and produces the correct result. The query speed, however, is pretty slow ~ 75 seconds. So, I created indexes on intervals.name, geo.max_age, and geo.min_age, and that brought the query time to ~ 11 seconds. Still too slow. So, I created an intermediate table to hold the results, and the speed is more satisfactory.

Many thanks again.

Puneet.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Igor Tandetnik
On 9/12/2011 12:02 PM, Mr. Puneet Kishor wrote:

>
> On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:
>> Something like this:
>>
>> select geo.id, min_age, max_age, age_bottom, age_top, name, color
>> from geo left join intervals i on i.id = (
>>     select id from intervals
>>     where age_bottom>=
>>         (select age_bottom from intervals where name = geo.max_age)
>>     and age_top<=
>>         (select age_top from intervals where name = geo.min_age)
>>     order by (age_bottom - age_top) limit 1
>> );
>>
>
>
> Thanks Igor. The above does work and produces the correct result. The
> query speed, however, is pretty slow ~ 75 seconds. So, I created
> indexes on intervals.name, geo.max_age, and geo.min_age, and that
> brought the query time to ~ 11 seconds. Still too slow.

Indexes on geo.max_age and min_age are unlikely to help with this query
(use EXPLAIN QUERY PLAN to see which indexes are actually used). An
index on intervals.age_bottom might. So would an index on
intervals.age_top (but not both at the same time).
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Jan Hudec
On Mon, Sep 12, 2011 at 12:16:55 -0400, Igor Tandetnik wrote:

> On 9/12/2011 12:02 PM, Mr. Puneet Kishor wrote:
> >
> >On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:
> >>Something like this:
> >>
> >>select geo.id, min_age, max_age, age_bottom, age_top, name, color
> >>from geo left join intervals i on i.id = (
> >>    select id from intervals
> >>    where age_bottom>=
> >>        (select age_bottom from intervals where name = geo.max_age)
> >>    and age_top<=
> >>        (select age_top from intervals where name = geo.min_age)
> >>    order by (age_bottom - age_top) limit 1
> >>);
> >>
> >
> >
> >Thanks Igor. The above does work and produces the correct result. The
> >query speed, however, is pretty slow ~ 75 seconds. So, I created
> >indexes on intervals.name, geo.max_age, and geo.min_age, and that
> >brought the query time to ~ 11 seconds. Still too slow.
>
> Indexes on geo.max_age and min_age are unlikely to help with this
> query (use EXPLAIN QUERY PLAN to see which indexes are actually
> used). An index on intervals.age_bottom might. So would an index on
> intervals.age_top (but not both at the same time).

Looking at the query I'd say the needed indices are:

 create index intervals_name on intervals (name)
 create index intervals_ages on intervals (age_bottom, age_top)

separate indices on age_bottom and age_top are much less useful to this
query.

--
                                                 Jan 'Bulb' Hudec <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Igor Tandetnik
On 9/14/2011 2:07 PM, Jan Hudec wrote:

>>> On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:
>>>> select geo.id, min_age, max_age, age_bottom, age_top, name, color
>>> >from geo left join intervals i on i.id = (
>>>>     select id from intervals
>>>>     where age_bottom>=
>>>>         (select age_bottom from intervals where name = geo.max_age)
>>>>     and age_top<=
>>>>         (select age_top from intervals where name = geo.min_age)
>>>>     order by (age_bottom - age_top) limit 1
>>>> );
>
> Looking at the query I'd say the needed indices are:
>
>   create index intervals_name on intervals (name)
>   create index intervals_ages on intervals (age_bottom, age_top)
>
> separate indices on age_bottom and age_top are much less useful to this
> query.

I don't believe an index on intervals(age_bottom, age_top) can be used
here. Or rather, it can be, but really only one half of it, to satisfy
age_bottom>=X condition. The other half, age_top<=Y, needs to be
satisfied with a linear scan. That's why I said that an index on
intervals(age_bottom) would help, or one on intervals(age_top), but not
both at the same time.

Think about it this way. You have a phone book, where names are sorted
by last name, then first name. You want to find all people whose last
name is greater than 'Smith' and first name less than 'John'. The
alphabetic order helps you with the first half, but not really with the
second half - names satisfying both conditions don't appear sequentially
in the list.
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: help with a complicated join of two tables

Petite Abeille-2

On Sep 14, 2011, at 8:40 PM, Igor Tandetnik wrote:

> Think about it this way. You have a phone book, where names are sorted by last name, then first name. You want to find all people whose last name is greater than 'Smith' and first name less than 'John'. The alphabetic order helps you with the first half, but not really with the second half - names satisfying both conditions don't appear sequentially in the list.

Ah, yes, I wish SQLite supported bitmap indices :))

http://en.wikipedia.org/wiki/Bitmap_index

Bitmap Index vs. B-tree Index: Which and When?
http://www.oracle.com/technetwork/articles/sharma-indexes-093638.html
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users