General R*Tree query

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

General R*Tree query

Brian T. Carcich
I'm working on an SQLite solution to get at star catalogs; they are usually
searched via Right Ascension (RA), Declination (DEC), and magnitude (mag).
 RA,DEC is a spherical coordinate system to specify a star position on-sky;
magnitude is related to star brightness.

What I have so far is here:

   https://github.com/drbitboy/Tycho2_SQLite_RTree


I started with the Tycho-2 star catalog.  It comprises 2.5 million stars in
a flat ASCII, fixed-width catalog file (actually two files but ignore that
for now), and an index file (map) of ~10k small RA-DEC regions, with an
average of ~250 stars in each region.  The regions do not overlap, and all
the stars in any one region are in contiguous lines in the catalog file.

The index file does not implement any grouping or sorting by magnitude.
 Each index region refers to

A) a contiguous region on-sky with defined by a min-max RA pair and a
min-max DEC pair.

B)  a contiguous range of the lines (stars) in the flat file that are
within that region.

So the data in the index file are a reasonable starting point for an R*Tree
in SQLite3.  I put the index file data into the virtual table using the RA
and DEC limits for each region as the two min-max pairs of columns in the
table, and the index table, referenced by the primary key of the virtual
table, contains the starting and ending+1 indices (offsets actually) of the
stars in the flat catalog file for each region.

So I use the R*Tree module to get a fast lookup into the index table,
returning index regions that overlap an input RA and DEC min-max pair, then
step through the catalog lines for each of those regions.

Here's my question:  is there any advantage to skipping the index step and
putting the star catalog data into the virtual table itself?  One advantage
is that I could include the magnitude in the rtree table.

The reason I ask is that rtree table uses min-max pairs, but each star is a
point so the min and max are equal for each star.  Would that break any
implicit R*Tree rules or negate any efficiencies?

Thanks,

Brian Carcich
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: General R*Tree query

Dan Kennedy-4
On 12/18/2013 12:49 AM, Brian T. Carcich wrote:

> I'm working on an SQLite solution to get at star catalogs; they are usually
> searched via Right Ascension (RA), Declination (DEC), and magnitude (mag).
>   RA,DEC is a spherical coordinate system to specify a star position on-sky;
> magnitude is related to star brightness.
>
> What I have so far is here:
>
>     https://github.com/drbitboy/Tycho2_SQLite_RTree
>
>
> I started with the Tycho-2 star catalog.  It comprises 2.5 million stars in
> a flat ASCII, fixed-width catalog file (actually two files but ignore that
> for now), and an index file (map) of ~10k small RA-DEC regions, with an
> average of ~250 stars in each region.  The regions do not overlap, and all
> the stars in any one region are in contiguous lines in the catalog file.
>
> The index file does not implement any grouping or sorting by magnitude.
>   Each index region refers to
>
> A) a contiguous region on-sky with defined by a min-max RA pair and a
> min-max DEC pair.
>
> B)  a contiguous range of the lines (stars) in the flat file that are
> within that region.
>
> So the data in the index file are a reasonable starting point for an R*Tree
> in SQLite3.  I put the index file data into the virtual table using the RA
> and DEC limits for each region as the two min-max pairs of columns in the
> table, and the index table, referenced by the primary key of the virtual
> table, contains the starting and ending+1 indices (offsets actually) of the
> stars in the flat catalog file for each region.
>
> So I use the R*Tree module to get a fast lookup into the index table,
> returning index regions that overlap an input RA and DEC min-max pair, then
> step through the catalog lines for each of those regions.
>
> Here's my question:  is there any advantage to skipping the index step and
> putting the star catalog data into the virtual table itself?  One advantage
> is that I could include the magnitude in the rtree table.
>
> The reason I ask is that rtree table uses min-max pairs, but each star is a
> point so the min and max are equal for each star.  Would that break any
> implicit R*Tree rules or negate any efficiencies?

Points are fine. So long as (max>=min) for all dimensions.
R-tree will return SQLITE_CONSTRAINT if you try to insert
a record for which this is not the case.

I guess in theory storing each individual star in the
r-tree might be more efficient. Difficult to say if it
would be significant though. The r-tree and other virtual
table code is not as optimized as the core, so even if
it seems better in theory it might not be in practice.

Adding the magnitude to the r-tree structure as an extra
dimension that is never queried sounds like it might make
the r-tree structure less efficient though. R-tree will
make some effort to place records with similar magnitudes
on the same node, which won't help the query but will
presumably cause some reduction in the localization that
does matter.

Is it working now? How many more stars do you have data for?

Dan.





_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: General R*Tree query

Roman Fleysher

Since coordinate system is spherical, how do you tell that RA=23:59 and RA=00:01 are next to each other using usual comparisons?

Roman

________________________________________
From: [hidden email] [[hidden email]] on behalf of Dan Kennedy [[hidden email]]
Sent: Tuesday, December 17, 2013 3:51 PM
To: [hidden email]
Subject: Re: [sqlite] General R*Tree query

On 12/18/2013 12:49 AM, Brian T. Carcich wrote:

> I'm working on an SQLite solution to get at star catalogs; they are usually
> searched via Right Ascension (RA), Declination (DEC), and magnitude (mag).
>   RA,DEC is a spherical coordinate system to specify a star position on-sky;
> magnitude is related to star brightness.
>
> What I have so far is here:
>
>     https://github.com/drbitboy/Tycho2_SQLite_RTree
>
>
> I started with the Tycho-2 star catalog.  It comprises 2.5 million stars in
> a flat ASCII, fixed-width catalog file (actually two files but ignore that
> for now), and an index file (map) of ~10k small RA-DEC regions, with an
> average of ~250 stars in each region.  The regions do not overlap, and all
> the stars in any one region are in contiguous lines in the catalog file.
>
> The index file does not implement any grouping or sorting by magnitude.
>   Each index region refers to
>
> A) a contiguous region on-sky with defined by a min-max RA pair and a
> min-max DEC pair.
>
> B)  a contiguous range of the lines (stars) in the flat file that are
> within that region.
>
> So the data in the index file are a reasonable starting point for an R*Tree
> in SQLite3.  I put the index file data into the virtual table using the RA
> and DEC limits for each region as the two min-max pairs of columns in the
> table, and the index table, referenced by the primary key of the virtual
> table, contains the starting and ending+1 indices (offsets actually) of the
> stars in the flat catalog file for each region.
>
> So I use the R*Tree module to get a fast lookup into the index table,
> returning index regions that overlap an input RA and DEC min-max pair, then
> step through the catalog lines for each of those regions.
>
> Here's my question:  is there any advantage to skipping the index step and
> putting the star catalog data into the virtual table itself?  One advantage
> is that I could include the magnitude in the rtree table.
>
> The reason I ask is that rtree table uses min-max pairs, but each star is a
> point so the min and max are equal for each star.  Would that break any
> implicit R*Tree rules or negate any efficiencies?

Points are fine. So long as (max>=min) for all dimensions.
R-tree will return SQLITE_CONSTRAINT if you try to insert
a record for which this is not the case.

I guess in theory storing each individual star in the
r-tree might be more efficient. Difficult to say if it
would be significant though. The r-tree and other virtual
table code is not as optimized as the core, so even if
it seems better in theory it might not be in practice.

Adding the magnitude to the r-tree structure as an extra
dimension that is never queried sounds like it might make
the r-tree structure less efficient though. R-tree will
make some effort to place records with similar magnitudes
on the same node, which won't help the query but will
presumably cause some reduction in the localization that
does matter.

Is it working now? How many more stars do you have data for?

Dan.





_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: General R*Tree query

Brian T. Carcich
In reply to this post by Dan Kennedy-4
On Tue, Dec 17, 2013 at 3:51 PM, Dan Kennedy <[hidden email]> wrote:

> On 12/18/2013 12:49 AM, Brian T. Carcich wrote:
>
>> [...]
>
> Points are fine. [...]
> Is it working now? How many more stars do you have data for?


Excellent, thanks for the info!

I forgot to mention that we do perform searches using magnitude.

Yes it is working now; I do the normal SQLite3 R*Tree INNER JOIN to get to
the index table (tyc2index) from the indexrtree table (tyc2indexrtree)
regions overlapping the user-supplied RA,DEC limits (hira = High RA limit;
lodec = Low DEC limit; etc), and then do another INNER JOIN ON the index
table start and end offsets with the offsets in the main catalog table
(tyc2catalog_uvs), so it all happens in one call.  The beauty is that all
the work is done up front when I load the data from the star catalog, and
then the SELECT does the rest.  Also, the approach should work for any
catalog that has RA,DEC and Magnitude, which almost all catalogs do.

I think the SELECT is in the Githup repo ... yeah, here it is:


SELECT tyc2catalog_uvs.offset ,tyc2catalog_uvs.x ,tyc2catalog_uvs.y
> ,tyc2catalog_uvs.z ,tyc2catalog_uvs.mag


> FROM tyc2indexrtree



INNER JOIN tyc2index
>         ON tyc2indexrtree.offset=tyc2index.offset



INNER JOIN tyc2catalog_uvs
>         ON tyc2index.catalogstart<=tyc2catalog_uvs.offset
>        AND tyc2index.catalogend>tyc2catalog_uvs.offset
>        AND tyc2catalog_uvs.mag<?



WHERE tyc2indexrtree.offset=tyc2index.offset
>   AND tyc2indexrtree.hira>?
>   AND tyc2indexrtree.lora<?
>   AND tyc2indexrtree.hidec>?
>   AND tyc2indexrtree.lodec<?


> ORDER BY tyc2catalog_uvs.mag asc;
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: General R*Tree query

Brian T. Carcich
In reply to this post by Roman Fleysher
On Tue, Dec 17, 2013 at 3:57 PM, Roman Fleysher <
[hidden email]> wrote:

>
> Since coordinate system is spherical, how do you tell that RA=23:59 and
> RA=00:01 are next to each other using usual comparisons?


I don't; usual comparisons won't work so I do two comparisons:

I am usually looking for stars within a convex field of view (FOV),
typically a frustum with a rectangular footprint, so I determine if and
where RA=0=360 crosses that footprint, and break the FOV into two pieces,
from 0<=RA<=loRA and hiRA<=RA<=360, so loRA becomes hira in one search and
hiRA become lora in the other.

There are only three cases:  zero, one or two crossings.  Zero crossings
means I can do everything in one SELECT; one crossing means either one of
the poles is in the FOV and I search RA=0 to 360; DEC=someDEC to +/-90, or
the FOV touches RA=0(=360) from one side or the other, which reduces to the
zero case; two crossings means the poles are not in the FOV and I can do
two searches as mentioned above, from 0 up to someLowRA and from 360 down
to someHighRA.

There are some edge cases but that is basically it.

I actually handle "two or more crossings" cases the same as two cases, even
though I don't think "more" can happen with a convex FOV footprint.  For
any edge (segment of the great circle between two vertices) of the FOV that
crosses RA=0, which is easily determined since I have the vertices in XYZ
coordinates, I insert a vertex in the edge at the crossing, and then
recurse with subsets of vertices split across RA=0.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: General R*Tree query

Roman Fleysher
Perhaps this is a weird way for me to get deeper knowledge of R trees, and because I vaguely remember that Tyco refers to a specific epoch in which coordinates are defined, but would it be possible to search R tree using a cone, i.e. stars within a cone of certain degree around given star? This would require a trigonometric calculation before comparison can be made but can be done in a single comparison.

Or, since RA and DEC coordinates are not area preserving (nor distance) -- i.e. angle between stars at DEC =0 is bigger than angle between stars at DEC=80 when they are the same delta RA apart -- then maybe instead of defining rectangular FOV in RA and DEC one should be defining rectangular FOV in DEC, sin(RA)? Then one would not need two searches.

The goal is to find neighbors to a given star defined roughly by some metric? Since there's nothing magical in RA , DEC coordinates the metric could use some other coordinates?

Roman





-------- Original message --------
From: "Brian T. Carcich" <[hidden email]>
Date: 12/18/2013 1:24 AM (GMT-05:00)
To: General Discussion of SQLite Database <[hidden email]>
Subject: Re: [sqlite] General R*Tree query


On Tue, Dec 17, 2013 at 3:57 PM, Roman Fleysher <
[hidden email]> wrote:

>
> Since coordinate system is spherical, how do you tell that RA=23:59 and
> RA=00:01 are next to each other using usual comparisons?


I don't; usual comparisons won't work so I do two comparisons:

I am usually looking for stars within a convex field of view (FOV),
typically a frustum with a rectangular footprint, so I determine if and
where RA=0=360 crosses that footprint, and break the FOV into two pieces,
from 0<=RA<=loRA and hiRA<=RA<=360, so loRA becomes hira in one search and
hiRA become lora in the other.

There are only three cases:  zero, one or two crossings.  Zero crossings
means I can do everything in one SELECT; one crossing means either one of
the poles is in the FOV and I search RA=0 to 360; DEC=someDEC to +/-90, or
the FOV touches RA=0(=360) from one side or the other, which reduces to the
zero case; two crossings means the poles are not in the FOV and I can do
two searches as mentioned above, from 0 up to someLowRA and from 360 down
to someHighRA.

There are some edge cases but that is basically it.

I actually handle "two or more crossings" cases the same as two cases, even
though I don't think "more" can happen with a convex FOV footprint.  For
any edge (segment of the great circle between two vertices) of the FOV that
crosses RA=0, which is easily determined since I have the vertices in XYZ
coordinates, I insert a vertex in the edge at the crossing, and then
recurse with subsets of vertices split across RA=0.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: General R*Tree query

Brian T. Carcich
On Wed, Dec 18, 2013 at 1:53 AM, Roman Fleysher <
[hidden email]> wrote:

> Perhaps this is a weird way for me to get deeper knowledge of R trees, and
> because I vaguely remember that Tyco refers to a specific epoch in which
> coordinates are defined, but would it be possible to search R tree using a
> cone, i.e. stars within a cone of certain degree around given star? This
> would require a trigonometric calculation before comparison can be made but
> can be done in a single comparison.
>
> Or, since RA and DEC coordinates are not area preserving (nor distance) --
> i.e. angle between stars at DEC =0 is bigger than angle between stars at
> DEC=80 when they are the same delta RA apart -- then maybe instead of
> defining rectangular FOV in RA and DEC one should be defining rectangular
> FOV in DEC, sin(RA)? Then one would not need two searches.
>
> The goal is to find neighbors to a given star defined roughly by some
> metric? Since there's nothing magical in RA , DEC coordinates the metric
> could use some other coordinates?



Every [RA,DEC] pair resolves to a unit vector in Cartesian coordinate space
i.e. an [X,Y,Z] triplet on the surface of a unit sphere; that would be a
continuous metric without the RA=0=360 issue.  I don't see why the R*Tree
could not be set up with X, Y, and Z, plus magnitude limits; the set of
nodes is hollow in a 3D sense so the first-level non-leaf nodes would have
a lot of empty space, but I don't think that matters; I've been thinking
about doing it this way for some time.  For my app I already store XYZs in
the outer, non-R*Tree table because all final comparisons have to be in
Cartesian space anyway.

But in general the search region is so small that the cosine[DEC]
dependence of distance per degree of RA is effectively constant for any one
search, and  an [RA,DEC,Mag] tree should be "good enough" because it pares
down the search space quickly from 2.5M stars in Tycho=2 to a few hundred
very quickly, with the caveat that there is a special case near RA=0=360.

In any event the R*Tree is not going to do the final geometric comparison,
rather it reduces the number of stars that need that comparison, and that's
the goal.

Btw, if you want to see something cool, look at chapter 4 of Dustin Lang's
thesis (www.astro.princeton.edu/~dstn/lang-thesis.pdf); the whole paper is
cool, but I really enjoy that chapter.

-b
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users