Managing trees in the database

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Managing trees in the database

SanjayK
Since SQLite is perfect for use in single-user desktop utility applications and since such applications typically store hierarchial data (tree) in a single table, it would be nice to have support for special features like connect by of oracle.

See:
  http://www.adp-gmbh.ch/ora/sql/connect_by.html
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Philipp Knüsel
SanjayK schrieb:

> Since SQLite is perfect for use in single-user desktop utility applications
> and since such applications typically store hierarchial data (tree) in a
> single table, it would be nice to have support for special features like
> connect by of oracle.
>
> See:
>   http://www.adp-gmbh.ch/ora/sql/connect_by.html
> --
> View this message in context: http://www.nabble.com/Managing-trees-in-the-database-t1135555.html#a2974277
> Sent from the SQLite forum at Nabble.com.
>
>  
Depending on your goals, this concept might give you another solution:

http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

As the code examples are for mysql, you need to adapt it a bit.

HTH

Philipp
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Dennis Cote
Philipp Knüsel wrote:

> SanjayK schrieb:
>
>> Since SQLite is perfect for use in single-user desktop utility
>> applications
>> and since such applications typically store hierarchial data (tree) in a
>> single table, it would be nice to have support for special features like
>> connect by of oracle.
>> See:   http://www.adp-gmbh.ch/ora/sql/connect_by.html
>> --
>> View this message in context:
>> http://www.nabble.com/Managing-trees-in-the-database-t1135555.html#a2974277 
>>
>> Sent from the SQLite forum at Nabble.com.
>>
>>  
>
> Depending on your goals, this concept might give you another solution:
>
> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
>
> As the code examples are for mysql, you need to adapt it a bit.
>
> HTH
>
> Philipp
>
The following is an implementation of the materialized path method of
storing trees in an SQL database. I have used this method in the past
with fairly large trees (about 30K nodes and around 10 levels). I find
it works quite well and it's a lot simpler to understand than the nested
set approach. For the types of queries I'm doing it usually results in
simpler queries as well.

It adds one variable length text field to each record to store the path.
It uses triggers to automatically maintain this path and implement
cascaded deletes.

It might be suitable for your needs as well.


    -- Sample materialized path implementation of a tree in SQL
    --
    -- The materialized path is a string containing the path to the node
    -- represented by the node ids along the path separated by / characters.
    -- This string is easy to search using the LIKE comparison function
    -- to check for path characteristics. All the methods that apply to the
    -- adjacency list representation can also be used with this format.
    -- The path string is maintained automatically by triggers. Any node
    -- inserted with a NULL parent_id is the root of a new tree.

    create table tree (
      id        integer primary key,
      parent_id integer references tree,
      data      text,
      path      text        -- materialized path
    );
     
    -- set path to node when it is inserted
    create trigger tree_in after insert on tree
    begin
        update tree set path =
            case when parent_id isnull then '/'
            else (select path from tree where id = new.parent_id) ||
parent_id || '/'
            end
        where id = new.id;
    end;

    -- delete subtree below node when a node is deleted
    create trigger tree_del before delete on tree
    begin
        delete from tree where id in
            (select id from tree
            where path like old.path || old.id || '/%');
    end;

    -- example
    insert into tree (id, parent_id, data) values (1, NULL, 'parent');
    insert into tree (id, parent_id, data) values (NULL, 1, 'son');
    insert into tree (id, parent_id, data) values (NULL, 1, 'daughter');
    insert into tree (id, parent_id, data) values (NULL, 2, 'grandchild');
    select * from tree;

    -- find all root nodes
    select * from tree where parent_id isnull;

    -- find all leaf nodes
    select * from tree where id not in (select parent_id from tree);

    -- find all nodes in the sub tree below node
    select * from tree
    where path like (select path || id || '/%' from tree where id =
:node_id);

    -- find all nodes along the path to node
    select * from tree
    where (select path || id || '/' from tree where id = :node_id)
        like path || id || '/%'
    order by path;

    -- find all nodes on level 3 or 4
    select * from tree
    where path like '/%/%/' or path like '/%/%/%/';

HTH
Dennis Cote



Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

SanjayK
In reply to this post by Philipp Knüsel
Fascinating! I got another reply too by email pointing to the book by Joe Celko.

Thanks,
Sanjay
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

SanjayK
In reply to this post by Dennis Cote
This path is a nice idea to keep it simple for what I want to do. But I have a question:

Suppose the database is like this:

id, parentid, data
1, 0, parent
2, 1, son B
3, 1, son A
4, 1, daughter A
5, 1, daughter B
6, 2, grandchild B
7, 2, grandchild A

Using the path approach, how can I sort this tree such that the structure remains same (children immediately after parent) but the children are sorted in their own level. So the result should be:

id, parentid, data
1, 0, parent
4, 1, daughter A
5, 1, daughter B
3, 1, son A
2, 1, son B
7, 2, grandchild A
6, 2, grandchild B

If I sort using the path then the structure is disturbed because the top level children are then clubbed together (same path) and their own children are separated by several rows downwards.

The reason I want to preserve the structure is that I am filling a tree control and preserving the structure makes it simpler to code in a recursive manner.

Thanks,
Sanjay
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

SanjayK
I think I gave a wrong example. This simple tree will probably sort ok with "order by path, data." But if I introduce more children with subchildren, say, at level 1, the problem of clubbing will be apparent.
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Andrew Piskorski
In reply to this post by SanjayK
On Thu, Feb 16, 2006 at 09:22:13PM -0800, SanjayK wrote:
>
> Fascinating! I got another reply too by email pointing to the book by Joe
> Celko.

Also try searching the SQLite archives, e.g.:

  http://www.mail-archive.com/sqlite-users@.../msg05235.html

There are probably many other relevent posts as well.  (Unfortunately,
the search functions on the existing mailing list archives seem to be
rather poor, you can search for "Hierarchical", "connect", or "by",
but not for the phrase "connect by".)

I'm not really familiar with that "Tree Sortkey" technique, but here
are a few more links about it, (which in turn include some links to
other techniques as well):

  http://openacs.org/doc/current/tutorial-hierarchical.html
  http://rubick.com/openacs/tree_sortkey
  http://openacs.org/forums/message-view?message_id=143978

I think another name for the basic "Tree Sortkey" technique is the
"m-vgID method" as described by Miguel Sofer:

  http://openacs.org/forums/message-view?message_id=18365
  http://openacs.org/forums/message-view?message_id=104242
  http://www.utdt.edu/~mig/sql-trees/

Reportedly (according to Dan Wickstrom in 2001, who I'm pretty sure
knew what he was talking about):

  "This method provides the flexiblity of the nested set model, and it
  doesn't suffer from the same performance problems on insert/update
  operations. In addition, the implementation is simpler."

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Dennis Cote
In reply to this post by SanjayK
SanjayK wrote:

>This path is a nice idea to keep it simple for what I want to do. But I have
>a question:
>
>Suppose the database is like this:
>
>id, parentid, data
>1, 0, parent
>2, 1, son B
>3, 1, son A
>4, 1, daughter A
>5, 1, daughter B
>6, 2, grandchild B
>7, 2, grandchild A
>
>Using the path approach, how can I sort this tree such that the structure
>remains same (children immediately after parent) but the children are sorted
>in their own level. So the result should be:
>
>id, parentid, data
>1, 0, parent
>4, 1, daughter A
>5, 1, daughter B
>3, 1, son A
>2, 1, son B
>7, 2, grandchild A
>6, 2, grandchild B
>
>If I sort using the path then the structure is disturbed because the top
>level children are then clubbed together (same path) and their own children
>are separated by several rows downwards.
>
>The reason I want to preserve the structure is that I am filling a tree
>control and preserving the structure makes it simpler to code in a recursive
>manner.
>
>  
>
Sanjay,

I'm not sure I follow your question. You cant display an arbitrary tree
so that all children are immediately after their parent. The closest you
could come would be a traversal something like this.

P1
    C1
       GC1
    C2
       GC2
       GC3
    C3
P2
    C4
    C5
       GC4
    C6
P3

But in this case, the children of P1 do not follow immediately after P1,
the grandchildren get in the way.

If you only want one level, i.e. the children of a node, don't use the
path. Simply select all the nodes with the parent_id equal to the id of
the node in question.

This is what I do when filling a tree control. I only retrieve the
children when the user expands a node in the tree control. This makes
displaying the tree control very fast since you only need to get the
root nodes first. Then you add nodes to the tree control as the user
expands nodes.

To get the display above the user would initially see the three root nodes

P1
P2
P3

After they expand P1 they would see

P1
    C1
    C2
    C3
P2
P3

If they continued to expand all nodes they would get your ordered
display. You will have done six simple queries, one for each expand
click the user has made.

Actually, there is one small complicating factor, and that is that you
must also determine if you should show an expand control for each node
in your tree control. You only show the expand control if the node has
children. So simply check if each node appears in the parent_id column
of the tree table.

select exists (select id where parent_id = :node_id);

This query will work, but it requires a full table scan to check for the
existence of children. The speed of this query can be greatly improved
by adding an index on the parent_id column.

create index tree_parent on tree(parent_id);

Now this query goes directly to the first child node if present. It
still finds all the children, which may be a concern for a tree with a
high fanout. This can be eliminated by changing the query to only check
for the existence of one child.

select exists (select id where parent_id = :node_id limit 1);

If this query returns true, you add an expand control to the item in the
tree control.

I have also used this path method with large static trees, where I
create a temporary table that stores the count of children for each
node. Then I join this table with the tree to check if I should add the
expand control to a tree item.

HTH
Dennis Cote


Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

SanjayK
Dennis,

I understand what you are saying. My problem is specific to only the sorting. Given the tree:

P1
    C1
       GC1
    C2
       GC2
       GC3
    C3
P2
    C4
    C5
       GC4
    C6
P3

This structure is perfect for my use. In fact, this is the way I want it. The children at any level should immediately follow their parent in the table.

Now suppose, each of these items had a field "Size" and I want to sort this table such that the structure remains same but the siblings are sorted by size. For example, the result of a sort might be:

P1
    C2
       GC3
       GC2
    C3
    C1
       GC1
...

Here, the position of the siblings changed according to sort but the structure didn't break. I want to know the SQL for achieving this sort using the path and the size. For example, if I were to use the path as you described and do an ORDER BY path, size, it will do this but the structure of the table will break. Because the paths of the siblings would be same and hence they will become clubbed together and won't be separated by their children as in the actual structure.

If it is still unclear, I will make and post a real sample.

Thanks,
Sanjay
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Jim C. Nasby
In reply to this post by Philipp Knüsel
On Thu, Feb 16, 2006 at 08:23:43PM +0100, Philipp Kn?sel wrote:

> SanjayK schrieb:
> >Since SQLite is perfect for use in single-user desktop utility applications
> >and since such applications typically store hierarchial data (tree) in a
> >single table, it would be nice to have support for special features like
> >connect by of oracle.
> >
> >See:
> >  http://www.adp-gmbh.ch/ora/sql/connect_by.html
> >--
> >View this message in context:
> >http://www.nabble.com/Managing-trees-in-the-database-t1135555.html#a2974277
> >Sent from the SQLite forum at Nabble.com.
> >
> >  
> Depending on your goals, this concept might give you another solution:
>
> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

Yet again, examples of mis-information from MySQL...

"The first common task when dealing with hierarchical data is the
display of the entire tree, usually with some form of indentation. The
most common way of doing this is in pure SQL is through the use of a
self-join:"

Self-join maybe, but certainly not in the way they're suggesting.
Databases that support WITH make this easy (and it's technically a
self-join). The problem with what he's proposing is that it silently
limits you to (in this case) 4 levels of nesting. If you have more
levels than that you'll end up missing data from either the top or
bottom of the tree. Of course, if you think "Feb 31" is a valid date,
maybe that's OK...

Their information about using a nested set model seems accurate, though.

Another option is to use 'ltree'. There used to be an implimentation for
PostgreSQL, but it looks like it's been removed.
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/ltree/README.ltree?rev=1.7
has more info.
--
Jim C. Nasby, Sr. Engineering Consultant      [hidden email]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Jim Dodgen
Jim C. Nasby wrote:

> On Thu, Feb 16, 2006 at 08:23:43PM +0100, Philipp Kn?sel wrote:
>  
>> SanjayK schrieb:
>>    
>>> Since SQLite is perfect for use in single-user desktop utility applications
>>> and since such applications typically store hierarchial data (tree) in a
>>> single table, it would be nice to have support for special features like
>>> connect by of oracle.
>>>
>>> See:
>>>  http://www.adp-gmbh.ch/ora/sql/connect_by.html
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Managing-trees-in-the-database-t1135555.html#a2974277
>>> Sent from the SQLite forum at Nabble.com.
>>>
>>>  
>>>      
>> Depending on your goals, this concept might give you another solution:
>>
>> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
>>    
>
> Yet again, examples of mis-information from MySQL...
>
> "The first common task when dealing with hierarchical data is the
> display of the entire tree, usually with some form of indentation. The
> most common way of doing this is in pure SQL is through the use of a
> self-join:"
>
> Self-join maybe, but certainly not in the way they're suggesting.
> Databases that support WITH make this easy (and it's technically a
> self-join). The problem with what he's proposing is that it silently
> limits you to (in this case) 4 levels of nesting. If you have more
> levels than that you'll end up missing data from either the top or
> bottom of the tree. Of course, if you think "Feb 31" is a valid date,
> maybe that's OK...
>
> Their information about using a nested set model seems accurate, though.
>
> Another option is to use 'ltree'. There used to be an implimentation for
> PostgreSQL, but it looks like it's been removed.
> http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/ltree/README.ltree?rev=1.7
> has more info.
>  
I really miss the oracle "connect by" operator. I first used it for a
postal application back in 1992. I am surprised that this feature has
not made it into to the standard or any other RDBMS.  Maybe  we  should
see about adding the extension.
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Dennis Cote
In reply to this post by SanjayK
On 2/17/06, SanjayK <[hidden email]> wrote:

>
>
> Dennis,
>
> I understand what you are saying. My problem is specific to only the
> sorting. Given the tree:
>
> P1
>     C1
>        GC1
>     C2
>        GC2
>        GC3
>     C3
> P2
>     C4
>     C5
>        GC4
>     C6
> P3
>
> This structure is perfect for my use. In fact, this is the way I want it.
> The children at any level should immediately follow their parent in the
> table.
>
> Now suppose, each of these items had a field "Size" and I want to sort
> this
> table such that the structure remains same but the siblings are sorted by
> size. For example, the result of a sort might be:
>
> P1
>     C2
>        GC3
>        GC2
>     C3
>     C1
>        GC1
> ...
>
> Here, the position of the siblings changed according to sort but the
> structure didn't break. I want to know the SQL for achieving this sort
> using
> the path and the size. For example, if I were to use the path as you
> described and do an ORDER BY path, size, it will do this but the structure
> of the table will break. Because the paths of the siblings would be same
> and
> hence they will become clubbed together and won't be separated by their
> children as in the actual structure.
>
> If it is still unclear, I will make and post a real sample.
>
>
> Sanjay,

You can get close to the depth first order of the first list by ordering by
the full path to the nodes.

select * from tree order by path || id || '/';

This is correct for this sample

insert into tree (id, parent_id, data) values (NULL, NULL, 'P1');
insert into tree (id, parent_id, data) values (NULL, NULL, 'P2');
insert into tree (id, parent_id, data) values (NULL, NULL, 'P3');
insert into tree (id, parent_id, data) values (NULL, 1, 'C1');
insert into tree (id, parent_id, data) values (NULL, 1, 'C2');
insert into tree (id, parent_id, data) values (NULL, 1, 'C3');
insert into tree (id, parent_id, data) values (NULL, 2, 'C4');
insert into tree (id, parent_id, data) values (NULL, 2, 'C5');
insert into tree (id, parent_id, data) values (NULL, 2, 'C6');
insert into tree (id, parent_id, data) values (NULL, 4, 'GC1');
insert into tree (id, parent_id, data) values (NULL, 5, 'GC2');
insert into tree (id, parent_id, data) values (NULL, 5, 'GC3');
insert into tree (id, parent_id, data) values (NULL, 8, 'GC1');

Which displays as

sqlite> select * from tree order by (path || id || '/');
1                       P1          /
4           1           C1          /1/
10          4           GC1         /1/4/
5           1           C2          /1/
11          5           GC2         /1/5/
12          5           GC3         /1/5/
6           1           C3          /1/
2                       P2          /
7           2           C4          /2/
8           2           C5          /2/
13          8           GC1         /2/8/
9           2           C6          /2/
3                       P3          /

However, it will not always be correct because it uses a lexigraphic
ordering which isn't quite right.

If we change the query slightly to display the full path it is sorting on,
we see.

sqlite> select *, path || id || '/' as full from tree order by full;
1                       P1          /           /1/
4           1           C1          /1/         /1/4/
10          4           GC1         /1/4/       /1/4/10/
5           1           C2          /1/         /1/5/
11          5           GC2         /1/5/       /1/5/11/
12          5           GC3         /1/5/       /1/5/12/
6           1           C3          /1/         /1/6/
2                       P2          /           /2/
7           2           C4          /2/         /2/7/
8           2           C5          /2/         /2/8/
13          8           GC1         /2/8/       /2/8/13/
9           2           C6          /2/         /2/9/
3                       P3          /           /3/

Now if we arbitrarily change the id of C3 from 6 to 14 we will break the
lexigraphic ordering.

sqlite> update tree set id = 14 where id = 6;
sqlite> select *, path || id || '/' as full from tree order by full;
1                       P1          /           /1/
14          1           C3          /1/         /1/14/
4           1           C1          /1/         /1/4/
10          4           GC1         /1/4/       /1/4/10/
5           1           C2          /1/         /1/5/
11          5           GC2         /1/5/       /1/5/11/
12          5           GC3         /1/5/       /1/5/12/
2                       P2          /           /2/
7           2           C4          /2/         /2/7/
8           2           C5          /2/         /2/8/
13          8           GC1         /2/8/       /2/8/13/
9           2           C6          /2/         /2/9/
3                       P3          /           /3/

This is because the string /1/1 (the first part of /1/14) sorts before the
string /1/4. To solve this we need to create a custom collation function for
the path that understands that the path ids can't be broken up. This should
be possible using the sqlite3_create_collation API to create a tree_path
collation, and then ordering using that collation.

select * from tree order by path || id || '/' collate tree_path;

I am going to write such a collation function to go along with the
materialized path tree tables.

As for sorting by size, that will take some more thought. The full path
above is a complete ordering so it won' t do any good to also order by size.
I think it will be necessary to create a size_path field that is much like
the id path used for the tree. It would track the hierarchy of sizes. I
think this can be done in an auxiliary table joined to the tree.

I will let you know. I won' t get a chance to look at this for a few days as
I will be away from home (i.e. no PC).

HTH
Dennis Cote
Reply | Threaded
Open this post in threaded view
|

Re: Managing trees in the database

Andrew Piskorski
In reply to this post by Jim Dodgen
On Sat, Feb 18, 2006 at 05:34:45PM -0800, Jim Dodgen wrote:

> I really miss the oracle "connect by" operator. I first used it for a
> postal application back in 1992. I am surprised that this feature has
> not made it into to the standard or any other RDBMS.  Maybe  we  should

Although useful, Oracle's connect by feature is widely considered to
be a flawed design, which is part of the reason few other databases
support it.  The SQL standard specifies a different approach, which is
said to be similar to DB2's "recursive SQL".

Hm, here are a bunch of links on "Hierarchical data in RDBMSs", which
I just stumbled across:

  http://troels.arvin.dk/db/rdbms/links/#hierarchical

--
Andrew Piskorski <[hidden email]>
http://www.piskorski.com/