[Phylobase-devl] labeling order

Sat Dec 27 20:21:28 CET 2008

On Dec 27, 2008, at 1:47 PM, Steven Kembel wrote:

[snip]

> This is my question as well - especially for unrooted trees where
> there is not an edge for every node and vice versa, so tips and
> nodes and edges can't be in the same order. I think that as we are
> suggesting ways to modify the tree structure, we can provide examples
> of what the edge matrix, node labels, etc. would actually look like
> for an unrooted and rooted tree, and how tree reordering would work, I
> think this might help clarify things.

Okay, I've attempted to do that as best I can below.  It's not  
complete so others should feel free to update/correct my assumptions.   
I haven't spent as much time thinking about trees as the others on  
this list and much of the tree programming I have done involved nodes  
with pointers to other nodes, and there was no explicit edge matrix.   
Hopefully this helps

@edge
      [,1] [,2]
[1,]    5    6
[2,]    6    1
[3,]    6    2
[4,]    5    7
[5,]    7    3
[6,]    7    4
[7,]   NA    5

======== Node number ordering of labels========
This is the ape method

@tip.label
[1] "t1" "t2" "t3" "t4"

@node.label
[1] "root" "n1" "n2"

These are in node numbering order e.g. node 1 - t1, node 2 - t3... etc

======== Edge order ordering of labels ========

@tip.label
[1] "t1" "t2" "t3" "t4"

@node.label
[1] "n1" "n2" "root"

======== Edge order ordering of single label vector ========

@label
[1] "n1" "t1" "t2" "n2" "t3" "t4"  "root"

If there are not internal node labels, those slots are either "" or NA.

-----------------------------------------------------------------------

If we reorder the edge matrix to:

@edge
      [,1] [,2]
[1,]   NA    5
[2,]    7    3
[3,]    7    4
[4,]    5    6
[5,]    6    1
[6,]    6    2
[7,]    5    7

======== Node number ordering of labels========
@tip.label
[1] "t1" "t2" "t3" "t4"

@node.label
[1] "root" "n1" "n2"

The order stays the same because the node numbers have not changed.

======== Edge order ordering of labels ========

@tip.label
[1] "t3" "t4" "t1" "t2"

@node.label
[1]  "root" "n1" "n2"

======== Edge order ordering of single label vector ========

@label
[1] "root" "t3" "t4" "n1" "t1" "t2" "n2"

-----------------------------------------------------------------------

If we unroot the tree

@edge
      [,1] [,2]
[1,]    5    1
[2,]    5    2
[3,]    5    6
[4,]    6    3
[5,]    6    4

======== Node number ordering of labels========
@tip.label
[1] "t1" "t2" "t3" "t4"

@node.label
[1] "n1" "n2"

The order stays the same because the node numbers have not changed.   
But the root order disappears obviously

======== Edge order ordering of labels ========
@tip.label
[1] "t1" "t2" "t3" "t4"

@node.label
[1] ????  -- I don't know how this would work

======== Edge order ordering of single label vector ========

@label
[1] "t1" "t2" "n2" "t3" "t4"

Unclear how node number 5, or node 'n1' is labelled.

[snip]

> Following up on the previous point, maybe what we really need is
> to spell out how we want tree structures to look, similar to the
> whitepaper on the phylo class.

I think this is all be required, if only for ensuring that all of us  
are on the same page.

> I understand the desire to not break existing code and provide a
> phylogeny class that is intuitive for users and developers, but
> I don't agree that we should feel bound to follow the ape phylo
> structure. If we're just implementing phylo in S4 then we should be
> upfront about it and follow the phylo class specification exactly. I
> don't think we're doing that, though. There are a number of features
> that we might want to implement that aren't in phylo, including
> singleton nodes, reordering of the edges or nodes, root edges in the
> edge matrix, reticulations, how to represent rooted versus unrooted
> trees, separate labels and data for edges and nodes, and so on.
>
>
>>>> Instead, why don't we just decide on a standard ordering for
>>>> phylobase number the node ID's in this way, and then allow the
>>>> edge matrix and nodeID (and all data vectors) to be reordered as
>>>> needed for whatever functions.  Using the node ID, we can easily
>>>> put everything back to the "default" phylobase order, BUT ONLY
>>>> IF all objects (edge matrix, branch lengths, labels, etc etc are
>>>> in the SAME order. Don't "break" the integrity of the object
>>>> just for programming convenience.  There is just too much danger
>>>> for confusion. I, for one, would stop using phylobase, because
>>>> it's just too hard to remember the peculiarities of the way the
>>>> object is constructed. Everytime I wanted to do something, I'd
>>>> have to relearn the rules.
>>
>>
>> Hmm.  I've been working to try to make everything consistent in node
>> order (as Steve suggested).  Thibaut/Marguerite, what do you suggest
>> for the case of unrooted trees?  Thibaut, how often do you match up
>> edges with data and labels?  I've done a bunch of stuff, and I'd
>> like to commit it, because it's all reasonably consistent now, but
>> I'd like to hear some more conversation -- I'm willing to work back
>> through while it's fresh in my mind and do everything the opposite
>> way (keeping everything in edge-matrix order all the time), provided
>> we know how to handle unrooted trees (and are willing to live with
>> not being able to handle reticulations).
>
> I agree with Ben. I don't mind undoing what we did, but only if
> there's a clear plan of exactly how we want things to work, spelled
> out in detail, having thought about how it will work for unrooted
> and rooted trees. I'm not arguing that the changes that were made to
> edge and node numbering are the only way to go, or even the best, but
> they are what we came up with to deal with the fact that when you add
> the root edge to the edge matrix, unrooted trees broke most of the
> existing code because they have an edge that is shared between two
> nodes. I understand now why the ape developers kept the root edge out
> of the edge matrix, it makes it complicated to deal with rooted and
> unrooted trees using the same methods.
>
> From Ben's example code...
>
>> unroot(tree.owls)$edge
>>      [,1] [,2]
>> [1,]    5    6
>> [2,]    6    1
>> [3,]    6    2
>> [4,]    5    3
>> [5,]    5    4
>>
>> node 5 does not appear in the second (descendant) column of the edge
>> matrix, so the node information has to be somewhat distinct from
>> the edge information -- it's one unit longer. ape dealt with this
>> by having root information (if any) hanging out in a separate place
>> within the data structure, but we got rid of that ...
>
> Say I want labels for nodes 5 and 6. Where do those labels go? i.e.
> what does labels() look like for this edge matrix, and how do we
> reorder this tree for plotting or traversal? What about after we root
> this tree at node 5?

Reordering and plotting, operate directly on the edge matrix, and  
should be okay regardless of the labeling and ordering scheme.   
Plotting currently doesn't work on unrooted trees at all, but I do  
have plans to implement methods.

> 0) Undo. Revert to where we were a week ago - take the root edge
> out of the edge matrix.  1) Do what we just did - put the root edge
> is in the edge matrix, nodes have their own set of attributes, so
> do edges, and we write accessors to translate between node id's and
> edges (transparent to end user for rooted trees?).  2) Keep root in
> the edge matrix but split rooted and unrooted trees into separate
> classes each with their own methods. I think this would be more
> confusing for programmers and users, but we could basically follow
> the ape tree structure for rooted trees and have a slightly modified
> structure for unrooted trees.  3) Arbitrarily root unrooted trees at
> one of the nodes that share an edge and strip out the imaginary root
> edge for printing and plotting methods. This was an idea that Brian
> suggested. I'm not sure how hard this would be to implement.

I have only a slight preference for option 1 or 2.

Peter