[Phylobase-devl] labeling order

Thibaut Jombart jombart at biomserv.univ-lyon1.fr
Sun Dec 28 13:53:20 CET 2008

```
Hello again,

there are plenty of things in the past emails, so I might be missing a few.
>
>>>   Hmmm.
>>>
>>>   For the record, here's Steve's statement:
>>>
>>> SWK - This is crucial and we should decide soon, needs to be sorted
>>> out for 1.5. I think that many of the problems we're having with
>>> labels and reordering are due to the fact that until now we treated
>>> nodes and edges as interchangeable, i.e. we had node labels in edge
>>> matrix order, but these labels should really be associated with
>>> nodes, not with edges.
>>>
>> I could not agree more.
>>
>>> This assumption caused things to break once edges
>>>  and nodes were not equivalent (now that root edge is in the edge matrix
>>> and we allow edge matrix reordering, or for unrooted trees). I
>>> think we need to be very clear about whether methods are actually
>>> operating on nodes or edges.
>>> I suggest that edge, edge.labels and edge.lengths (branch lengths)
>>> are in 'edge' order.
>>>
>> I can hardly see how it would make sense otherwise. All information
>> provided for a given item should be sorted according to this item. Tip
>> labels should be in tip order, internal node labels sorted by node
>> number, etc.
>>
>
>   Here's where it gets tricky.  Of course it's sensible for edge
> lengths and labels to be in edge matrix order ... for the others
> (tip labels, node labels), what do you mean by "tip order", "node numbers"?
>
Sorry, I have no working R from here, so I can provide no clear example.
Say a tree has T tips and N internal nodes.
Tip labels should be provided for nodes 1:T, and so-called node labels
(internal nodes) for (T+1):(T+N). That is, the ordering tagged as "Node
number ordering of labels" from Peter's (useful, thanks !) example. As
Peter has shown, only this ordering still holds when changing the
ordering of edges.
>>> Everything else (node labels, tip labels) should
>>> be in node id order. nodeId can translate between these two orders.
>>> Reorder can act on the edge* only since the underlying node ids
>>> will not change.
>>>
>>> Francois: It's definitely a crucial issue. Perhaps we could track
>>> node.labels and tip.labels by using named vectors, the names of the
>>> vector would be the nodeId.
>>>
>>>
>> I may be missing something here, but isn't this what we do when using getnodes?
>>
>
>   I think we don't need more identifiers than node numbers ...
>
I don't get this answer. What I meant was: it would be clearer if we
used named vectors for node/tip labels. Possibly even for edges. Taking
back Peter's example:

@edge
     [,1] [,2]
[1,]    5    6
[2,]    6    1
[3,]    6    2
[4,]    5    7
[5,]    7    3
[6,]    7    4
[7,]   NA    5

-> Use named vectors

@tip.label
   1    2    3    4
"t1" "t2" "t3" "t4"

@node.label
     5    6    7
"root" "n1" "n2"

So we make it clear what ordering is used. In the doc, we can then just
say that the names of the label vectors for internal nodes and tips are
the numbers identifying these items in @edge.

> Marguerite:
> This one is very important, and I think it's a very bad idea to unlink
> the edges and nodes. Edges and nodes are intimately linked. In my
> mind, the edge is simply the branch below the node. So to have edges
> in one order and nodes in another order makes no sense to me at all.
> Why don't we simply give node ID's in "edge" order as you are using
> it? otherwise, there is HUGE potential for confusion. And we would
> need yet another index that indicates a mapping of the node ID to the
> edge matrix.
>
>
>> Again, I completely agree. Edges are uniquely identified by their
>> descending node, and this is what we have used from the beginning.
>> Moreover, this is what is used in ape, and I think we should diverge
>> from it only when it is mandatory (e.g. plotting trees with singletons if
>> these make sense). Most phylobase users are and will be primarily ape users.
>>
>
>   We're not diverging from this.
>   We're saying that we will keep data and the lists of node labels
> (tips and internal nodes) in order of node numbers, and not rearrange
> them every time we reorder the edge matrix.
>
Yes, so no disagreement for me here.
>>> Instead, why don't we just decide on a standard ordering for phylobase,
>>> number the node IDs in this way, and then allow the edge matrix and
>>> nodeID (and all data vectors) to be reordered as needed for whatever
>>> functions.  Using the node ID, we can easily  put everything back to
>>> the "default" phylobase order, BUT ONLY IF all objects (edge matrix,
>>> branch lengths, labels, etc.) are in the SAME order. Don't "break"
>>> the integrity of the object just for programming convenience. There is
>>> just too much danger for confusion. I, for one, would stop using
>>> phylobase, because it's just too hard to remember the peculiarities of
>>> the way the object is constructed. Every time I wanted to do something,
>>> I'd have to relearn the rules.
>>>
>>>
>> Same for me.
>>
>>
>
>   Hmm.
>   I've been working to try to make everything consistent in node order
> (as Steve suggested).  Thibaut/Marguerite, what do you suggest for the
> case of unrooted trees?
Order by node numbers, as in Peter's example.
>   Thibaut, how often do you match up edges with
> data and labels?
>
Never. Not sure I will ever need to do so. All the matching I do is of
data/labels with tips and internal nodes.
>    I've done a bunch of stuff, and I'd like to commit it, because it's
> all reasonably consistent now, but I'd like to hear some more
> conversation -- I'm willing to work back through while it's fresh in
> my mind and do everything the opposite way (keeping everything
> in edge-matrix order all the time), provided we know how to handle
> unrooted trees (and are willing to live with not being able to handle
> reticulations).
>
>
>  Ben
Peter wrote:
> Following up on the previous point, maybe what we really need is to
> spell out how we want tree structures to look, similar to the
> whitepaper on the phylo class.
>
> I understand the desire to not break existing code and provide a
> phylogeny class that is intuitive for users and developers, but I don't
> agree that we should feel bound to follow the ape phylo structure. If
> we're just implementing phylo in S4 then we should be upfront about it
> and follow the phylo class specification exactly. I don't think we're
> doing that, though. There are a number of features that we might want
> to implement that aren't in phylo, including singleton nodes,
> reordering of the edges or nodes, root edges in the edge matrix,
> reticulations, how to represent rooted versus unrooted trees, separate
> labels and data for edges and nodes, and so on.
>
I think I had a different idea of what phylobase was about, but no
problem there. To me, the first purpose of phylobase was handling
phylogeny + data associated with tips and possibly internal nodes. That is,
leave what concerns phylogeny alone to ape, as it already set the basis
for handling phylogeny, and did that pretty well. Of course we can think
of improvements, but to me they might belong to ape more than to
phylobase (that was my point with all 'treewalk' functions, that would
be useful for ape's phylogenies as well). For instance, I am pretty sure
that if we provide Emmanuel a good example of a tree where handling
singletons is needed, and possibly a patch to the code, that would be
implemented quickly in ape. I understand that diverging from ape is
quicker and more straightforward than making ape and phylobase evolve
together. My position would be to minimize such changes and make sure
tree conversion remains possible both ways.

Best,

Thibaut.
```
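
The named-vector proposal in the message can be tried out in plain base R. What follows is only a minimal sketch under stated assumptions, not phylobase code: the edge matrix and labels are copied from the example in the message, while `all.label`, `labels_in_edge_order` and `perm` are hypothetical names introduced here for illustration. It checks that labels keyed by node number stay correct when the edge matrix is reordered, without touching the label vectors themselves.

```r
# Edge matrix from the example: column 1 = ancestor, column 2 = descendant.
# The root edge is stored as (NA, 5).
edge <- matrix(c( 5, 6,
                  6, 1,
                  6, 2,
                  5, 7,
                  7, 3,
                  7, 4,
                 NA, 5),
               ncol = 2, byrow = TRUE)

# Labels as named vectors; the names are the node numbers used in `edge`.
tip.label  <- c("1" = "t1", "2" = "t2", "3" = "t3", "4" = "t4")
node.label <- c("5" = "root", "6" = "n1", "7" = "n2")
all.label  <- c(tip.label, node.label)

# Hypothetical helper: label of each edge's descendant node, in edge order.
# The lookup goes through the names, so the storage order of `labels`
# does not matter.
labels_in_edge_order <- function(edge, labels) {
  unname(labels[as.character(edge[, 2])])
}

before <- labels_in_edge_order(edge, all.label)

# Reorder the edges arbitrarily; the label vectors are left untouched.
perm  <- c(7, 4, 5, 6, 1, 2, 3)
edge2 <- edge[perm, ]
after <- labels_in_edge_order(edge2, all.label)

# Each edge still maps to the same label after reordering.
stopifnot(identical(after, before[perm]))
```

Because the lookup is by name rather than by position, only edge-level information (the edge matrix itself, edge lengths, edge labels) would need to be permuted when edges are reordered.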