[Phylobase-devl] labeling order

Sat Dec 27 15:30:29 CET 2008

Hi all,

and sorry for not participating in the hackathon. I really had no time
at that moment -- moving out of my former lab, finishing a new
'adephylo' package, which I will introduce sooner or later on the general
mailing list, and doing many other things. So, first, thanks to all for
making things move forward.

Here are some opinions, in case there is still time to express some (after
the battle). I recognize that most of them amount to encouraging us to
change data formats as little as possible -- basically because I now have
a working package based on our current data representation. Also, from
what I and some of my colleagues working with phylobase have experienced
so far, it works pretty well and in a sensible way.
>   Hmmm.
>
>   For the record, here's Steve's statement:
>
> SWK - This is crucial and we should decide soon, needs to be sorted
> out for 1.5. I think that many of the problems we're having with
> labels and reordering are due to the fact that until now we treated
> nodes and edges as interchangeable. i.e. we had node labels in edge
> matrix order, but these labels should really be associated with
> nodes, not with edges. 
I could not agree more.
> This assumption caused things to break once edges
>  and nodes were not equivalent (now that root edge is in the edge matrix
> and we allow edge matrix reordering, or for unrooted trees). I
> think we need to be very clear about whether methods are actually
> operating on nodes or edges.
> I suggest that edge, edge.labels and edge.lengths (branch lengths)
> are in 'edge' order. 
I can hardly see how it would make sense otherwise. All information
provided for a given item should be sorted according to this item. Tip
labels should be in tip order, (internal) node labels sorted by node
number, etc.
> Everything else (node labels, tip labels) should
> be in node id order. nodeId can translate between these two orders.
> Reorder can act on the edge* only since the underlying node ids
> will not change.
>
> Francois: It's definitely a crucial issue. Perhaps we could track
> node.labels and tip.labels by using named vectors, the names of the
> vector would be the nodeId.
>   
I may be missing something here, but isn't this what we do when using getnodes?
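
To make the named-vector idea concrete, here is a minimal sketch. It is
written in Python rather than R purely for illustration and is not phylobase
code: the toy tree, the `labels` mapping and the helper
`labels_in_edge_order` are all hypothetical. It assumes each edge is
identified by its descendant node, and shows that labels keyed by node id
can be read off in any edge order without being touched themselves.

```python
# Toy rooted tree (hypothetical): tips 1-3, internal nodes 4 (root) and 5.
# Each row of the edge matrix is (ancestor, descendant); no root edge here.
edges = [(4, 5), (5, 1), (5, 2), (4, 3)]

# Labels keyed by node id, the analogue of an R named vector.
labels = {1: "A", 2: "B", 3: "C", 4: "root", 5: "AB"}


def labels_in_edge_order(edges, labels):
    """Return the label of each edge's descendant node, in edge-matrix order."""
    return [labels[desc] for _, desc in edges]


before = labels_in_edge_order(edges, labels)

# Reorder the edge matrix only (here: sorted by descendant node id).
reordered = sorted(edges, key=lambda e: e[1])
after = labels_in_edge_order(reordered, labels)

print(before)  # labels following the original edge order
print(after)   # labels following the reordered edge matrix
```

The label mapping itself never changes; only the lookup order does.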
> Marguerite:
> This one is very important, and I think it's a very bad idea to unlink
> the edges and nodes. Edges and nodes are intimately linked. In my
> mind, the edge is simply the branch below the node. So to have edges
> in one order and nodes in another order makes no sense to me at all.
> Why don't we simply give node ID's in "edge" order as you are using
> it? otherwise, there is HUGE potential for confusion. And we would
> need yet another index that indicates a mapping of the node ID to the
> edge matrix.
>   
Again, I completely agree. Edges are uniquely identified by their
descending node, and this is what we have used from the beginning.
Moreover, this is what is used in ape, and I think we should diverge
from it only when it is mandatory (e.g. plotting trees with singletons, if
these make sense). Most phylobase users are and will be primarily ape users.
> Instead, why don't we just decide on a standard ordering for phylobase,
> number the node ID's in this way, and then allow the edge matrix and
> nodeID (and all data vectors) to be reordered as needed for whatever
> functions.  Using the node ID, we can easily  put everything back to
> the "default" phylobase order, BUT ONLY IF all objects (edge matrix,
> branch lengths, labels, etc.) are in the SAME order. Don't "break"
> the integrity of the object just for programming convenience. There is
> just too much danger for confusion. I, for one, would stop using
> phylobase, because it's just too hard to remember the peculiarities of
> the way the object is constructed. Every time I wanted to do something,
> I'd have to relearn the rules.
>   
Same for me.
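
As a rough illustration of Marguerite's point (again in Python, not
phylobase code; the dictionary layout and the helpers `reorder` and
`default_order` are assumptions made up for this sketch): if every
edge-aligned vector is permuted together, the descendant node ids alone are
enough to put the whole object back into a default order. The "default"
used here, ascending descendant node id, is just one possible convention.

```python
def reorder(tree, perm):
    """Apply the same permutation to every edge-aligned vector of the tree."""
    return {name: [values[i] for i in perm] for name, values in tree.items()}


def default_order(tree):
    """Permutation that sorts edges by their descendant node id."""
    desc = [d for _, d in tree["edge"]]
    return sorted(range(len(desc)), key=desc.__getitem__)


# Hypothetical toy tree; every vector is aligned with the edge matrix.
tree = {
    "edge": [(4, 5), (5, 1), (5, 2), (4, 3)],   # (ancestor, descendant)
    "edge_length": [0.5, 1.0, 1.2, 2.0],
    "label": ["AB", "A", "B", "C"],              # label of the descendant node
}

# Some function reorders the object for its own convenience...
shuffled = reorder(tree, [3, 0, 2, 1])

# ...and the node ids are enough to restore the default order afterwards.
restored = reorder(shuffled, default_order(shuffled))
print(restored["edge"], restored["edge_length"], restored["label"])
```

This only works because all vectors went through the same permutation; had
the labels been left behind in another order, the node ids in the edge
matrix could no longer be used to realign them.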

Best regards,

Thibaut.