[Phylobase-devl] labeling order
steve.kembel at gmail.com
Sat Dec 27 19:47:47 CET 2008
>> Here are some opinions, in case it is still time to express some
>> the battle). I recognize most of them consist in encouraging not to
>> change data formats as much as possible -- basically because I have
>> a working package based on our current data representation. Also,
>> what I and some of my colleagues working with phylobase have
>> so far, it works pretty well and in a sensible way.
> I think the attempt is to make things consistent, which they weren't
> entirely before. I agree that changing things as little as possible
> is a good idea!
>>> For the record, here's Steve's statement:
>>> SWK - This is crucial and we should decide soon, needs to be sorted
>>> out for 1.5. I think that many of the problems we're having with
>>> labels and reordering are due to the fact that until now we treated
>>> nodes and edges as interchangeable, i.e. we had node labels in edge
>>> matrix order, but these labels should really be associated with
>>> nodes, not with edges.
>> I could not agree more.
>>> This assumption caused things to break once edges
>>> and nodes were not equivalent (now that the root edge is in the edge matrix
>>> and we allow edge matrix reordering, or for unrooted trees). I
>>> think we need to be very clear about whether methods are actually
>>> operating on nodes or edges.
>>> I suggest that edge, edge.labels and edge.lengths (branch lengths)
>>> are in 'edge' order.
>> I can hardly see how it would make sense otherwise. All information
>> provided for a given item should be sorted according to this item.
>> Tip labels should be in the tip order, node (internal node) labels
>> sorted as node numbers, etc.
> Here's where it gets tricky. Of course it's sensible for edge
> lengths and labels to be in edge matrix order ... for the others
> (tip labels, node labels), what do you mean by "tip order", "node
> order"?
This is my question as well - especially for unrooted trees where
there is not an edge for every node and vice versa, so tips and nodes
and edges can't be in the same order. I think that as we are
suggesting ways to modify the tree structure, we can provide examples
of what the edge matrix, node labels, etc. would actually look like
for an unrooted and a rooted tree, and how tree reordering would work.
I think this might help clarify things.
>>> This one is very important, and I think it's a very bad idea to
>>> the edges and nodes. Edges and nodes are intimately linked. In my
>>> mind, the edge is simply the branch below the node. So to have edges
>>> in one order and nodes in another order makes no sense to me at all.
>>> Why don't we simply give node ID's in "edge" order as you are using
>>> it? Otherwise, there is HUGE potential for confusion. And we would
>>> need yet another index that indicates a mapping of the node ID to
>>> edge matrix.
>> Again, I completely agree. Edges are uniquely identified by their
>> descending node, and this is what we have used from the beginning.
>> Moreover, this is what is used in ape, and I think we should diverge
>> from it only when it is mandatory (e.g. plotting trees with
>> singletons, if these make sense). Most phylobase users are and will
>> be primarily ape users.
> We're not diverging from this.
> We're saying that we will keep data and the lists of node labels
> (tips and internal nodes) in order of node numbers, and not rearrange
> them every time we reorder the edge matrix.
Following up on the previous point, maybe what we really need is to
spell out how we want tree structures to look, similar to the
whitepaper on the phylo class.
I understand the desire to not break existing code and provide a
phylogeny class that is intuitive for users and developers, but I
don't agree that we should feel bound to follow the ape phylo
structure. If we're just implementing phylo in S4 then we should be
upfront about it and follow the phylo class specification exactly. I
don't think we're doing that, though. There are a number of features
that we might want to implement that aren't in phylo, including
singleton nodes, reordering of the edges or nodes, root edges in the
edge matrix, reticulations, how to represent rooted versus unrooted
trees, separate labels and data for edges and nodes, and so on.
>>> Instead, why don't we just decide on a standard ordering for
>>> number the node ID's in this way, and then allow the edge matrix and
>>> nodeID (and all data vectors) to be reordered as needed for whatever
>>> functions. Using the node ID, we can easily put everything back to
>>> the "default" phylobase order, BUT ONLY IF all objects (edge matrix,
>>> branch lengths, labels, etc etc are in the SAME order. Don't "break"
>>> the integrity of the object just for programming convenience.
>>> There is just too much danger for confusion. I, for one, would stop
>>> using phylobase, because it's just too hard to remember the
>>> peculiarities of the way the object is constructed. Every time I
>>> wanted to do something, I'd have to relearn the rules.
>> Same for me.
> I've been working to try to make everything consistent in node order
> (as Steve suggested). Thibaut/Marguerite, what do you suggest for the
> case of unrooted trees? Thibaut, how often do you match up edges with
> data and labels?
> I've done a bunch of stuff, and I'd like to commit it, because it's
> all reasonably consistent now, but I'd like to hear some more
> conversation -- I'm willing to work back through while it's fresh in
> my mind and do everything the opposite way (keeping everything
> in edge-matrix order all the time), provided we know how to handle
> unrooted trees (and are willing to live with not being able to handle
I agree with Ben. I don't mind undoing what we did, but only if
there's a clear plan of exactly how we want things to work, spelled
out in detail, having thought about how it will work for unrooted and
rooted trees. I'm not arguing that the changes that were made to edge
and node numbering are the only way to go, or even the best, but they
are what we came up with to deal with the fact that when you add the
root edge to the edge matrix, unrooted trees broke most of the
existing code because they have an edge that is shared between two
nodes. I understand now why the ape developers kept the root edge out
of the edge matrix: including it makes it complicated to deal with
rooted and unrooted trees using the same methods.
From Ben's example code...
>      [,1] [,2]
> [1,]    5    6
> [2,]    6    1
> [3,]    6    2
> [4,]    5    3
> [5,]    5    4
> node 5 does not appear in the second (descendant) column
> of the edge matrix, so the node information has to be somewhat
> distinct from the edge information -- it's one unit longer.
> ape dealt with this by having root information (if any) hanging
> out in a separate place within the data structure, but we got
> rid of that ...
Say I want labels for nodes 5 and 6. Where do those labels go? i.e.
what does labels() look like for this edge matrix, and how do we
reorder this tree for plotting or traversal? What about after we root
this tree at node 5?
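To make the question concrete, here is a minimal sketch in Python (not
phylobase code, just plain lists and dicts) of the edge matrix above.
The labels t1..t4, n5 and n6 are made up for illustration. It shows
that node 5 has no row of its own, so a label vector kept in
edge-matrix order has nowhere to put that node's label.

```python
# Edge matrix from Ben's example, as (ancestor, descendant) pairs.
edges = [(5, 6), (6, 1), (6, 2), (5, 3), (5, 4)]

# Hypothetical labels, keyed by node ID rather than by edge row.
node_labels = {1: "t1", 2: "t2", 3: "t3", 4: "t4", 5: "n5", 6: "n6"}

# Every node that has an edge "below" it appears as a descendant.
descendants = [d for _, d in edges]
all_nodes = sorted({n for edge in edges for n in edge})

# Nodes with no row in the edge matrix (here, only node 5).
no_edge_below = [n for n in all_nodes if n not in descendants]

# Labels laid out in edge-matrix order: one per edge, taken from the
# descendant node of that edge. Node 5's label cannot appear here.
labels_in_edge_order = [node_labels[d] for d in descendants]

print("nodes:", len(all_nodes), "edges:", len(edges))
print("nodes without an edge row:", no_edge_below)
print("labels in edge order:", labels_in_edge_order)
```

Keyed by node ID, all six labels fit; laid out by edge row, only five
do, and the one that falls out is node 5.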
I could imagine a few ways to deal with this:
0) Undo. Revert to where we were a week ago - take the root edge out
of the edge matrix.
1) Do what we just did - put the root edge in the edge matrix,
nodes have their own set of attributes, so do edges, and we write
accessors to translate between node id's and edges (transparent to end
user for rooted trees?).
2) Keep root in the edge matrix but split rooted and unrooted trees
into separate classes each with their own methods. I think this would
be more confusing for programmers and users, but we could basically
follow the ape tree structure for rooted trees and have a slightly
modified structure for unrooted trees.
3) Arbitrarily root unrooted trees at one of the nodes that share an
edge and strip out the imaginary root edge for printing and plotting
methods. This was an idea that Brian suggested. I'm not sure how hard
this would be to implement.
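
As a rough illustration of option 1 only, the sketch below (Python
again, plain data structures, not the phylobase API) keeps node labels
keyed by node ID and finds the edge for a node through the descendant
column. The function names edge_index_for_node and reorder_edges, the
edge lengths, and the labels are all hypothetical. It uses the edge
matrix exactly as quoted above, which has no row for node 5.

```python
# Edge matrix from Ben's example, as (ancestor, descendant) pairs.
edges = [(5, 6), (6, 1), (6, 2), (5, 3), (5, 4)]
edge_lengths = [0.5, 1.0, 1.0, 1.5, 1.5]   # hypothetical, in edge order
node_labels = {5: "n5", 6: "n6"}           # hypothetical, keyed by node ID


def edge_index_for_node(edges, node):
    """Return the row of the edge whose descendant is `node`, or None."""
    for i, (_, descendant) in enumerate(edges):
        if descendant == node:
            return i
    return None


def reorder_edges(edges, edge_lengths, order):
    """Apply one permutation to the edge matrix and to edge-ordered data."""
    return [edges[i] for i in order], [edge_lengths[i] for i in order]


# Reorder the edge matrix; node_labels is not touched at all.
new_edges, new_lengths = reorder_edges(edges, edge_lengths, [4, 3, 0, 2, 1])

# The accessor still finds the edge below node 6 and its length.
row = edge_index_for_node(new_edges, 6)
print(node_labels[6], "-> edge row", row, "length", new_lengths[row])

# Node 5 has a label but no edge row in this matrix.
print(node_labels[5], "-> edge row", edge_index_for_node(new_edges, 5))
```

The point of the design is that reordering only permutes things that
are genuinely per-edge, while anything per-node is reached by node ID,
so the two cannot drift out of step.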