[Phylobase-devl] labeling order

Fri Dec 26 05:16:37 CET 2008

  Hmmm.

  For the record, here's Steve's statement:

SWK - This is crucial and we should decide soon, needs to be sorted
out for 1.5. I think that many of the problems we're having with
labels and reordering are due to the fact that until now we treated
nodes and edges as interchangeable, i.e. we had node labels in edge
matrix order, but these labels should really be associated with
nodes, not with edges. This assumption caused things to break once edges
and nodes were not equivalent (now that root edge is in the edge matrix
and we allow edge matrix reordering, or for unrooted trees). I
think we need to be very clear about whether methods are actually
operating on nodes or edges.
I suggest that edge, edge.labels and edge.lengths (branch lengths)
are in 'edge' order. Everything else (node labels, tip labels) should
be in node id order. nodeId can translate between these two orders.
Reorder can act on the edge* only since the underlying node ids
will not change.

Francois: It's definitely a crucial issue. Perhaps we could track
node.labels and tip.labels by using named vectors, the names of the
vector would be the nodeId.

Marguerite:
This one is very important, and I think it's a very bad idea to unlink
the edges and nodes. Edges and nodes are intimately linked. In my
mind, the edge is simply the branch below the node. So to have edges
in one order and nodes in another order makes no sense to me at all.
Why don't we simply give node ID's in "edge" order as you are using
it? otherwise, there is HUGE potential for confusion. And we would
need yet another index that indicates a mapping of the node ID to the
edge matrix.

Instead, why don't we just decide on a standard ordering for phylobase,
number the node IDs in this way, and then allow the edge matrix and
nodeID (and all data vectors) to be reordered as needed for whatever
functions. Using the node ID, we can easily put everything back to
the "default" phylobase order, BUT ONLY IF all objects (edge matrix,
branch lengths, labels, etc.) are in the SAME order. Don't "break"
the integrity of the object just for programming convenience. There is
just too much danger for confusion. I, for one, would stop using
phylobase, because it's just too hard to remember the peculiarities of
the way the object is constructed. Every time I wanted to do something,
I'd have to relearn the rules.

=================
me (again, mostly responding to MB):

  * perhaps we should indeed be stricter about node numbering
(e.g. insist that, in addition to satisfying the [1:m=tips,
m+1=root (if any), (m+1):(m+n) = nodes] rule from ape,
node numbers also be in order in a cladewise ordering),
but I think that's a side issue.

  * Steve's point, I think, is that once we have to deal with
unrooted trees (for example) there is a mismatch between node
information and edge information whatever we do: for example,
in

 unroot(tree.owls)$edge
     [,1] [,2]
[1,]    5    6
[2,]    6    1
[3,]    6    2
[4,]    5    3
[5,]    5    4

  node 5 does not appear in the second (descendant) column
of the edge matrix, so the node information has to be somewhat
distinct from the edge information -- it's one unit longer.
ape dealt with this by having root information (if any) hanging
out in a separate place within the data structure, but we got
rid of that ...
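
  To make the mismatch concrete, here is a minimal base-R sketch that
uses only the edge matrix printed above (typed in by hand, so neither ape
nor phylobase is needed). The variable names are just for illustration and
are not part of either package.

```r
# Edge matrix copied from the unroot(tree.owls)$edge output above
# (column 1 = ancestor, column 2 = descendant).
edge <- matrix(c(5, 6,
                 6, 1,
                 6, 2,
                 5, 3,
                 5, 4),
               ncol = 2, byrow = TRUE)

all_nodes   <- sort(unique(as.vector(edge)))    # every node id in the tree
descendants <- edge[, 2]                        # nodes that have an edge below them
no_edge     <- setdiff(all_nodes, descendants)  # nodes with no edge of their own

nrow(edge)         # 5 edges
length(all_nodes)  # 6 nodes
no_edge            # 5: the node missing from the descendant column
```

  So any per-node vector (labels, data) has six entries while any per-edge
vector (lengths, edge labels) has five, and the two cannot share a single
positional order without some extra convention for node 5.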

  another place where the correspondence breaks down would
be with reticulations (where there would be extra edges in the
edge matrix) [it would be nice to keep the flexibility, although
it would also be reasonable to say that so few methods will work
with reticulated trees that we just shouldn't worry about it].

  I may be biased because I've already started to work on this, but ...
I don't think it's as bad as MB thinks.  We were (or at least I was)
having a really hard time sorting out when labels were appearing in
node ID order, and when they were appearing in edge matrix order.
There are a lot of places where it is easier NOT to re-sort
the labels and data every time the edge matrix changes (reorder, etc.).
I guess the good news is that there are relatively few places that
the edge matrix actually has to be matched up against the labels
and data ...
  What can we do to convince you that this is not so bad?
Can Steve rephrase his arguments for why this is the right way?
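
  As a rough illustration of why this might not be so bad, here is a
base-R sketch (not phylobase code) that combines Steve's suggestion with
Francois's named-vector idea: labels are stored once, named by node ID,
and are matched to the edge matrix only when needed. The labels, the
helper labels_in_edge_order() and the permutation are all hypothetical,
made up for this example.

```r
# Same unrooted edge matrix as in the example above, typed in by hand.
edge <- matrix(c(5, 6,
                 6, 1,
                 6, 2,
                 5, 3,
                 5, 4),
               ncol = 2, byrow = TRUE)

# Labels kept as a named vector; the names are the node ids.
# (These label values are invented for the example.)
labels <- c("1" = "t1", "2" = "t2", "3" = "t3", "4" = "t4",
            "5" = "n5", "6" = "n6")

# Hypothetical helper: label of the descendant node of each edge,
# looked up on demand through the node id.
labels_in_edge_order <- function(edge, labels) {
  unname(labels[as.character(edge[, 2])])
}

before <- labels_in_edge_order(edge, labels)

# Reorder the edge matrix (an arbitrary permutation standing in for
# a real reordering); the label vector itself is left untouched.
perm  <- c(4, 5, 1, 3, 2)
edge2 <- edge[perm, ]
after <- labels_in_edge_order(edge2, labels)

identical(after, before[perm])  # TRUE: labels still follow their edges
```

  Under these assumptions, reordering touches only the edge matrix (and
anything else stored per edge), and node 5, which has no edge of its own,
keeps its label without needing a placeholder row.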

  Ben

Marguerite Butler wrote:
> Hi Guys,
> 
> Not sure what has happened, and I realize I may not understand. But I
> don't think it's a good idea to have different positional orders for
> edge matrix, labels, ID's, etc. I think that would make it really
> difficult for programmers to understand what is going on. Please see my
> response in the longer thread "vote for arguments"
> 
> Merry Christmas.
> Marguerite
> 

-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc