[Phylobase-devl] where are we??

Tue Dec 30 21:29:16 CET 2008

Steven Kembel wrote:
> Hi,
> 
>> 1. merge my branch (with the aforementioned controversial
>> ordering)  {Peter, can you help with this if/when we
>> decide to go for it?}
> +1, prefer the named vector for labels in your branch.
> 
> Isn't the controversial ordering already in the trunk? Related to
> ordering, I couldn't tell if there is a consensus on what to do about
> edge matrix reordering. Marguerite and Thibaut seemed against edge
> matrix reordering vs. node ids, others pro or neutral.

   I'm not sure there is consensus.  Thibaut seemed against it,
but in his more detailed comments he seemed to accept edge
matrix reordering as long as everything was clear.

> For rooted trees, it's true that every node has an edge associated with
> it. Would reverting to the old reorder methods, which keep nodes and the
> edge matrix in the same index order (edge matrix in node id order), and
> creating a lookup for edge/node order under reordering instead of
> actually reordering the edge matrix, address some of the concerns raised
> by Marguerite and Thibaut about not mixing up edge and node ordering for
> rooted trees?

  Maybe, although you can argue that solution is also confusing
(because there is a "real" ordering underneath that stays the same,
and another layer of indexing ...)
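
  To make the "extra layer" concrete, here is a minimal sketch of the
lookup idea on a toy edge matrix.  The object names (edge, edgeOrder)
are made up for illustration and are not anything in phylobase; the
point is only that the stored rows never move and the ordering lives
in a separate index vector.

```r
## Hypothetical sketch: keep the edge matrix fixed and store an ordering
## as a separate index vector instead of permuting the rows.
## Toy rooted tree: node 4 is the root (ancestor NA), node 5 is internal.
edge <- rbind(c(NA, 4), c(4, 5), c(5, 1), c(5, 2), c(4, 3))

## the "real" storage order stays as it is; this is the extra layer
edgeOrder <- order(edge[, 2])   # edges sorted by descendant node id

## a view of the edges in node-id order, without touching 'edge' itself
edge[edgeOrder, ]
```

  Anything that wants a particular ordering would then have to go
through edgeOrder, which is exactly the part that might confuse people.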
> 
>> 2. see what we can do to detect & fix problems with unrooted trees:
>> this includes a lot of Steve's "to do" list (sorting out nodeId,
>> nNode, etc.)
> 
> I can work on this once we decide what to do about unrooted trees; I
> think that, of the 3-4 options I suggested, everyone voted for a
> different one. Do we want to try to add an imaginary root edge to
> unrooted trees that we strip out for printing, export, etc.? That is, we
> give one node in an unrooted tree an edge so that every node has a unique
> edge (as in a rooted tree); this should allow most of the existing
> construction, check and summary methods to work with unrooted trees.
> Other options were to have separate code for constructing, checking,
> printing, etc. of unrooted and rooted trees, or to actually create a
> different class for rooted vs. unrooted trees.

  I don't have strong feelings.  Marguerite was in favor of the
imaginary root edge.  I thought it seemed too complicated/hokey,
but if it makes things simpler I'm for it.
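
  For what it's worth, a rough sketch of what the imaginary root edge
might amount to on a raw edge matrix.  The function names
(addImaginaryRoot, stripImaginaryRoot) are hypothetical, and so is the
choice of marking the extra row with NA in the ancestor column; none of
this is settled.

```r
## Hypothetical sketch, not phylobase code.
## Unrooted 3-tip tree with a single internal node (4): node 4 never
## appears as a descendant, so it has no edge of its own.
edge <- rbind(c(4, 1), c(4, 2), c(4, 3))

## give the chosen node an imaginary edge with a missing ancestor
addImaginaryRoot <- function(edge, node) rbind(c(NA, node), edge)

## strip that row again before printing, export, etc.
stripImaginaryRoot <- function(edge) edge[!is.na(edge[, 1]), , drop = FALSE]

e2 <- addImaginaryRoot(edge, 4)
sort(e2[, 2])             # every node now appears once as a descendant
stripImaginaryRoot(e2)    # back to the original unrooted edge matrix
```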
> 
>> 4.  PDC proposes that we change the NA in the root-node-row
>> to (-1) instead; I propose that we add a "dropRoot" function
>> (which just operates on a raw edge matrix, not on a phylo4
>> object) to abstract the operation of dropping the appropriate
>> row (essentially substituting for places where we have na.omit
>> or edge[!is.na(edge[,1]),] in the code now)
> 
> I'd prefer the dropRoot function vs. changing NA's to -1 in edge.
> dropRoot might also help if we go with adding an imaginary edge to
> unrooted trees, could just dropRoot whenever we want to summarize
> unrooted trees.

  I'm not sure what Peter's reasoning was, but he suggested that
-1 would be easier for some of his code to work with.
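
  Either way, dropRoot could be very small.  A minimal sketch, assuming
the root row is the one with NA in the ancestor (first) column as it is
now; the drop=FALSE is my addition so that a one-row result stays a
matrix:

```r
## Hypothetical sketch of the proposed dropRoot(): works on a raw edge
## matrix (not a phylo4 object) and removes the root row, assumed to be
## the row whose ancestor (first column) is NA.
dropRoot <- function(edge) {
  edge[!is.na(edge[, 1]), , drop = FALSE]
}

## toy rooted tree: node 4 is the root, node 5 is internal
edge <- rbind(c(NA, 4), c(4, 5), c(5, 1), c(5, 2), c(4, 3))
dropRoot(edge)
```

  If we did switch the root marker to -1, only the test inside the
function would have to change, not the places that call it.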
> 
>> 6. with some input from Emmanuel, implement SOME form of
>> checking/consistency rule for ordering when importing/exporting
>> from/to ape
> 
> as(phylo,"phylo4") should keep the ape ordering of nodes/edges during
> import (with insertion of root edge into phylo4 edge matrix).
> Once we make a decision about whether we want to keep direct reordering
> of the edge matrix, I think all that needs to be done for export to
> phylo is to make sure we strip out the root edge and then have
> nodes/edges follow the ape ordering of (tips,int.nodes) in edge?

  Well, the question here was whether we wanted to impose cladewise
ordering or not, which is in fact *not* (tips, int.nodes).  The ape
nodeId rules (which we do follow) are (tips, int.nodes); the ape
API <ape.mpl.ird.fr/misc/FormatTreeR_28July2008.pdf> says
"There is no mandatory order for the rows of edge, but they may be
arranged in a way that is efficient for computation and manipulation.",
but I have some reason to suspect that's not true.  For example, this code

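library(ape)
example("read.tree")   # defines tree.owls, among other objects
set.seed(1001)
tt <- tree.owls

## return a copy of phy with the rows of its edge matrix randomly permuted
permtree <- function(phy) {
  phy$edge <- phy$edge[sample(nrow(phy$edge)),]
  phy
}

## try to plot 40 random permutations, printing the iteration number first
for (i in 1:40) {
  cat(i,"\n")
  ttp <- permtree(tt)
  plot(ttp)
}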

  hangs R in an infinite loop on the second try ...
  I don't know whether pruningwise (or anything other
than cladewise) will do it ...

  Arguably this is a bug in ape which Emmanuel should
fix ...

> 
>> I propose that we DELAY:
>> 2. adding metadata/annotation slots, although if we know
>> their GENERAL form it would be nice to add them to phylo4[d]
>> objects now because adding slots later breaks backward compatibility
>> of saved objects (but we may just have to bite the bullet and
>> do it later).  In particular if these were slots of type list()
>> we could be vague about what we were going to put in (at the
>> cost of less-strong typing of the objects)
> 
> Hilmar had suggested that these slots could be tag/value lists. If we
> add @metadata and @annotation slots of type list for now, can we enforce
> checking of tags later once we know what metadata and annotations will
> look like?

  If they are defined as type "list" then R won't do any checking
within them for validity -- that gives us a way out.
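
  A small sketch of what I mean, using a made-up class (treeSketch is
NOT the real phylo4 definition, and the particular tag rule in the
validity function is only an example of something we might decide on
later):

```r
library(methods)

## Hypothetical class: slots declared as "list" accept arbitrary contents.
setClass("treeSketch",
         representation(edge = "matrix",
                        metadata = "list",
                        annotation = "list"))

x <- new("treeSketch",
         edge = matrix(c(NA, 3, 3, 3, 1, 2), ncol = 2),
         metadata = list(source = "made up", year = 2008),
         annotation = list())
validObject(x)   # only the slot types are checked at this point

## Tag checking can be added later without changing the slots themselves;
## here the (example) rule is that every metadata entry must be named.
setValidity("treeSketch", function(object) {
  tags <- names(object@metadata)
  if (length(object@metadata) > 0 && (is.null(tags) || any(tags == "")))
    "all metadata entries must be named"
  else TRUE
})
validObject(x)   # x still passes under the stricter rule
```

  Since the slots are already there, adding the validity function later
shouldn't break saved objects the way adding a slot would.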

> 
> Cheers,
> Steve

-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc