[Phylobase-devl] where are we??

Wed Dec 31 04:27:32 CET 2008

On Dec 30, 2008, at 4:18 PM, Marguerite Butler wrote:
> Hi Guys,
>
> Ben -- I liked the new simulation example in the sweave doc!
>
> On Dec 30, 2008, at 10:29 AM, Ben Bolker wrote:
>
>> Steven Kembel wrote:
>>> Hi,
>>>
>>>> 1. merge my branch (with the aforementioned controversial
>>>> ordering)  {Peter, can you help with this if/when we
>>>> decide to go for it?}
>>> +1, prefer the named vector for labels in your branch.
>>>
>>> Isn't the controversial ordering already in the trunk? Related to
>>> ordering, I couldn't tell if there is a consensus on what to do about
>>> edge matrix reordering. Marguerite and Thibaut seemed against edge
>>> matrix reordering vs. node ids, others pro or neutral.
>>
>>  I'm not sure there is consensus.  Thibaut seemed against it
>> but in his more detailed comments he seemed to accept edge
>> matrix reordering as long as everything was clear.
>>
> Go with what you think is best/most feasible.

+1 for merging, I'm happy to help.  I'm traveling tomorrow, but here
are the highlights:

Make sure that you have switched back to the trunk repository and you  
have the whole phylobase/ repository checked out and up to date.

cd phylobase/branches

-- Find the beginning of the branch

svn log --stop-on-copy newlabels/

-- For me this suggests 408 as the first revision in the branch.
-- Then use --dry-run to see the output of the command
-- without making any actual changes.  The output should be any
-- updates or changes you made on the newlabels/ branch.  Any "skipped"
-- or "missing targets" is a problem.  "A", "U", "G" are all okay.
-- This command says take all the changes from revision 408 to the
-- HEAD revision (the most current) and apply them to the trunk
-- (aka /pkg)
svn merge -r 408:HEAD --dry-run newlabels/ ../pkg

-- Do it for real
svn merge -r 408:HEAD newlabels/ ../pkg
cd ../pkg

-- see what's changed, check that it makes sense
svn status

-- commit the changes
svn commit -m "my informative commit message about labels or somesuch"

-- delete the branch if we no longer need it
cd ..
svn rm branches/newlabels/
svn commit -m "my informative message about why I'm deleting this branch"

>>> For rooted trees, it's true that every node has an edge associated
>>> with it. Would reverting to the old reorder methods that keep nodes
>>> and edge matrix in the same index order (edge matrix in node id
>>> order), and creating a lookup for edge/node order under reordering
>>> instead of actually reordering the edge matrix, address some of the
>>> concerns raised by Marguerite and Thibaut about not mixing up edge
>>> and node ordering for rooted trees?
>>
>> Maybe, although you can argue that solution is also confusing
>> (because there is a "real" ordering underneath that stays the same,
>> and another layer of indexing ...)
>
> I defer to you guys. Either actual reordering or lookup index is OK
> with me, I would just prefer not to have edge and other data in
> different orders (all should be the same). Seems like if you want to
> save the original "ape" or "nexus" order on input but reorder the
> tree, you'll need an index anyway. But maybe this is not important.

My attitude is that users will interact with the package through
accessors.  We can write accessors to get the data out in any way users
want.  I see that as a key advantage of the package: users won't need
to know what the internal structure looks like unless they really care
(by which time they should be able to follow whatever our hare-brained
decisions are, as long as we document them).
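
To illustrate the accessor idea, here is a minimal base-R sketch on a toy
edge matrix. The function name edgesSketch and its "lookup" argument are
made up for illustration and are not part of phylobase; the point is only
that an accessor can hand back the same rows whether we physically
reorder the matrix or keep a lookup index alongside it.

```r
# Toy edge matrix in stored order: columns are (ancestor, descendant).
edge <- rbind(c(4, 5), c(5, 2), c(4, 1), c(5, 3))

# Option A: physically reorder the rows by descendant node id.
edgeSorted <- edge[order(edge[, 2]), , drop = FALSE]

# Option B: leave storage untouched and keep a lookup index instead.
lookup <- order(edge[, 2])

# Hypothetical accessor: returns rows in stored order, or in the order
# given by a lookup index if one is supplied.
edgesSketch <- function(edge, lookup = NULL) {
  if (is.null(lookup)) edge else edge[lookup, , drop = FALSE]
}

# Both routes give the user the same view of the data.
sameView <- identical(edgesSketch(edge, lookup), edgeSorted)
```

Either way the user only sees what the accessor returns, so the choice
between the two is mostly about what is easier for us to maintain.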

Since we are discouraging the use of @tip.label in favor of  
tipLabels(phy) we can add an argument "order" or some better name.   
Then whenever I type tipLabels( on my mac it will remind me in the  
editor or console that I can get labels in different orders.
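
A rough sketch of what such an argument might look like, in base R only.
The function name tipLabelsSketch and the values accepted by "order" are
assumptions for illustration, not the phylobase API; the labels are
stored as a character vector named by node id, as in the branch.

```r
# Hypothetical stand-in for the labels slot: a character vector whose
# names are node ids, stored in arbitrary order.
labels <- c("3" = "owl", "1" = "finch", "2" = "crow")

# Hypothetical accessor with an 'order' argument.
tipLabelsSketch <- function(labels,
                            order = c("stored", "nodeId", "alphabetical")) {
  ord <- match.arg(order)
  switch(ord,
    stored       = labels,                                   # as stored
    nodeId       = labels[order(as.integer(names(labels)))], # by node id
    alphabetical = labels[order(labels)])                    # by label text
}

byId   <- tipLabelsSketch(labels, order = "nodeId")
byName <- tipLabelsSketch(labels, order = "alphabetical")
```

Because the default is the first choice, calling tipLabelsSketch(labels)
with no "order" returns the labels exactly as stored.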

>>>> 2. see what we can do to detect & fix problems with unrooted trees:
>>>> this includes a lot of Steve's "to do" list (sorting out nodeId,
>>>> nNode, etc.)
>>>
>>> I can work on this once we decide what to do about unrooted trees, I
>>> think of the 3-4 options I suggested everyone voted for a different
>>> one. Do we want to try to add an imaginary root edge to unrooted trees
>>> that we strip out for printing, export, etc? i.e. we give one node in
>>> an unrooted tree an edge so every node has a unique edge (like in a
>>> rooted tree), this should allow most of the existing construction,
>>> check and summary methods to work with unrooted trees? Other options
>>> were to have separate code for constructing, checking, printing, etc.
>>> of unrooted and rooted trees, or to actually create a different class
>>> for rooted vs. unrooted trees?
>>
>> I don't have strong feelings.  Marguerite was in favor of the
>> imaginary root edge.  I thought it seemed too complicated/hokey,
>> but if it makes things simpler I'm for it.
>
> :) The dropRoot function sounds really useful and should revert the
> edge structure to its original form (making no need for new code).  I
> am still unclear whether there is any need for unrooted trees in
> comparative analysis, so I wouldn't recommend developing a whole new
> class and methods for something that has no clear need.

Like Marguerite, I don't know of many uses for unrooted trees.
However, I see phylobase as intended for more than just comparative
methods: it should serve any R programming involving phylogenetic
trees.  For that reason, I think it's worth the effort to support
unrooted trees.  That said, as far as I can see there is no critical
flaw in the way we are handling unrooted trees.

>>>> 4.  PDC proposes that we change the NA in the root-node-row
>>>> to (-1) instead; I propose that we add a "dropRoot" function
>>>> (which just operates on a raw edge matrix, not on a phylo4
>>>> object) to abstract the operation of dropping the appropriate
>>>> row (essentially substituting for places where we have na.omit
>>>> or edge[!is.na(edge[,1]),] in the code now)
>>>
>>> I'd prefer the dropRoot function vs. changing NA's to -1 in edge.
>>> dropRoot might also help if we go with adding an imaginary edge to
>>> unrooted trees, could just dropRoot whenever we want to summarize
>>> unrooted trees.
>>
>> I'm not sure what Peter's reasoning was, but he suggested that
>> -1 would be easier for some of his code to work with.

Yes, there are times when it is nice to have the root edge, but where
the NA causes trouble.  As I recall the choice of NA was somewhat
arbitrary.  Would there be problems with using -1?

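E.g., this contrived example:

require(phylobase)
myvec <- numeric(9)
foo <- as(rtree(5), 'phylo4')

# the root row has NA in edge[,1], so index contains an NA
index <- foo@edge[,1] == 7
# R does not allow NAs in a subscripted assignment like this one
myvec[index] <- index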
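
To make the NA problem and the two proposed fixes concrete without
needing phylobase loaded, here is a sketch on a toy edge matrix in base
R. The helper dropRootSketch is hypothetical (it only illustrates the
proposed dropRoot idea of operating on a raw edge matrix), and the -1
coding is the alternative I suggested, not something currently in the
package.

```r
# Toy edge matrix (ancestor, descendant) for a 3-tip rooted tree.
# Node 4 is the root; its row has NA in the ancestor column.
edge <- rbind(c(NA, 4), c(4, 5), c(4, 1), c(5, 2), c(5, 3))

# Hypothetical helper: drop the root row from a raw edge matrix.
dropRootSketch <- function(edge) edge[!is.na(edge[, 1]), , drop = FALSE]

# With NA coding, the comparison yields NA for the root row, and R
# refuses a subscripted assignment through an index containing NA.
index <- edge[, 1] == 5
myvec <- numeric(nrow(edge))
res <- try(myvec[index] <- which(index), silent = TRUE)
naFails <- inherits(res, "try-error")

# Option 1: drop the root row first, then build the index.
noRoot <- dropRootSketch(edge)
idx1 <- noRoot[, 1] == 5

# Option 2: code the root ancestor as -1 so comparisons never give NA.
edge2 <- edge
edge2[is.na(edge2[, 1]), 1] <- -1
idx2 <- edge2[, 1] == 5
myvec[idx2] <- which(idx2)
```

Option 1 loses the root row (and shifts row positions by one); option 2
keeps all rows, which is the case where I find the root edge handy.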

>>>> 6. with some input from Emmanuel, implement SOME form of
>>>> checking/consistency rule for ordering when importing/exporting
>>>> from/to ape
>>>
>>> as(phylo,"phylo4") should keep the ape ordering of nodes/edges during
>>> import (with insertion of root edge into phylo4 edge matrix).
>>> Once we make a decision about whether we want to keep direct
>>> reordering of the edge matrix, I think all that needs to be done for
>>> export to phylo is to make sure we strip out the root edge and then
>>> have nodes/edges follow the ape ordering of (tips,int.nodes) in edge?
>>
>> Well, the question here was whether we wanted to impose cladewise
>> ordering or not, which is in fact *not* (tips, int.nodes).  The ape
>> nodeId rules (which we do follow) are (tips, int.nodes); the ape
>> API <ape.mpl.ird.fr/misc/FormatTreeR_28July2008.pdf> says
>> "There is no mandatory order for the rows of edge, but they may be
>> arranged in a way that is efficient for computation and
>> manipulation.",
>> but I have some reason to suspect that's not true.  For example,
>> this code
>>
>> library(ape)
>> example("read.tree")
>> set.seed(1001)
>> tt <- tree.owls
>>
>> permtree <- function(phy) {
>>   phy$edge <- phy$edge[sample(nrow(phy$edge)),]
>>   phy
>> }
>>
>> for (i in 1:40) {
>>   cat(i,"\n")
>>   ttp <- permtree(tt)
>>   plot(ttp)
>> }
>>
>> hangs R in an infinite loop on the second try ...
>> I don't know whether pruningwise (or anything other
>> than cladewise) will do it ...
>>
>> Arguably this is a bug in ape which Emmanuel should
>> fix ...
>
> It would be great if this bug got fixed.

I'm biased, but I'd like to keep the current approach of reordering the
edge matrix. I too hope this bug is fixed soon.

[snip]

Defer everything else...

Peter