[Phylobase-devl] Planning the next release of phylobase

Steve Kembel skembel at berkeley.edu
Fri Feb 22 20:24:05 CET 2008


Hi all,

>> 3. A final decision on how nodes will be referred to by other  
>> commands.

I've been meaning to follow up on this for a while. We had a bunch of  
discussion of how to represent node labels, whether to allow indexing  
of nodes by a column in a data frame versus row.names, etc., but I  
wasn't sure if there was a consensus. I would be happy to work on  
this. I know Marguerite has started a pdata class and I will try to  
build on that framework. Here is what I was imagining for a new way to  
represent data for phylo4d objects. If people think this seems  
reasonable I'll work on this.

1) I think indexing nodes by number is the easiest and most consistent
for internal purposes, since nodes will not always have unique
user-supplied labels. For the purposes of identify() and so on, the
identifier could be either the node number or the label, if
present. I don't think we need to assign node labels automatically;
they can simply be blank if none are supplied.
2) Each node could have several pieces of associated data including  
the metadata in pdata.R, as well as a 'label' that would not be  
restricted to be unique or a valid row.name.
3) Summaries of the phylo4d data can return the data.frame along with  
metadata and node labels as columns. This way the data can be accessed  
either via pdata[i,j] or pdata["SomeSpp",] or whatever.
4) check_data should be modified to allow either data row.names or a  
user-supplied vector/column of names to be used for matching data to  
tree.
5) Leaving the phylo4d data as a data.frame with associated
metadata/labels would be the first step I'd suggest. Future modifications
could easily substitute some DNA data representation for a data.frame.
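
A rough sketch of how points 1 and 4 could fit together, written in Python rather than R purely for illustration. The function name match_data_to_tree and its arguments are hypothetical and not part of phylobase. It assumes tips are keyed internally by node number, that labels may be blank or repeated, and that the names used for matching can come either from the data's row names or from a user-supplied vector/column.

```python
def match_data_to_tree(tip_labels, rows, names):
    """Attach data rows to tips, keyed by node number (hypothetical helper).

    tip_labels: dict mapping node number -> label; labels may be "" or repeated.
    rows:       list of data records, one per taxon.
    names:      one name per row, taken from row names or a user-supplied column.
    """
    if len(names) != len(rows):
        raise ValueError("need exactly one name per data row")

    # Labels are not required to be unique, so collect every node per label.
    label_to_nodes = {}
    for node, label in tip_labels.items():
        label_to_nodes.setdefault(label, []).append(node)

    matched = {}
    for name, row in zip(names, rows):
        nodes = label_to_nodes.get(name, [])
        if len(nodes) != 1:
            # Missing or ambiguous labels cannot be matched safely.
            raise ValueError(f"{name!r} matches {len(nodes)} tips; expected 1")
        if nodes[0] in matched:
            raise ValueError(f"more than one data row matches {name!r}")
        matched[nodes[0]] = row
    return matched


# Tip 3 has a blank label, which is allowed; it simply receives no data here.
tip_labels = {1: "SomeSpp", 2: "OtherSpp", 3: ""}
rows = [{"species": "OtherSpp", "mass": 2.5}, {"species": "SomeSpp", "mass": 1.0}]

# Matching via a user-supplied column instead of row names.
by_node = match_data_to_tree(tip_labels, rows, [r["species"] for r in rows])
```

Once matched, everything downstream can refer to tips by node number, and labels are only needed for display or for lookups like pdata["SomeSpp",].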

>  Tracker is fine with me for now.

I'll put a copy of what I just said up on the tracker and see how that  
works.

>  I could chat or phone next week if people want ...

Same here.

>  Another random question/opinion poll:  what do people think about
> the names of the tree-walking functions?  should it be
> getAncestors, getDescendants?  Sons, Daughters? Leaves, branches?

"Descendants" could refer to immediately descendant nodes, or to all
leaves/tips descended from a node. I guess this could be part of a
single ancestor()/descendant() accessor with an argument to specify
what to return, but it might be clearer if there were separate
ancestorNode, descendantNode, and descendantTips accessors. I like
'tips' better than 'leaves' only because it's slightly less typing!
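
To illustrate the distinction between the three accessors, here is a minimal sketch in Python (not R, and not existing phylobase code). It assumes the tree is stored as a list of (ancestor, descendant) node-number pairs; the function names simply mirror the ones suggested above.

```python
def ancestor_node(edges, node):
    """Return the immediate ancestor of `node`, or None for the root."""
    for anc, desc in edges:
        if desc == node:
            return anc
    return None


def descendant_nodes(edges, node):
    """Return only the immediate descendants of `node` (empty for a tip)."""
    return [desc for anc, desc in edges if anc == node]


def descendant_tips(edges, node):
    """Return all tips descended from `node`; a tip returns itself."""
    children = descendant_nodes(edges, node)
    if not children:
        return [node]
    tips = []
    for child in children:
        tips.extend(descendant_tips(edges, child))
    return tips


# The tree ((1,2),3): node 4 is the root, node 5 is the ancestor of tips 1 and 2.
edges = [(4, 5), (5, 1), (5, 2), (4, 3)]

immediate = descendant_nodes(edges, 4)   # [5, 3]
all_tips = descendant_tips(edges, 4)     # [1, 2, 3]
parent = ancestor_node(edges, 1)         # 5
```

For the root, the two "descendant" readings give different answers ([5, 3] versus [1, 2, 3]), which is the ambiguity that separate accessors would avoid.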

Steve

