[Phylobase-devl] Planning the next release of phylobase
skembel at berkeley.edu
Fri Feb 22 20:24:05 CET 2008
>> 3. A final decision on how nodes will be referred to by other
I've been meaning to follow up on this for a while. We had a bunch of
discussion of how to represent node labels, whether to allow indexing
of nodes by a column in a data frame versus row.names, etc., but I
wasn't sure if there was a consensus. I would be happy to work on
this. I know Marguerite has started a pdata class and I will try to
build on that framework. Here is what I was imagining for a new way to
represent data for phylo4d objects. If people think this seems
reasonable I'll work on this.
1) I think indexing nodes by number is the easiest and most consistent
for internal purposes, since nodes will not always have unique user-
supplied labels. For the purposes of identify() and so on the
identification could either be the node number or the label if
present. I don't think we need to assign node labels automatically;
they can simply be blank if none are supplied.
2) Each node could have several pieces of associated data including
the metadata in pdata.R, as well as a 'label' that would not be
restricted to be unique or a valid row.name.
3) Summaries of the phylo4d data can return the data.frame along with
metadata and node labels as columns. This way the data can be accessed
either via pdata[i,j] or pdata["SomeSpp",] or whatever.
4) check_data should be modified to allow either data row.names or a
user-supplied vector/column of names to be used for matching data to
5) Leaving the phylo4d data as a data.frame with associated metadata/
labels would be the first step I'd suggest. Future modifications could
easily substitute some DNA data representation for a data.frame.
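
A minimal sketch of points 1 to 4 above, using only base R. It is not
the phylobase implementation: the toy tree, the column names, and the
helper match_to_nodes() are all hypothetical and exist only to show
the idea of keying data by node number while keeping labels optional
and non-unique.

```r
# Hypothetical toy tree: tips are nodes 1-3, internal nodes are 4-5.
# Node number is the unique key; labels may be blank or repeated.
node_data <- data.frame(
  node  = 1:5,
  label = c("SppA", "SppB", "SppC", "", ""),  # blank where none supplied
  mass  = c(1.2, 3.4, 2.2, NA, NA),
  stringsAsFactors = FALSE
)

# Access by node number, which always identifies exactly one row ...
node_data[node_data$node == 2, "mass"]

# ... or by label, which may match zero, one, or several rows.
node_data[node_data$label == "SppA", ]

# Hypothetical helper for point 4: match user data to nodes either by
# row.names (key = NULL) or by a user-supplied column of names.
# match() returns the first matching position, or NA if there is none.
match_to_nodes <- function(dat, node_labels, key = NULL) {
  ids <- if (is.null(key)) rownames(dat) else as.character(dat[[key]])
  match(ids, node_labels)
}

traits <- data.frame(
  spp = c("SppC", "SppA"),
  len = c(10, 20),
  row.names = c("r1", "r2"),
  stringsAsFactors = FALSE
)

by_column   <- match_to_nodes(traits, node_data$label, key = "spp")
by_rownames <- match_to_nodes(traits, node_data$label)
```

Here by_column picks out nodes 3 and 1, while by_rownames is all NA
because the row.names "r1" and "r2" are not node labels. Since match()
only reports the first hit, a real check would still need to decide
what to do when a label occurs on more than one node.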
> Tracker is fine with me for now.
I'll put a copy of what I just said up on the tracker and see how that
> I could chat or phone next week if people want ...
> Another random question/opinion poll: what do people think about
> the names of the tree-walking functions? should it be
> getAncestors, getDescendants? Sons, Daughters? Leaves, branches?
"Descendants" could refer to immediately descendant nodes, or to all
leaves/tips descended from a node. I guess this could be part of a
single ancestor()/descendant() accessor with an argument to specify
what to return, but it might be clearer if there were separate
ancestorNode, descendantNode, and descendantTips accessors. I like
'tips' better than 'leaves' only because it's slightly less typing!
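
To make the distinction concrete, here is a rough sketch of the three
separate accessors in base R. It assumes a plain two-column edge
matrix (parent, child) rather than a phylo4 object, and the function
bodies are illustrative only; just the names come from the suggestion
above.

```r
# Assumed representation: each row is one edge, c(parent, child).
# Tips are nodes 1-3, the root is 4, and 5 is the parent of tips 1 and 2.
edge <- rbind(c(4, 5), c(4, 3), c(5, 1), c(5, 2))

# Immediate parent of a node (zero-length result for the root).
ancestorNode <- function(edge, node) {
  edge[edge[, 2] == node, 1]
}

# Immediate children of a node (zero-length result for a tip).
descendantNode <- function(edge, node) {
  edge[edge[, 1] == node, 2]
}

# All tips descended from a node, found by recursing through children.
# In this sketch a tip is treated as its own only descendant tip.
descendantTips <- function(edge, node) {
  kids <- descendantNode(edge, node)
  if (length(kids) == 0) return(node)
  sort(unlist(lapply(kids, function(k) descendantTips(edge, k))))
}

descendantNode(edge, 4)  # immediate children: 5 and 3
descendantTips(edge, 4)  # all tips below the root: 1, 2, 3
```

With separate functions the two meanings of "descendants" never share
a name, at the cost of a slightly larger API than a single accessor
with a type argument.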