[Phylobase-devl] Conference call minutes

Ben Bolker bolker at zoo.ufl.edu
Mon Mar 24 22:49:21 CET 2008


Peter Cowan wrote:
> 
> On Mar 23, 2008, at 3:19 PM, Ben Bolker wrote:
>>>> For my own edification let me see if I understand what folks are
>>>> thinking.  There are two aspects of a phylo object under discussion:
>>>> the node index and the node labels.  As far as node labels are
>>>> concerned there is agreement that these need to be arbitrary with no
>>>> restrictions on being unique, or even existing.
>>>>
>>>> However, there is a discussion about node indices, and whether they
>>>> should be enforced to be an ordered vector from 1:Nnodes.
>>>>
>>>> One argument for keeping them 1:Nnodes is that it is easier to iterate
>>>> over the nodes this way.  I can see that, but for-each looping in R is
>>>> easy; is there an example where this would be difficult?
>>>>
>>>> One argument for just using a number is that it becomes easier to
>>>> compare trees, but this requires exposing the node indices to end users.
>>>> I'm not sure this has much value for the end users who generally don't
>>>> care much for the internal representation of the tree.  Is there a
>>>> value to developers to have non-consecutive node indices?
>>>>
>>>> Steve's proposed solution to the tree comparison/tracking issue is to
>>>> use node labels (not indices); this would require a richer node label
>>>> model than the one currently implemented.  I think Steve has a node
>>>> label data frame in mind.  That would allow unique node label
>>>> information to sit next to potentially non-unique node label
>>>> information.  This seems overly complex to me.  A phylo4D object
>>>> already has a node data data frame, which has unique row names,
>>>> perhaps this should be used instead?
>>
>> Back to where we started -- we can't do this because **data frame
>> row names are required to be unique**.
> 
> This will teach me to flippantly throw out an idea without thinking
> about it carefully; I'm not sure it would even work.  However, the idea
> was, that if we are looking for a unique index by which to compare two 
> trees derived from a larger tree, then the row names of the node data 
> data frame should show the differences, because row names are retained 
> after subsetting.
> 

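As a quick check of the two row-name properties in play here (uniqueness
is enforced, and names survive subsetting), here is a minimal base-R
sketch.  The data frame and its "n5".."n8" names are made up for
illustration and are not taken from phylobase.

```r
# Hypothetical node data; the row names stand in for unique node identifiers.
node_data <- data.frame(trait = c(1.2, 3.4, 5.6, 7.8),
                        row.names = c("n5", "n6", "n7", "n8"))

# Subsetting keeps the original row names on the rows that survive.
sub <- node_data[c("n6", "n8"), , drop = FALSE]
rownames(sub)

# Assigning duplicated row names to a data frame is rejected with an error.
dup <- tryCatch(
    suppressWarnings({
        rownames(node_data) <- c("a", "a", "b", "c")
        "accepted"
    }),
    error = function(e) "rejected"
)
dup
```
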
   If you look at prune.R, you'll see that when pruning (which is exactly
the kind of case where we need to keep track of correspondence, so that
we can drop the appropriate node data), we create a temporary set of
"tags" for matching before vs. after, assign them to the rownames of
the node data, and use them to subset the node data.
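
A rough sketch of that tagging idea, in base R only.  This is not the
code in prune.R; the labels, the "T1".."T4" tags, and the choice of
surviving nodes are all hypothetical stand-ins.

```r
# Hypothetical node labels (deliberately non-unique) and per-node data.
node_labels <- c("A", "A", "B", "C")
node_data <- data.frame(label = node_labels,
                        support = c(0.9, 0.7, 0.8, 0.6),
                        stringsAsFactors = FALSE)

# Temporary unique tags, used only for matching before vs. after.
tags <- paste0("T", seq_len(nrow(node_data)))
rownames(node_data) <- tags

# Stand-in for pruning: assume nodes 1, 2 and 4 survive.
kept_tags <- tags[c(1, 2, 4)]

# Use the tags to pull the matching node data, then discard the tags.
pruned_data <- node_data[kept_tags, , drop = FALSE]
rownames(pruned_data) <- NULL
pruned_data
```

Because the tags are unique even when the labels are not, the two nodes
labelled "A" stay distinguishable throughout the match.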

    This strategy would work generally -- if we spent a lot of time
generating such tags I guess I could see a point to saving them
internally, but I don't think this will change the external API.
So we would only need to tell people about them if we wanted developers
to be able to use them.

   Looking at prune.R makes me a little nervous that node labels aren't
getting handled correctly, but I would want to check carefully before
I went in there and started breaking stuff ...

    ben


