[Phylobase-devl] Conference call minutes

Sat Mar 22 23:26:09 CET 2008

>> The problem with non-unique node labels happens when you try to  
>> create a phylo4d object, this will be fixed when we switch to using  
>> a vector/list of labels to identify nodes in place of the current  
>> use of the row.names of the data frame for node ID and data  
>> attachment:

> Yes, I realize this. That is why I was insisting earlier that  
> matching shouldn't be done on labels. Either they are labels for  
> convenience or they are not. We can't have it both ways. If users  
> want to match their data, then they should make sure that the data  
> are assigned to the proper node by providing the "node index" that  
> they match to. It would be a simple thing, so long as they can print  
> the node index and enter them into a spreadsheet with the data. It  
> is a lot easier, for example, than making sure that each species  
> name is spelled correctly in each dataset.
>
> Sometimes this is a big pain because you get one set of species  
> names from PAUP or whatever, but you have a different abbreviation  
> in your phenotypic data. Then all species must be renamed in one  
> dataset or the other.  It's a lot easier just to make sure that a  
> number matches.

Marguerite, I don't fully understand this example.

For my own edification, let me see if I understand what folks are
thinking. There are two aspects of a phylo object under discussion:
the node index and the node labels. As far as node labels are
concerned, there is agreement that these need to be arbitrary, with no
requirement that they be unique, or even that they exist.

However, there is still discussion about node indices, and whether they
should be required to be the ordered vector 1:Nnodes.

One argument for keeping them 1:Nnodes is that it is easier to iterate
over the nodes this way. I can see that, but looping over the elements
of an arbitrary vector in R is easy. Is there an example where this
would be difficult?
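
To make the question concrete, here is a minimal base R sketch. The
vector node_ids and the depth values are made up for illustration and
are not phylobase internals. It suggests that the loop itself is
unaffected, and that the extra cost only shows up when per-node results
have to be looked up by index.

```r
# Hypothetical, non-consecutive node indices (not taken from phylobase).
node_ids <- c(3L, 7L, 8L, 12L)

# Iterating over the values works the same as iterating over 1:Nnodes.
visited <- integer(0)
for (id in node_ids) {
  visited <- c(visited, id)
}

# Where it gets harder: per-node results can no longer be stored at
# position `id` in a plain vector, so a lookup step is needed.
depth <- c(0, 1, 1, 2)        # made-up per-node values, in node_ids order
pos <- match(8L, node_ids)    # position of node 8 within node_ids
depth_of_8 <- depth[pos]
```

With indices fixed at 1:Nnodes the match() step disappears, since
depth[8] would address node 8 directly.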

One argument for relaxing this restriction is that it becomes easier to
compare trees, but this requires exposing the node indices to end users.
I'm not sure this has much value for the end users, who generally don't
care much for the internal representation of the tree. Is there any
value to developers in having non-consecutive node indices?

Steve's proposed solution to the tree comparison/tracking issue is to
use node labels (not indices). This would require a richer node label
model than the one currently implemented. I think Steve has a node
label data frame in mind, which would allow unique node label
information to sit next to potentially non-unique node label
information. This seems overly complex to me. A phylo4d object
already has a node data frame, which has unique row names;
perhaps this should be used instead?
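
As a rough sketch of what I mean, using plain base R data frames rather
than actual phylo4d objects (the names node_data, sub1, and sub2 and all
values are hypothetical): the unique row names can do the tracking while
the label column stays free-form.

```r
# Hypothetical node data: row names are unique identifiers, while the
# `label` column is free-form and may repeat or be missing.
node_data <- data.frame(
  label = c("Aves", "clade", "clade", NA),
  row.names = c("n1", "n2", "n3", "n4"),
  stringsAsFactors = FALSE
)

# Two subsets of the same tree keep their original row names.
sub1 <- node_data[c("n1", "n2", "n3"), , drop = FALSE]
sub2 <- node_data[c("n2", "n3", "n4"), , drop = FALSE]

# Nodes shared by both subsets, matched on the unique row names.
shared <- intersect(rownames(sub1), rownames(sub2))
```

This only shows the data frame side of the idea; whether phylo4d
subsetting preserves row names in the same way would need checking.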

>>
>>>> There was also a proposal to relax the restriction on node  
>>>> numbers being 1:length(nodes).
>>
>> I feel like we're mixing up what I am going to call node indexing
>> and node labelling. Node indexing is purely for internal/development
>> purposes: currently nodes are indexed as 1:NNodes, all functions
>> and methods can safely assume that they can iterate over nodes in
>> this way, and end users never need to think about these numbers
>> unless they want to. Node labelling encompasses any other sort of
>> data or identifier that you want to associate with a node, e.g. for
>> end users who want to be able to identify nodes that are the 'same
>> node' across multiple trees. This could be implemented as actual
>> node labels accessed via labels() or could be included as node data
>> in a phylo4d object, since both labels and data persist across
>> subset operations.

>>>> Pros:
>>>> Easier diffing of trees. For example, if I have a large tree of  
>>>> birds, but only have beak trait data for a subset and tarsus  
>>>> length for a different subset, comparing the two subsets is  
>>>> easier if the nodes are NOT renumbered.
>>
>>
>> If I understand the example, it sounds like what you want is a set  
>> of unique node labels on the large tree of birds that would allow  
>> an end-user to match nodes between subsequent subsets of the large  
>> tree:
>> intersect(labels(subTree1),labels(subTree2))
>>
>> I think this is a problem that is best solved by adding node labels
>> to the large tree, not by changing the way nodes are indexed by all
>> functions and methods in phylobase. It sounds like we do need a
>> method to create unique node labels, either as labels() or phylo4d
>> data, when users need them? I may just be missing the point of
>> changing the way nodes are indexed. I think about this stuff as
>> someone who writes functions that iterate over the nodes on a tree,
>> which would be more complicated if nodes had arbitrary index numbers.
>>
>> Cheers,
>> Steve
>
> ____________________________________________
> Marguerite A. Butler
> Department of Zoology
> University of Hawaii
> 2538 McCarthy Mall, Edmondson 259
> Honolulu, HI  96822
>
> Phone: 808-956-4713
> FAX:   808-956-9812
> Dept: 808-956-8617
> http://www2.hawaii.edu/~mbutler
> http://www.hawaii.edu/zoology/