[Phylobase-devl] unification of tree data slots

Jim Regetz regetz at nceas.ucsb.edu
Mon Sep 21 23:05:49 CEST 2009


Okay, I think the responses to this proposal ranged from somewhat 
hesitant to definitely supportive, with center of mass somewhere on the 
positive side of neutral :)

I think it's worth giving this a shot. And because it would (I believe) 
cleanly fix some existing bugs/buglets that I'd rather not patch up with 
workarounds, I'd prefer to try it now.

Perhaps a branch is in order? I think the changes can be implemented 
without too much pain, but it would be nice to know I/we can commit 
partial changes if need be, without worrying about passing package check 
with every commit.

Please let me know if you don't think I captured the group sentiment, or 
if you have other reactions/thoughts.

Thanks!
Jim

Ben Bolker wrote:
>   Agreed.  I think the only concern is the "changing things around"
> issue.  I'm OK with the idea that if people have node data for just a
> few nodes, then they have to pay the cost of storing NAs for all the
> rest.  I am much happier with the "changing things around" plan now that
> we are starting to have a halfway-decent testing framework so that we
> can be slightly more certain that we're not f*cking everything up by
> making changes ...
> 
>    So I'd say I'm a +0 -- I'm not going to argue against it, but I won't
> do the work either :-)
>    At some point I *do* want to get back into helping develop, but I
> can't even afford the time to get back up to speed about the current
> status ...
> 
>   cheers
>     Ben
> 
> Steven Kembel wrote:
>> Hello,
>>
>> I've been out of the loop for a while but wanted to quickly say that  
>> reworking the labels/data to be a single slot and letting accessors  
>> deal with making it look consistent sounds good. IIRC the main  
>> argument previously against a single tip/node data slot was the  
>> storage space issue (i.e. when I load a phylogeny with 20K tips I  
>> don't want to unnecessarily store node data if it doesn't exist) but it  
>> sounds like this is no longer an issue since the data are not stored  
>> if they don't exist?
>>
>> Cheers,
>> Steve
>>
>> On Sep 17, 2009, at 11:54 AM, Jim Regetz wrote:
>>
>>> Quick reply just about the labels question:
>>>
>>> Peter Cowan wrote:
>>>>>> On Wed, 2009-09-16 at 15:17 -0700, Jim Regetz wrote:
>>>>>>> Addendum: In case anyone else's mind happens to wander in this
>>>>>>> direction, yes, I think a similar argument could be made for
>>>>>>> combining the slots for tip and internal _labels_ into a single
>>>>>>> label slot, because each label is now unambiguously identified
>>>>>>> by its name (node ID). Seems like the separation is a
>>>>>>> historical artifact? Combining them would simplify the
>>>>>>> corresponding accessor/replace methods, which currently have to
>>>>>>> look conditionally in either tip.label or node.label depending
>>>>>>> on the arguments. And it wouldn't be hard at all to make this
>>>>>>> change in the code base. Of course, I'm not going to ask for
>>>>>>> the moon *and* the stars, but if someone else proposed it... :)
>>>>>>>
>>>> Again, I think performance was the reason here.  The assumption was that,
>>>> more often than not, trees will not have any internal node labels.
>>> That doesn't have to be a problem. What I said about tree data applies
>>> even more clearly here: only labels that actually exist need to be in
>>> the vector. So if you only supply tip labels when you create the tree,
>>> the (unified) label slot would be exactly the same as what we now call
>>> tip.label. Example with a 3-tip tree:
>>>
>>> ## actual slot contents -- no internal labels stored
>>>> phy@label
>>>    1    2    3
>>> "t2" "t1" "t3"
>>>
>>> ## but the accessors would still "fill in" the implied NAs:
>>>> labels(phy) ## default type is 'all'
>>>    1    2    3    4    5
>>> "t2" "t1" "t3"   NA   NA
>>>
>>>> tipLabels(phy)
>>>    1    2    3
>>> "t2" "t1" "t3"
>>>
>>>> nodeLabels(phy)
>>>  4  5
>>> NA NA
>>>
>>> ## now add internal labels
>>>> nodeLabels(phy) <- c("n4", "n5")
>>>> phy@label
>>>    1    2    3    4    5
>>> "t2" "t1" "t3" "n4" "n5"
>>>
>>> ## and remove them again!
>>>> nodeLabels(phy) <- as.character(NA)
>>>> phy@label
>>>    1    2    3
>>> "t2" "t1" "t3"
>>>
>>> I just quickly wrote up new accessor and replace methods that would
>>> behave this way. As illustrated above, the replacement method will also
>>> drop any NA labels it encounters, for efficiency (but obviously attempts
>>> to do this for tip labels will produce an error).
>>>
>>> Jim
>>> _______________________________________________
>>> Phylobase-devl mailing list
>>> Phylobase-devl at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
> 
> 
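The unified label slot described in the quoted 17 Sep message (store only the labels that exist, let accessors fill in the implied NAs, drop NA internal labels on replacement, and refuse NA tip labels) can be sketched in base R. This is a hypothetical, self-contained illustration, not phylobase code: the tree is a plain list rather than the S4 phylo4 class, the helper names make_tree, node_ids, tree_labels, node_labels<- and tip_labels<- are invented here, and it assumes tips are numbered 1..n with internal nodes numbered after them, as in the 3-tip example.

```r
## Hypothetical sketch, not phylobase code: a tree is a plain list whose
## `label` element is ONE named character vector keyed by node ID.
## Only labels that actually exist are stored.
make_tree <- function(tip_labels, n_internal) {
  names(tip_labels) <- seq_along(tip_labels)
  list(n_tips = length(tip_labels), n_internal = n_internal,
       label = tip_labels)
}

## Node IDs for each label type (assumed numbering: tips 1..n_tips,
## internal nodes immediately after).
node_ids <- function(phy, type = c("all", "tip", "internal")) {
  type <- match.arg(type)
  switch(type,
         all      = seq_len(phy$n_tips + phy$n_internal),
         tip      = seq_len(phy$n_tips),
         internal = seq(phy$n_tips + 1, length.out = phy$n_internal))
}

## Accessor: look labels up by node ID; IDs with no stored label come
## back as NA, so the caller always sees one entry per requested node.
tree_labels <- function(phy, type = "all") {
  ids <- node_ids(phy, type)
  out <- unname(phy$label[as.character(ids)])
  names(out) <- ids
  out
}

## Replacement for internal labels: old internal labels are discarded
## and NA values in the new ones are simply not stored.
`node_labels<-` <- function(phy, value) {
  ids <- node_ids(phy, "internal")
  value <- rep_len(as.character(value), length(ids))
  names(value) <- ids
  tips_only <- phy$label[!(names(phy$label) %in% names(value))]
  phy$label <- c(tips_only, value[!is.na(value)])
  phy
}

## Replacement for tip labels: every tip must keep a label, so NA is an error.
`tip_labels<-` <- function(phy, value) {
  if (length(value) != phy$n_tips || anyNA(value))
    stop("every tip needs a non-NA label")
  value <- as.character(value)
  names(value) <- node_ids(phy, "tip")
  internal <- phy$label[!(names(phy$label) %in% names(value))]
  phy$label <- c(value, internal)
  phy
}
```

With phy <- make_tree(c("t2", "t1", "t3"), n_internal = 2), only three labels are stored, tree_labels(phy) returns five entries with NA for nodes 4 and 5, node_labels(phy) <- c("n4", "n5") extends the stored vector to five, and node_labels(phy) <- NA shrinks it back to the three tip labels. Keying by node ID (the vector names) is what lets tip and internal labels share one slot without any positional bookkeeping.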

