[Phylobase-devl] unification of tree data slots

Ben Bolker bolker at ufl.edu
Mon Sep 21 23:09:20 CEST 2009


 I would lean towards just going for it and cleaning it up as we go.
 It's a development version, after all.  Merging branches always seems a
little scary/hairy.

   Ben

Jim Regetz wrote:
> Okay, I think the responses to this proposal ranged from somewhat 
> hesitant to definitely supportive, with center of mass somewhere on the 
> positive side of neutral :)
> 
> I think it's worth giving this a shot. And because it would (I believe) 
> cleanly fix some existing bugs/buglets that I'd rather not patch up with 
> workarounds, I'd prefer to try it now.
> 
> Perhaps a branch is in order? I think the changes can be implemented 
> without too much pain, but it would be nice to know I/we can commit 
> partial changes if need be, without worrying about passing package check 
> with every commit.
> 
> Please let me know if you don't think I captured the group sentiment, or 
> if you have other reactions/thoughts.
> 
> Thanks!
> Jim
> 
> Ben Bolker wrote:
>>   Agreed.  I think the only concern is the "changing things around"
>> issue.  I'm OK with the idea that if people have node data for just a
>> few nodes, then they have to pay the cost of storing NAs for all the
>> rest.  I am much happier with the "changing things around" plan now that
>> we are starting to have a halfway-decent testing framework so that we
>> can be slightly more certain that we're not f*cking everything up by
>> making changes ...
>>
>>    So I'd say I'm a +0 -- I'm not going to argue against it, but I won't
>> do the work either :-)
>>    At some point I *do* want to get back into helping develop, but I
>> can't even afford the time to get back up to speed about the current
>> status ...
>>
>>   cheers
>>     Ben
>>
>> Steven Kembel wrote:
>>> Hello,
>>>
>>> I've been out of the loop for a while but wanted to quickly say that  
>>> reworking the labels/data to be a single slot and letting accessors  
>>> deal with making it look consistent sounds good. IIRC the main  
>>> argument previously against a single tip/node data slot was the  
>>> storage space issue (i.e. when I load a phylogeny with 20K tips I  
>>> don't want to unnecessarily store node data if it doesn't exist) but it
>>> sounds like this is no longer an issue since the data are not stored  
>>> if they don't exist?
>>>
>>> Cheers,
>>> Steve
>>>
>>> On Sep 17, 2009, at 11:54 AM, Jim Regetz wrote:
>>>
>>>> Quick reply just about the labels question:
>>>>
>>>> Peter Cowan wrote:
>>>>>>> On Wed, 2009-09-16 at 15:17 -0700, Jim Regetz wrote:
>>>>>>>> Addendum: In case anyone else's mind happens to wander in this
>>>>>>>> direction, yes, I think a similar argument could be made for
>>>>>>>> combining the slots for tip and internal _labels_ into a single
>>>>>>>> label slot, because each label is now unambiguously identified
>>>>>>>> by its name (node ID). Seems like the separation is a
>>>>>>>> historical artifact? Combining them would simplify the
>>>>>>>> corresponding accessor/replace methods, which currently have to
>>>>>>>> look conditionally in either tip.label or node.label depending
>>>>>>>> on the arguments. And it wouldn't be hard at all to make this
>>>>>>>> change in the code base. Of course, I'm not going to ask for
>>>>>>>> the moon *and* the stars, but if someone else proposed it... :)
>>>>>>>>
>>>>> Again, I think performance was the reason here.  The assumption was
>>>>> that, more often than not, trees will not have any internal node labels.
>>>> That doesn't have to be a problem. What I said about tree data applies
>>>> even more clearly here: only labels that actually exist need to be in
>>>> the vector. So if you only supply tip labels when you create the tree,
>>>> the (unified) label slot would be exactly the same as what we now call
>>>> tip.label. Example with a 3-tip tree:
>>>>
>>>> ## actual slot contents -- no internal labels stored
>>>>> phy@label
>>>>    1    2    3
>>>> "t2" "t1" "t3"
>>>>
>>>> ## but the accessors would still "fill in" the implied NAs:
>>>>> labels(phy) ## default type is 'all'
>>>>    1    2    3    4    5
>>>> "t2" "t1" "t3"   NA   NA
>>>>
>>>>> tipLabels(phy)
>>>>    1    2    3
>>>> "t2" "t1" "t3"
>>>>
>>>>> nodeLabels(phy)
>>>>  4  5
>>>> NA NA
>>>>
>>>> ## now add internal labels
>>>>> nodeLabels(phy) <- c("n4", "n5")
>>>>> phy@label
>>>>    1    2    3    4    5
>>>> "t2" "t1" "t3" "n4" "n5"
>>>>
>>>> ## and remove them again!
>>>>> nodeLabels(phy) <- as.character(NA)
>>>>> phy@label
>>>>    1    2    3
>>>> "t2" "t1" "t3"
>>>>
>>>> I just quickly wrote up new accessor and replace methods that would
>>>> behave this way. As illustrated above, the replacement method will also
>>>> drop any NA labels it encounters, for efficiency (but obviously attempts
>>>> to do this for tip labels will produce an error).
>>>>
>>>> Jim
>>
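
For readers who want to experiment with the unified label slot described in
Jim's quoted message, here is a minimal sketch in plain R. It is not the
phylobase implementation: the tree is a bare list rather than a phylo4
object, and make_tree(), get_labels() and set_node_labels() are hypothetical
names invented for this illustration. It assumes tips have node IDs
1..n_tips and internal nodes have the IDs that follow, as in the 3-tip
example.

```r
# Hypothetical stand-in for a tree object (NOT the phylo4 class):
# 'label' is a named character vector whose names are node IDs.
make_tree <- function(tip_labels, n_internal) {
  n_tips <- length(tip_labels)
  list(
    n_tips = n_tips,
    n_internal = n_internal,
    label = setNames(tip_labels, seq_len(n_tips))
  )
}

# Accessor sketch: look labels up by node ID, so that IDs absent
# from the slot come back as NA without ever being stored.
get_labels <- function(phy, type = c("all", "tip", "internal")) {
  type <- match.arg(type)
  ids <- switch(type,
    all = seq_len(phy$n_tips + phy$n_internal),
    tip = seq_len(phy$n_tips),
    internal = phy$n_tips + seq_len(phy$n_internal)
  )
  ids <- as.character(ids)
  setNames(unname(phy$label[ids]), ids)
}

# Replacement sketch: recycle the new values over the internal nodes,
# then keep only the non-NA ones in the slot.
set_node_labels <- function(phy, value) {
  ids <- as.character(phy$n_tips + seq_len(phy$n_internal))
  value <- setNames(rep_len(as.character(value), length(ids)), ids)
  tips <- phy$label[as.character(seq_len(phy$n_tips))]
  phy$label <- c(tips, value[!is.na(value)])
  phy
}

# Walk through the same steps as the 3-tip example.
phy <- make_tree(c("t2", "t1", "t3"), n_internal = 2)
filled <- get_labels(phy)                      # NAs filled in for nodes 4, 5
phy_named <- set_node_labels(phy, c("n4", "n5"))
phy_reset <- set_node_labels(phy_named, NA)    # internal labels dropped again
```

Unlike the methods Jim describes, this sketch only ever replaces internal
labels, so it does not include the error for NA tip labels.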


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
