[Phylobase-devl] unification of tree data slots

Peter Cowan pdc at berkeley.edu
Mon Sep 21 23:47:40 CEST 2009


On Sep 21, 2009, at 2:09 PM, Ben Bolker wrote:

> I would lean towards just going for it and cleaning it up as we go.
> It's a development version, after all.  Merging branches always seems
> a little scary/hairy.
>
>   Ben
>
> Jim Regetz wrote:
>> Okay, I think the responses to this proposal ranged from somewhat
>> hesitant to definitely supportive, with center of mass somewhere on
>> the positive side of neutral :)
>>
>> I think it's worth giving this a shot. And because it would (I
>> believe) cleanly fix some existing bugs/buglets that I'd rather not
>> patch up with workarounds, I'd prefer to try it now.

I agree the opinions seemed to range from +0 to +1, so I say let's do
it.  I can help out if there's a need.

>> Perhaps a branch is in order? I think the changes can be implemented
>> without too much pain, but it would be nice to know I/we can commit
>> partial changes if need be, without worrying about passing package
>> check with every commit.

I'd say a branch is a wise idea in this case, because it will make it
easier to compare the benefits and compromises of unifying the data
slots.

Peter

>> Please let me know if you don't think I captured the group
>> sentiment, or if you have other reactions/thoughts.
>>
>> Thanks!
>> Jim
>>
>> Ben Bolker wrote:
>>>  Agreed.  I think the only concern is the "changing things around"
>>> issue.  I'm OK with the idea that if people have node data for just
>>> a few nodes, then they have to pay the cost of storing NAs for all
>>> the rest.  I am much happier with the "changing things around" plan
>>> now that we are starting to have a halfway-decent testing framework
>>> so that we can be slightly more certain that we're not f*cking
>>> everything up by making changes ...
>>>
>>>   So I'd say I'm a +0 -- I'm not going to argue against it, but I
>>> won't do the work either :-)
>>>   At some point I *do* want to get back into helping develop, but I
>>> can't even afford the time to get back up to speed about the current
>>> status ...
>>>
>>>  cheers
>>>    Ben
>>>
>>> Steven Kembel wrote:
>>>> Hello,
>>>>
>>>> I've been out of the loop for a while but wanted to quickly say
>>>> that reworking the labels/data to be a single slot and letting
>>>> accessors deal with making it look consistent sounds good. IIRC the
>>>> main argument previously against a single tip/node data slot was
>>>> the storage space issue (i.e. when I load a phylogeny with 20K tips
>>>> I don't want to unnecessarily store node data if it doesn't exist)
>>>> but it sounds like this is no longer an issue since the data are
>>>> not stored if they don't exist?
>>>>
>>>> Cheers,
>>>> Steve
>>>>
>>>> On Sep 17, 2009, at 11:54 AM, Jim Regetz wrote:
>>>>
>>>>> Quick reply just about the labels question:
>>>>>
>>>>> Peter Cowan wrote:
>>>>>>>> On Wed, 2009-09-16 at 15:17 -0700, Jim Regetz wrote:
>>>>>>>>> Addendum: In case anyone else's mind happens to wander in this
>>>>>>>>> direction, yes, I think a similar argument could be made for
>>>>>>>>> combining the slots for tip and internal _labels_ into a
>>>>>>>>> single label slot, because each label is now unambiguously
>>>>>>>>> identified by its name (node ID). Seems like the separation is
>>>>>>>>> a historical artifact? Combining them would simplify the
>>>>>>>>> corresponding accessor/replace methods, which currently have
>>>>>>>>> to look conditionally in either tip.label or node.label
>>>>>>>>> depending on the arguments. And it wouldn't be hard at all to
>>>>>>>>> make this change in the code base. Of course, I'm not going to
>>>>>>>>> ask for the moon *and* the stars, but if someone else proposed
>>>>>>>>> it... :)
>>>>>>>>>
>>>>>> Again, I think performance was the reason here.  The assumption
>>>>>> was that more often than not trees will not have any internal
>>>>>> node labels.
>>>>> That doesn't have to be a problem. What I said about tree data
>>>>> applies even more clearly here: only labels that actually exist
>>>>> need to be in the vector. So if you only supply tip labels when
>>>>> you create the tree, the (unified) label slot would be exactly the
>>>>> same as what we now call tip.label. Example with a 3-tip tree:
>>>>>
>>>>> ## actual slot contents -- no internal labels stored
>>>>>> phy@label
>>>>>   1    2    3
>>>>> "t2" "t1" "t3"
>>>>>
>>>>> ## but the accessors would still "fill in" the implied NAs:
>>>>>> labels(phy) ## default type is 'all'
>>>>>   1    2    3    4    5
>>>>> "t2" "t1" "t3"   NA   NA
>>>>>
>>>>>> tipLabels(phy)
>>>>>   1    2    3
>>>>> "t2" "t1" "t3"
>>>>>
>>>>>> nodeLabels(phy)
>>>>> 4  5
>>>>> NA NA
>>>>>
>>>>> ## now add internal labels
>>>>>> nodeLabels(phy) <- c("n4", "n5")
>>>>>> phy@label
>>>>>   1    2    3    4    5
>>>>> "t2" "t1" "t3" "n4" "n5"
>>>>>
>>>>> ## and remove them again!
>>>>>> nodeLabels(phy) <- as.character(NA)
>>>>>> phy@label
>>>>>   1    2    3
>>>>> "t2" "t1" "t3"
>>>>>
>>>>> I just quickly wrote up new accessor and replace methods that
>>>>> would behave this way. As illustrated above, the replacement
>>>>> method will also drop any NA labels it encounters, for efficiency
>>>>> (but obviously attempts to do this for tip labels will produce an
>>>>> error).
>>>>>
>>>>> Jim
>>>>> _______________________________________________
>>>>> Phylobase-devl mailing list
>>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>>>
>
>
> -- 
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bolker at ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
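
For anyone skimming the thread, here is a rough sketch of the unified label
idea from Jim's quoted example, using a plain named character vector rather
than the real phylo4 class. It is only an illustration under assumptions:
all_labels() and set_node_labels() are hypothetical helper names (not
phylobase functions), tips are assumed to be nodes 1..n_tips, and internal
nodes are assumed to follow them. The tip-label error check Jim mentions is
not reproduced here.

```r
## Illustrative sketch only: a plain named vector stands in for the proposed
## unified label slot. all_labels() and set_node_labels() are hypothetical
## helpers, not part of phylobase.

## Stored labels: only those that actually exist, named by node ID.
label <- c("1" = "t2", "2" = "t1", "3" = "t3")
n_tips <- 3L
n_internal <- 2L

## Accessor: look labels up by node ID, so missing ones come back as NA.
all_labels <- function(label, n_tips, n_internal) {
  ids <- as.character(seq_len(n_tips + n_internal))
  out <- label[ids]      # names not present in 'label' yield NA
  names(out) <- ids      # restore the node IDs as names
  out
}

## Replacement: overwrite the internal-node labels, then drop NA entries
## so that nothing is stored for unlabeled nodes.
set_node_labels <- function(label, n_tips, n_internal, value) {
  ids <- as.character(n_tips + seq_len(n_internal))
  value <- rep_len(as.character(value), length(ids))
  label[ids] <- value
  label[!is.na(label)]
}

## Usage, mirroring the quoted example
all_labels(label, n_tips, n_internal)
label <- set_node_labels(label, n_tips, n_internal, c("n4", "n5"))
label <- set_node_labels(label, n_tips, n_internal, NA)
```

The point of the design is that the storage cost stays proportional to the
number of labels that exist, while callers still see a full-length vector.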


