[Phylobase-devl] several proposed additions

Thu Jun 25 02:45:38 CEST 2009

Followup responses/questions throughout, below:

Ben Bolker wrote:
> Hilmar Lapp wrote:
>> Hi Jim -
>>
>> this sounds great! I can't speak on behalf of the phylobase leads, but  
>> can you commit all your changes to a branch? That would make it  
>> easiest for people to check out and play with your code. The unit  
>> tests sound cool - haven't been aware of this, but if it is what the  
>> name suggests it will be highly useful.
>>
>> 	-hilmar

Hilmar: I was thinking of this too, but the changes are not as extensive 
as they seem at first blush. The namespace is just a single file that we 
can easily take out if it causes hassles. The tests don't affect the 
functioning of the package in any way, so no harm there. And I've tried 
to ensure that the phylo4 methods changes don't change any existing 
behavior other than how it handles numeric node labels (and this needed 
to be massaged in any case). Trunk commit should be fine -- agreed?

>> On Jun 24, 2009, at 3:45 AM, Jim Regetz wrote:
>>>
>>> * NAMESPACE: I've created one. Seems to work fine. I'm basically
>>> exporting all functions and methods now. I figure it's easier to start
>>> with a liberal approach, and later remove whatever items folks agree  
>>> are
>>> only required internally.
> 
>   We've had a lot of arguments about whether to use namespaces or
> not (I have been very frustrated by namespaces in the past, partly
> because I don't really understand them), but I don't think it's
> a bad idea.
> 

Ben: Did your frustration involve creating namespaces, or working with 
packages that use them? I acknowledge one issue is that it complicates 
on-the-fly modification of functions within an R session, which can be a 
little annoying when debugging/developing: e.g., if you use fix() to 
modify function getNode in an R session, other exported phylobase 
functions will nevertheless continue to use the original getNode 
definition. But using fixInNamespace() or just sourcing the pkg/R 
directory when developing can help.

An advantage is that other packages can leverage phylobase by flexibly 
importing its methods/generics. For example, I'm working on new package 
that depends on phylobase. To use subset("phylo4", ...), I currently 
have to specify phylobase:::subset() in my code because it's otherwise 
not treated as an S4 generic. However, if subset is exported in the 
phylobase NAMESPACE, I can import it in my package's NAMESPACE and 
method dispatch works as expected.

Also FYI, I think the namespace will also obviate the need for the line 
of code in methods-phylo4.R that masks ape::nTips, because nTips as 
defined in phylobase will always take precedence? (I'm going out on a 
limb a little bit here, because I'm not quite sure what the masking 
problem was in the first place :)

And of course, there are the standard benefits of hiding internal 
functions, and knowing that functions will work right regardless of what 
synonymous objects users create in their global environment or load via 
other packages.

Perhaps these are somewhat minor benefits in isolation, but I think it's 
the right way to go given that phylobase is intended to serve as a 
foundation package that more specialized phylo packages will depend on.

>>> * Unit tests: I've created a handful of RUnit tests, and added the
>>> necessary "run tests" machinery as documented here:
>>> http://wiki.r-project.org/rwiki/doku.php?id=developers:runit
>>>
>>> The tests are not even remotely comprehensive, but they do test  
>>> against
>>> the bug fixes I've implemented in the last week or so. I find unit  
>>> tests
>>> to be hugely helpful in making sure that fixed bugs stay fixed, and  
>>> that
>>> code changes in one area don't unexpectedly break things somewhere  
>>> else.
>>> And I find RUnit testing to provide better structure and test
>>> expressivity than you get with unstructured test scripts in the pkg/ 
>>> test
>>> directory.
>>>
> 
>   sounds great.

OK I'll commit them when I get a chance. Worse case scenario is that 
they break the R-forge check; as standalone tests, they certainly don't 
affect package functioning in anyway.

>>> I'm not 100% certain that R-forge would do the right thing with these
>>> tests without some additional tweaking, but they are successfully  
>>> run by
>>> R CMD CHECK on my local machine (at least as long as phylobase is
>>> already installed).
>>>
>>> * phylo4 generic: Any reason why there has been a generic for phylo4d,
>>> but not phylo4? I've now created a phylo4 generic, along with two
>>> methods: one with signature x="matrix" (replacing the original phylo4
>>> constructor function), and one with signature x="phylo" (imports phylo
>>> to phylo4). I updated the associated doc pages to match, plus fixed a
>>> few places in the examples and sources where phylo(edge=...) was  
>>> called
>>> explicitly (the first argument is now the more generic 'x', not  
>>> 'edge').
>>>
>>> * Dealing with messy node labels: My motivation for creating the  
>>> phylo4
>>> generic was to expose an option allowing node labels to be dropped  
>>> (or,
>>> in the case of phylo4d, added to the tree data) during phylo import.
>>> This is basically just a modified implementation of what François  
>>> did to
>>> deal with MrBayes sorts of trees, except now focused specifically on
>>> phylo import (it could be added elsewhere too), and now with user- 
>>> level
>>> control of the behavior. I'm glossing over the details, but for those
>>> not paying attention to the bug tracker, see more discussion here:
>>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=466&group_id=111&atid=488
>>>
>>> FYI, R CMD CHECK passes on all the above changes (well, aside from a
>>> latex documentation issue that I think has been around for some time).
>>>
>>> I'll be traveling much of the next month, but would be happy to  
>>> discuss
>>> any of this further. And of course, I can push some/all of this up to
>>> the repository at any time, or send relevant code to the list if  
>>> requested.
>>>
> 
>   Please do push.  We need to move ahead.  I want to spend more time
> working this summer (maybe with Francois), but probably won't happen
> until second half of July ...

Great! François, can I commit these phylo4/phylo4d changes without 
disrupting you? I prefer to deal with this before the namespace and unit 
tests, because they both reflect the local changes I've made to 
phylo4/phylo4d.

To summarize, dropping node labels will become an optional feature of 
both phylo4(x="phylo") (which is itself a new method) and 
phylo4d(x="phylo"), with the latter also allowing the node labels to be 
preserved as tree data. In the labels-as-data case, the code currently 
uses the same as.numeric() call that you had, resulting in NAs if you 
try to convert non-number-like node labels (e.g., "a", "b") to data. But 
it would be easy enough to add smarter handling of this later if we're 
happy with the overall approach.

Out of curiosity, in the short term do we want this node label dropping 
option to exist in places beyond just ape tree import functions?

Thanks!
Jim

>   Ben
>