[Phylobase-devl] Summer of Code ideas (was: a graphics challenge)

Steven Kembel steve.kembel at gmail.com
Fri Mar 20 00:03:35 CET 2009


Hi Hilmar,

We could post an unused idea from last year on the wiki. It's
general enough to accommodate a wide range of actual projects and
touches on several of the specific ideas we've talked about (NCL,
optimizing pruning etc., data import/export). But perhaps because of
that generality it didn't attract any interest last year, so it would
need to be made more specific by anyone applying for it, or
split into a few separate ideas before posting. Also, I really would
have to be a secondary mentor this year; I cannot commit enough time
to be a primary mentor. Here's a slightly modified version of the
text. This, or a further modified version, could be put up on the wiki
shortly if it's useful. We'll need a list of potential mentors as well.

Optimizing phylogenetic data representation in R
Rationale
One result of the recent NESCent Hackathon on Comparative Methods in R
has been the development of the phylobase package, which seeks to
provide a set of S4 classes and methods for representing and
manipulating phylogenetic trees and associated data in R. Phylobase
already contains structures for representing trees and associated
data, but methods for tree manipulation, representation of multiple
trees and metadata, and interfaces with other data formats (e.g.,
NEXUS, NeXML) remain incomplete or have not been optimized for use
with the large, multi-tree datasets that are increasingly common in
bioinformatics and comparative biology. While the R language is
extremely powerful and provides a rich feature set, it is inefficient
at handling very large objects and at computationally heavy
operations such as recursion and explicit for-loops.
Approach
The methods for tree/data manipulation and import in phylobase are  
currently a mixture of S3 and S4 methods and C/C++ extensions. The  
goal for this project will be to implement efficient algorithms for  
tree and data representation and manipulation using object-oriented S4  
classes and methods, and C/C++ extensions where necessary for  
performance. We suggest focusing on methods such as tree pruning,  
subsetting, and manipulation of multiple tree objects that are  
currently incomplete and will have the greatest impact on the ability  
to work with very large trees and datasets.
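
To make the pruning case concrete, here is a minimal sketch of
non-recursive tip pruning on a tree stored as a list of
(ancestor, descendant) edges. It is written in Python purely for
illustration and is not phylobase or ape code; the name prune_tips and
the integer node numbering are assumptions made for this example. The
same loop structure could be ported to a C/C++ extension.

```python
from collections import defaultdict


def prune_tips(edges, drop):
    """Drop tips from a tree stored as (ancestor, descendant) pairs.

    Hypothetical helper for illustration; not part of phylobase or ape.
    """
    drop = set(drop)
    parent = {}
    children = defaultdict(list)
    for anc, desc in edges:
        parent[desc] = anc
        children[anc].append(desc)

    # A tip is any node that never appears as an ancestor.
    tips = {desc for _, desc in edges if desc not in children}
    if not drop <= tips:
        raise ValueError("can only drop tips: %s" % sorted(drop - tips))

    # Pass 1: detach each dropped tip, then walk upward with an explicit
    # stack (no recursion) removing internal nodes left without children.
    stack = list(drop)
    while stack:
        node = stack.pop()
        anc = parent.pop(node, None)
        if anc is None:  # reached the root
            continue
        children[anc].remove(node)
        if not children[anc]:
            del children[anc]
            stack.append(anc)

    # Pass 2: collapse internal nodes left with exactly one child.
    for node in list(children):
        kids = children.get(node)
        if kids is None or len(kids) != 1:
            continue
        child = kids[0]
        anc = parent.pop(node, None)
        del children[node]
        if anc is None:
            del parent[child]  # single-child root: child becomes the root
        else:
            siblings = children[anc]
            siblings[siblings.index(node)] = child
            parent[child] = anc

    return sorted((anc, desc) for anc, kids in children.items() for desc in kids)
```

For the four-tip tree ((1,2),(3,4)) with root 5 and internal nodes 6
and 7, dropping tip 1 collapses node 6 so that tip 2 attaches directly
to the root. Branch lengths and node labels are deliberately left out
of the sketch.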
It would also be useful to improve interfaces with other data formats
such as NEXUS and NeXML, which are the likely sources for import of
trees, data and metadata. This would require knowledge of C++ or XML.
Challenges
The general challenge for this project will be to identify and
implement optimized data structures for trees, multi-trees, associated
data, and metadata, along with optimized methods that operate on them
(e.g., pruning and subsetting of trees and data). Identifying and
evaluating the critical bottlenecks will require profiling and testing
of code. This project will likely require programming skills not only
in R, but also in C/C++ or XML.
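
As a rough idea of what the profiling step could look like, the sketch
below runs a recursive tip count over a balanced tree under a profiler
and returns a short report of where the time went. It is in Python for
illustration only; build_balanced, count_tips and profile_call are
hypothetical names, and the tree is a toy stand-in for a real phylo4
object. In R itself, Rprof() and summaryRprof() play the corresponding
role.

```python
import cProfile
import io
import pstats


def build_balanced(depth):
    """Children map of a balanced binary tree with 2**depth tips."""
    return {i: (2 * i, 2 * i + 1) for i in range(1, 2 ** depth)}


def count_tips(children, node):
    """Recursively count the tips descending from `node`."""
    kids = children.get(node)
    if kids is None:  # nodes without an entry are tips
        return 1
    return sum(count_tips(children, kid) for kid in kids)


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Keep only the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()
```

Calling profile_call(count_tips, build_balanced(8), 1) returns the tip
count together with the report text, which makes it straightforward to
compare a recursive method against a rewritten one on the same tree.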
Involved toolkits or projects
R, phylobase, Nexus Class Library, NeXML, R XML package


On Mar 19, 2009, at 3:47 PM, Hilmar Lapp wrote:

> Guys - you have probably seen the announcement on the hackathon
> list. How much more time do you want to have until I post this to
> r-sig-phylo?
>
> If I do so now, a student coming to the site will probably think that
> R projects are not supported this year, and will likely walk away.
> I'll have to post it soon, though, and the message is also going to
> come out on EvolDir tonight or tomorrow night (I've just sent it).
>
> Peter - I'm probably repeating myself here, but if you want to  
> volunteer mentoring even though you're still a student that's a  
> great enough commitment to me. If you have an idea for one (or  
> several) project(s) that you would enjoy mentoring, you're more than  
> welcome to put it up on the Ideas page.
>
> That goes for everyone else on this list too, BTW - we may not get  
> enough slots to fund a battery of R projects, but if having a few  
> more things to choose from sparks more interest and more creativity,  
> then that's all good.
>
> 	-hilmar
>
> On Mar 18, 2009, at 3:09 PM, Peter Cowan wrote:
>
>> Steve et al.,
>>
>> I agree that we shouldn't miss the opportunity to get a student on
>> the project.  I like Steve's ideas.  My personal favorite is a
>> testing framework, but I think that it's not really a good project
>> for a student.  I think a better project would be the metadata
>> support.  I would see this project involving writing and
>> implementing a metadata spec, and also updating the NCL integration
>> (hopefully updating to the latest version) to take advantage of
>> metadata.  The improved NCL integration would lead nicely into
>> finishing the multiphylo stuff (not sure what needs to be done
>> here).
>>
>> If we structure the project like that, we have 3 distinct landmarks
>> that a student can work on.  Thus if we only get one or two done,
>> we've been successful.
>>
>> Cheers
>>
>> Peter
>>
>>
>>
>> On Mar 18, 2009, at 11:49 AM, Steven Kembel wrote:
>>
>>> Hi all,
>>>
>>> This reply is late but maybe just in time, since NESCent was just
>>> accepted into the GSoC. Since it sounds like a parser for
>>> phylogenetic XML is going into NCL, it might be useful to prepare
>>> phylobase to accept metadata gracefully. This alone might not be
>>> a summer's worth of work, so perhaps the project could be to
>>> fully integrate NCL into phylobase as well as modify the phylo4
>>> object to work with arbitrary metadata? Brian or others, is it
>>> possible to use the tree-parsing parts of NCL to allow reading of
>>> Newick strings into phylobase directly? This would be useful. I also
>>> like the idea of writing some of the performance-sensitive methods
>>> in C. Embarrassingly, I have not looked at the code in a while, but I
>>> think some of the C code that we were using from ape (e.g., for
>>> pruning) may not work with the changes we made to the tree structure.
>>> Tree rearrangement could fall into this category as well.
>>>
>>> Peter, all the ideas you suggested sound useful. Did you have a
>>> favorite from that list?
>>>
>>> Steve
>>>
>>> On Mar 10, 2009, at 9:30 PM, Peter Cowan wrote:
>>>
>>>> As a former student myself, I'd be willing to help mentor this time
>>>> around.
>>>>
>>>> What types of projects would move phylobase forward?  A parser for
>>>> one of the XML phylogeny formats?  Metadata support?  Multi-phylo4?
>>>> Tighter integration with NCL?  Rewrites of performance-sensitive
>>>> methods in C?  A testing framework for the package?
>>>>
>>>> Other ideas?
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On Mar 9, 2009, at 7:35 PM, Hilmar Lapp wrote:
>>>>
>>>>> I'd like to echo Brian's comment. You (phylobase) can have
>>>>> students working over the summer for you; all you need to do is
>>>>> put up a project idea and designate mentor(s).
>>>>>
>>>>> I know that mentoring is also work, but the results can greatly
>>>>> push along a project.
>>>>>
>>>>> Let me know if there's anything I can help with coordinating this.
>>>>>
>>>>> 	-hilmar
>>>>>
>>>>> On Mar 9, 2009, at 12:05 PM, Brian O'Meara wrote:
>>>>>
>>>>>> Good link, Ben. Google Summer of Code is starting up again, and
>>>>>> there are no R projects yet on the NESCent page
>>>>>> (<https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009>).
>>>>>> Perhaps a way to charge the paddles?
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On Mar 9, 2009, at 2:07 AM, Ben Bolker wrote:
>>>>>>
>>>>>>>
>>>>>>> http://www.stat.columbia.edu/~cook/movabletype/archives/2009/03/more_on_display.html
>>>>>>>
>>>>>>> Someday soon I hope to get out the defibrillator and see if we
>>>>>>> can get phylobase going again ...
>>>>>>>
>>>>>>> cheers
>>>>>>> Ben
>>>>>>>
>>>>>>> -- 
>>>>>>> Ben Bolker
>>>>>>> Associate professor, Biology Dep't, Univ. of Florida
>>>>>>> bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>>> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Phylobase-devl mailing list
>>>>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> Brian O'Meara
>>>>>> NESCent
>>>>>> Durham, NC
>>>>>> http://www.brianomeara.info
>>>>>> ________________________________
>>>>>>
>>>>>
>>>>> -- 
>>>>> ===========================================================
>>>>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>>>>> ===========================================================