[Phylobase-devl] Summer of Code ideas (was: a graphics challenge)

Peter Cowan pdc at berkeley.edu
Sat Mar 21 00:15:22 CET 2009

Hilmar and Steve,

Apologies for not chiming in sooner.  I'll try to write up the ideas  
that Steve and I were talking about as soon as possible.

Hilmar, thanks for putting that up and listing me as the potential  
mentor. Hopefully we'll soon replace it with, or add, content that is  
more specific.

Do I need to be on a special mentors list?


On Mar 20, 2009, at 4:13 PM, Hilmar Lapp wrote:

> I've put this up now.
> Peter - since you are the volunteering mentor, unless someone else  
> volunteers to mentor it, feel free to modify it to your heart's  
> content (including rewriting it completely :-)
> 	-hilmar
> On Mar 19, 2009, at 7:03 PM, Steven Kembel wrote:
>> Hi Hilmar,
>> We could post an unused idea from last year up on the wiki; it's  
>> general enough to accommodate a wide range of actual projects and  
>> touches on several of the specific ideas we've talked about (NCL,  
>> optimizing pruning/etc., data import/export). But perhaps because  
>> of that generality it didn't attract any interest last year and  
>> would need to be made more specific by anyone who was applying for  
>> it, or split into a few separate ideas before posting. Also, I  
>> really would have to be a secondary mentor this year; I cannot  
>> commit enough time to be a primary mentor. Here's a slightly  
>> modified version of the text. This or a further modified version  
>> could be put up on the wiki shortly if it's useful. We'll need a  
>> list of potential mentors as well.
>>
>> Optimizing phylogenetic data representation in R
>>
>> Rationale
>> One result of the recent NESCent Hackathon on Comparative Methods  
>> in R has been the development of the phylobase package, which seeks  
>> to provide a set of S4 classes and methods for representing and  
>> manipulating phylogenetic trees and associated data in R. Phylobase  
>> contains structures for representing phylogenetic trees and  
>> associated data, but methods for tree manipulation, representation  
>> of multiple trees and metadata, and interfaces with other data  
>> formats (e.g. NEXUS, NeXML) remain incomplete or have not been  
>> optimized for use with the large, multi-tree datasets that are  
>> increasingly common in bioinformatics and comparative biology.  
>> While the R language is extremely powerful and provides a rich  
>> feature set, it is inefficient at handling very large objects and  
>> heavy computational lifting (such as recursion and for-loops).
>>
>> Approach
>> The methods for tree/data manipulation and import in phylobase are  
>> currently a mixture of S3 and S4 methods and C/C++ extensions. The  
>> goal for this project will be to implement efficient algorithms for  
>> tree and data representation and manipulation using object-oriented  
>> S4 classes and methods, and C/C++ extensions where necessary for  
>> performance. We suggest focusing on methods such as tree pruning,  
>> subsetting, and manipulation of multiple tree objects that are  
>> currently incomplete and will have the greatest impact on the  
>> ability to work with very large trees and datasets.
>> It would also be useful to improve interfaces with other data  
>> formats such as NEXUS and NeXML that will be the likely source for  
>> import of trees, data and metadata. This would require knowledge of  
>> C++ or XML.
>>
>> Challenges
>> The general challenge for this project will be to identify and  
>> implement optimized data structures for trees, multi-trees,  
>> associated data, and metadata, along with the methods that operate  
>> on them (e.g., pruning and subsetting of trees and data).  
>> Identifying and evaluating the critical bottlenecks will require  
>> profiling and testing of code.  
>> This project will likely require programming skills not only in R,  
>> but also in C/C++ or XML.
>>
>> Involved toolkits or projects
>> R, phylobase, Nexus Class Library, NeXML, R XML package
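
To make the pruning item in the proposal above a bit more concrete, here is a minimal C++ sketch of dropping tips from a tree stored as a list of (parent, child) edges. It is only an illustration under stated assumptions: the `Edge` struct, `childCounts`, and `pruneTips` are hypothetical names, not phylobase, ape, or NCL code, and branch lengths, node labels, and the R interface are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// One edge of a rooted tree, stored as (parent, child) node ids.
// Hypothetical layout, loosely modelled on an edge-matrix representation.
struct Edge {
    int parent;
    int child;
};

// Hypothetical helper: number of children currently attached to each node.
static std::map<int, int> childCounts(const std::vector<Edge>& edges) {
    std::map<int, int> counts;
    for (const Edge& e : edges) ++counts[e.parent];
    return counts;
}

// Drop the requested tips, then tidy up: remove internal nodes left with
// no children and splice out nodes left with exactly one child.
// A tree reduced to a single tip comes back as an empty edge list.
std::vector<Edge> pruneTips(std::vector<Edge> edges, const std::set<int>& drop) {
    // Remember which nodes were internal so they never survive as fake tips.
    std::set<int> internal;
    for (const Edge& e : edges) internal.insert(e.parent);
    for (int tip : drop) {
        assert(internal.count(tip) == 0 && "only tips can be dropped");
    }

    // Step 1: remove the edges leading to the dropped tips.
    edges.erase(std::remove_if(edges.begin(), edges.end(),
                               [&](const Edge& e) { return drop.count(e.child) > 0; }),
                edges.end());

    bool changed = true;
    while (changed) {
        changed = false;
        std::map<int, int> counts = childCounts(edges);

        // Step 2: remove edges to internal nodes that have no children left.
        auto dead = [&](const Edge& e) {
            return internal.count(e.child) > 0 && counts.count(e.child) == 0;
        };
        auto newEnd = std::remove_if(edges.begin(), edges.end(), dead);
        if (newEnd != edges.end()) {
            edges.erase(newEnd, edges.end());
            changed = true;
            continue;  // child counts are stale, recompute them
        }

        // Step 3: splice out one node that has exactly one child.
        for (const auto& kv : counts) {
            if (kv.second != 1) continue;
            const int node = kv.first;
            auto out = std::find_if(edges.begin(), edges.end(),
                                    [&](const Edge& e) { return e.parent == node; });
            const int onlyChild = out->child;
            edges.erase(out);
            // Reattach the child to the grandparent. If `node` was the root
            // there is no incoming edge and the child becomes the new root.
            for (Edge& e : edges) {
                if (e.child == node) e.child = onlyChild;
            }
            changed = true;
            break;  // counts are stale, start another pass
        }
    }
    return edges;
}
```

For the tree ((A,B),C) with tips 1 to 3, root 4, and internal node 5, dropping tip 1 leaves the edges (4,2) and (4,3). This version rescans the edge list on every pass, so it is written for clarity rather than speed; profiling, as the Challenges section suggests, would decide whether a faster layout is needed.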
>> On Mar 19, 2009, at 3:47 PM, Hilmar Lapp wrote:
>>> Guys - you have probably seen the announcement on the hackathon  
>>> list. How much more time do you want to have until I post this to  
>>> r-sig-phylo?
>>> If I do it now, a student coming to the site must think that R  
>>> projects are not supported this year, and will likely walk away.  
>>> I'll have to post it soon, though, and the message is also going  
>>> to come out on EvolDir tonight or tomorrow night (I've just sent  
>>> it).
>>> Peter - I'm probably repeating myself here, but if you want to  
>>> volunteer to mentor even though you're still a student, that's a  
>>> great enough commitment to me. If you have an idea for one (or  
>>> several) project(s) that you would enjoy mentoring, you're more  
>>> than welcome to put it up on the Ideas page.
>>> That goes for everyone else on this list too, BTW - we may not get  
>>> enough slots to fund a battery of R projects, but if having a few  
>>> more things to choose from sparks more interest and more  
>>> creativity, then that's all good.
>>> 	-hilmar
>>> On Mar 18, 2009, at 3:09 PM, Peter Cowan wrote:
>>>> Steve et al.,
>>>> I agree that we shouldn't miss the opportunity to get a student on
>>>> the project.  I like Steve's ideas.  My personal favorite is a testing
>>>> framework, but I think that it's not really a good project for a
>>>> student.  I think a better project would be the metadata support.  I
>>>> would see this project involving writing and implementing a metadata
>>>> spec, and also updating the NCL integration (hopefully updating to the
>>>> latest version) to take advantage of metadata.  The improved NCL
>>>> integration would lead nicely into finishing the multiphylo stuff (not
>>>> sure what needs to be done here).
>>>> If we structure the project like that, we have 3 distinct landmarks
>>>> that a student can work on.  Thus if we only get one or two done,
>>>> we've been successful.
>>>> Cheers
>>>> Peter
>>>> On Mar 18, 2009, at 11:49 AM, Steven Kembel wrote:
>>>>> Hi all,
>>>>> this reply is late but maybe just in time since NESCent was just
>>>>> accepted into the GSoC. Since it sounds like a parser for
>>>>> phylogenetic XML is going into NCL, it might be useful to prepare
>>>>> phylobase to accept metadata gracefully. This alone might not be
>>>>> a summer's worth of work, so perhaps the project could be to
>>>>> fully integrate NCL into phylobase as well as modify the phylo4
>>>>> object to work with arbitrary metadata? Brian or others, is it
>>>>> possible to use the tree-parsing parts of NCL to allow reading of
>>>>> Newick strings into phylobase directly? This would be useful. I also
>>>>> like the idea of writing some of the performance-sensitive methods
>>>>> in C. Embarrassingly, I have not looked at the code in a while, but I
>>>>> think some of the C code that we were using from ape (e.g. for
>>>>> pruning) may not work with the changes we made to the tree structure.
>>>>> Tree rearrangement could fall into this category as well.
>>>>> Peter, all the ideas you suggested sound useful; did you have a
>>>>> favorite from that list?
>>>>> Steve
>>>>> On Mar 10, 2009, at 9:30 PM, Peter Cowan wrote:
>>>>>> As a former student myself, I'd be willing to help mentor this time
>>>>>> around.
>>>>>> What types of projects would move phylobase forward?  A parser for one
>>>>>> of the XML phylogeny formats?  Metadata support?  Multi-phylo4?
>>>>>> Tighter integration with NCL? Rewrites of performance-sensitive
>>>>>> methods in C? A testing framework for the package?
>>>>>> Other ideas?
>>>>>> Peter
>>>>>> On Mar 9, 2009, at 7:35 PM, Hilmar Lapp wrote:
>>>>>>> I'd like to echo Brian's comment. You (phylobase) can have students
>>>>>>> working over the summer for you; all you need to do is put up a
>>>>>>> project idea and designate mentor(s).
>>>>>>> I know that mentoring is also work, but the results can greatly push
>>>>>>> along a project.
>>>>>>> Let me know if there's anything I can do to help coordinate this.
>>>>>>> 	-hilmar
>>>>>>> On Mar 9, 2009, at 12:05 PM, Brian O'Meara wrote:
>>>>>>>> Good link, Ben. Google Summer of Code is starting up again, and there
>>>>>>>> are no R projects yet on the NESCent page
>>>>>>>> (<https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009>).
>>>>>>>> Perhaps a way to charge the paddles?
>>>>>>>> Brian
>>>>>>>> On Mar 9, 2009, at 2:07 AM, Ben Bolker wrote:
>>>>>>>>> http://www.stat.columbia.edu/~cook/movabletype/archives/2009/03/more_on_display.html
>>>>>>>>> Someday soon I hope to get out the defibrillator and see if we
>>>>>>>>> can get phylobase going again ...
>>>>>>>>> cheers
>>>>>>>>> Ben
>>>>>>>>> -- 
>>>>>>>>> Ben Bolker
>>>>>>>>> Associate professor, Biology Dep't, Univ. of Florida
>>>>>>>>> bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>>>>> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>>>>> _______________________________________________
>>>>>>>>> Phylobase-devl mailing list
>>>>>>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>>>>>>>> ________________________________
>>>>>>>> Brian O'Meara
>>>>>>>> NESCent
>>>>>>>> Durham, NC
>>>>>>>> http://www.brianomeara.info
>>>>>>>> ________________________________
>>>>>>> -- 
>>>>>>> ===========================================================
>>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>>>>>>> ===========================================================
