[Phylobase-devl] Summer of Code ideas (was: a graphics challenge)

Hilmar Lapp hlapp at duke.edu
Sat Mar 21 00:13:33 CET 2009


I've put this up now.

Peter - since you are the volunteering mentor, unless someone else  
volunteers to mentor it, feel free to modify it to your heart's  
content (including rewriting it completely :-)

	-hilmar

On Mar 19, 2009, at 7:03 PM, Steven Kembel wrote:

> Hi Hilmar,
>
> We could post an unused idea from last year on the wiki. It's
> general enough to accommodate a wide range of actual projects and
> touches on several of the specific ideas we've talked about (NCL,
> optimizing pruning etc., data import/export). But perhaps because
> of that generality it didn't attract any interest last year, and
> it would need to be made more specific by anyone applying for it,
> or split into a few separate ideas before posting. Also, I really
> would have to be a secondary mentor this year; I cannot commit
> enough time to be a primary mentor. Here's a slightly modified
> version of the text. This or a further modified version could be
> put up on the wiki shortly if it's useful. We'll need a list of
> potential mentors as well.
>
> Optimizing phylogenetic data representation in R
> Rationale
> One result of the recent NESCent Hackathon on Comparative Methods  
> in R has been the development of the phylobase package, which seeks  
> to provide a set of S4 classes and methods for representing and  
> manipulating phylogenetic trees and associated data in R. Phylobase  
> contains structures for representing phylogenetic trees and  
> associated data, but methods for tree manipulation, representation  
> of multiple trees and metadata, and interfaces with other data
> formats (e.g., NEXUS, NeXML) remain incomplete or have not been
> optimized for use with the large, multi-tree datasets that are
> increasingly common in bioinformatics and comparative biology.
> While the R language is extremely powerful and provides a rich
> feature set, it is inefficient at handling very large objects and
> computationally heavy operations (such as recursion and for-loops).
> Approach
> The methods for tree/data manipulation and import in phylobase are  
> currently a mixture of S3 and S4 methods and C/C++ extensions. The  
> goal for this project will be to implement efficient algorithms for  
> tree and data representation and manipulation using object-oriented  
> S4 classes and methods, and C/C++ extensions where necessary for  
> performance. We suggest focusing on methods such as tree pruning,  
> subsetting, and manipulation of multiple tree objects that are  
> currently incomplete and will have the greatest impact on the  
> ability to work with very large trees and datasets.
> It would also be useful to improve interfaces with other data  
> formats such as nexus and nexml that will be the likely source for  
> import of trees, data and metadata. This would require knowledge of  
> C++ or XML.
> Challenges
> The general challenge for this project will be to identify and
> implement optimized data structures for trees, multi-trees,
> associated data, and metadata, along with the methods that operate
> on them (e.g., pruning and subsetting of trees and data).
> Identifying and evaluating the critical bottlenecks will require
> profiling and testing of code.
> This project will likely require programming skills not only in R,  
> but also in C/C++ or XML.
> Involved toolkits or projects
> R, phylobase, Nexus Class Library, NeXML, R XML package
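
As a rough illustration of the pruning operation named in the
proposal above, the sketch below drops a set of tips from a rooted
tree stored as (ancestor, descendant) pairs, similar in spirit to an
edge matrix, and collapses any internal node left with a single
child. It is written in Python only to keep it short; it is not
phylobase, ape, or NCL code. The function name prune_tips and the
integer node ids are assumptions made for this example, and branch
lengths are ignored. An explicit stack is used instead of recursion
so that deep trees do not run into a recursion limit.

```python
from collections import defaultdict


def prune_tips(edges, drop):
    """Remove the tips in `drop` from a rooted tree.

    `edges` is a list of (ancestor, descendant) pairs. Returns a new edge
    list in which internal nodes with no surviving descendants are removed
    and internal nodes with a single surviving child are collapsed.
    (Hypothetical helper, for illustration only.)
    """
    drop = set(drop)
    children = defaultdict(list)
    has_parent = set()
    for anc, desc in edges:
        children[anc].append(desc)
        has_parent.add(desc)

    roots = [n for n in children if n not in has_parent]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    # Preorder traversal with an explicit stack: parents precede children.
    order = []
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, ()))

    survivor = {}  # node -> node that stands in for its subtree, or None
    kept = []
    # Reversed preorder visits every child before its parent.
    for node in reversed(order):
        kids = children.get(node, ())
        if not kids:  # a tip
            survivor[node] = None if node in drop else node
            continue
        alive = [survivor[k] for k in kids if survivor[k] is not None]
        if len(alive) >= 2:
            kept.extend((node, s) for s in alive)
            survivor[node] = node
        elif len(alive) == 1:
            survivor[node] = alive[0]  # collapse single-child node
        else:
            survivor[node] = None
    return kept


# Tree ((1,2),(3,4)) with root 5 and internal nodes 6 and 7.
example = [(5, 6), (5, 7), (6, 1), (6, 2), (7, 3), (7, 4)]
print(sorted(prune_tips(example, {1})))  # [(5, 2), (5, 7), (7, 3), (7, 4)]
```

Each node is visited once, so the cost grows linearly with tree size.
A compiled extension of the kind the proposal mentions could follow
the same two passes.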
>
>
> On Mar 19, 2009, at 3:47 PM, Hilmar Lapp wrote:
>
>> Guys - you have probably seen the announcement on the hackathon  
>> list. How much more time do you want to have until I post this to  
>> r-sig-phylo?
>>
>> If I post it now, a student coming to the site will think that R  
>> projects are not supported this year, and will likely walk away.  
>> I'll have to post it soon, though, and the message is also going  
>> to come out on EvolDir tonight or tomorrow night (I've just sent it).
>>
>> Peter - I'm probably repeating myself here, but if you want to  
>> volunteer as a mentor even though you're still a student, that's  
>> commitment enough for me. If you have an idea for one (or  
>> several) project(s) that you would enjoy mentoring, you're more  
>> than welcome to put it up on the Ideas page.
>>
>> That goes for everyone else on this list too, BTW - we may not get  
>> enough slots to fund a battery of R projects, but if having a few  
>> more things to choose from sparks more interest and more  
>> creativity, then that's all good.
>>
>> 	-hilmar
>>
>> On Mar 18, 2009, at 3:09 PM, Peter Cowan wrote:
>>
>>> Steve et al.,
>>>
>>> I agree that we shouldn't miss the opportunity to get a student on
>>> the project.  I like Steve's ideas.  My personal favorite is a
>>> testing framework, but I think that it's not really a good project
>>> for a student.  I think a better project would be the metadata
>>> support.  I would see this project involving writing and
>>> implementing a metadata spec, and also updating the NCL integration
>>> (hopefully updating to the latest version) to take advantage of
>>> metadata.  The improved NCL integration would lead nicely into
>>> finishing the multiphylo stuff (not sure what needs to be done
>>> here).
>>>
>>> If we structure the project like that, we have 3 distinct landmarks
>>> that a student can work on.  Thus if we only get one or two done,
>>> we've been successful.
>>>
>>> Cheers
>>>
>>> Peter
>>>
>>>
>>>
>>> On Mar 18, 2009, at 11:49 AM, Steven Kembel wrote:
>>>
>>>> Hi all,
>>>>
>>>> this reply is late but maybe just in time, since NESCent was just
>>>> accepted into the GSoC. Since it sounds like a parser for
>>>> phylogenetic XML is going into NCL, it might be useful to prepare
>>>> phylobase to accept metadata gracefully. This alone might not be
>>>> a summer's worth of work, so perhaps the project could be to
>>>> fully integrate NCL into phylobase as well as modify the phylo4
>>>> object to work with arbitrary metadata? Brian or others, is it
>>>> possible to use the tree-parsing parts of NCL to allow reading of
>>>> Newick strings into phylobase directly? This would be useful. I
>>>> also like the idea of writing some of the performance-sensitive
>>>> methods in C. Embarrassingly, I have not looked at the code in a
>>>> while, but I think some of the C code that we were using from ape
>>>> (i.e. for pruning) may not work with the changes we made to the
>>>> tree structure? Tree rearrangement could fall into this category
>>>> as well.
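
To make the question about reading Newick strings concrete, here is a
minimal sketch of turning a Newick string into an edge list plus
labels and branch lengths. It is plain Python written for this note,
not NCL's parser and not phylobase's reader. The function name
parse_newick, the 0-based node numbering, and the three returned
structures are all assumptions of the example. It handles nesting,
node labels, and ":length" suffixes, but not quoted labels or
bracketed comments.

```python
def parse_newick(text):
    """Parse a Newick string into (edges, labels, lengths).

    edges:   list of (ancestor_id, descendant_id) pairs
    labels:  dict node_id -> name, for nodes that carry a label
    lengths: dict node_id -> branch length leading to that node
    (Hypothetical helper, for illustration only.)
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    body = text[:-1]

    edges, labels, lengths = [], {}, {}
    stack = []      # internal nodes whose '(' has not been closed yet
    current = None  # node that a following label or ':length' applies to
    count = 0

    def new_node():
        nonlocal count
        node = count
        count += 1
        if stack:
            edges.append((stack[-1], node))
        return node

    def read_token(start):
        end = start
        while end < len(body) and body[end] not in "(),:":
            end += 1
        return body[start:end].strip(), end

    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "(":
            stack.append(new_node())
            current = None
            i += 1
        elif ch in ",)":
            if current is None:
                new_node()  # an unnamed tip with no branch length
            if ch == ")":
                if not stack:
                    raise ValueError("unbalanced ')'")
                current = stack.pop()
            else:
                current = None
            i += 1
        elif ch == ":":
            token, i = read_token(i + 1)
            if current is None:
                current = new_node()
            lengths[current] = float(token)
        elif ch.isspace():
            i += 1
        else:
            token, i = read_token(i)
            if current is None:
                current = new_node()
            labels[current] = token
    if stack:
        raise ValueError("unbalanced '('")
    return edges, labels, lengths


edges, labels, lengths = parse_newick("((A:1,B:2)X:0.5,C);")
print(edges)  # [(0, 1), (1, 2), (1, 3), (0, 4)]
```

The edge-list output is chosen only because it is easy to hand to a
routine that works on ancestor/descendant pairs; a real integration
would presumably rely on NCL's own tree-parsing code instead.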
>>>>
>>>> Peter, all the ideas you suggested sound useful, did you have a
>>>> favorite from that list?
>>>>
>>>> Steve
>>>>
>>>> On Mar 10, 2009, at 9:30 PM, Peter Cowan wrote:
>>>>
>>>>> As a former student myself, I'd be willing to help mentor this
>>>>> time around.
>>>>>
>>>>> What types of projects would move phylobase forward?  A parser for
>>>>> one of the XML phylogeny formats?  Metadata support?  Multi-phylo4?
>>>>> Tighter integration with NCL? Rewrites of performance-sensitive
>>>>> methods in C? A testing framework for the package?
>>>>>
>>>>> Other ideas?
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> On Mar 9, 2009, at 7:35 PM, Hilmar Lapp wrote:
>>>>>
>>>>>> I'd like to echo Brian's comment. You (phylobase) can have
>>>>>> students working over the summer for you; all you need to do is
>>>>>> put up a project idea and designate mentor(s).
>>>>>>
>>>>>> I know that mentoring is also work, but the results can greatly
>>>>>> push along a project.
>>>>>>
>>>>>> Let me know if there's anything I can do to help coordinate
>>>>>> this.
>>>>>>
>>>>>> 	-hilmar
>>>>>>
>>>>>> On Mar 9, 2009, at 12:05 PM, Brian O'Meara wrote:
>>>>>>
>>>>>>> Good link, Ben. Google Summer of Code is starting up again, and
>>>>>>> there are no R projects yet on the NESCent page
>>>>>>> (<https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009>).
>>>>>>> Perhaps a way to charge the paddles?
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> On Mar 9, 2009, at 2:07 AM, Ben Bolker wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> http://www.stat.columbia.edu/~cook/movabletype/archives/2009/03/more_on_display.html
>>>>>>>>
>>>>>>>> Someday soon I hope to get out the defibrillator and see if we
>>>>>>>> can get phylobase going again ...
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> Ben
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Ben Bolker
>>>>>>>> Associate professor, Biology Dep't, Univ. of Florida
>>>>>>>> bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>>>> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Phylobase-devl mailing list
>>>>>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>>>>>>>
>>>>>>>
>>>>>>> ________________________________
>>>>>>> Brian O'Meara
>>>>>>> NESCent
>>>>>>> Durham, NC
>>>>>>> http://www.brianomeara.info
>>>>>>> ________________________________
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> ===========================================================
>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>>>>>> ===========================================================
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================





