[Phylobase-devl] Summer of Code ideas (was: a graphics challenge)
Hilmar Lapp
hlapp at duke.edu
Sat Mar 21 00:13:33 CET 2009
I've put this up now.
Peter - since you are the volunteering mentor, unless someone else
volunteers to mentor it, feel free to modify it to your heart's
content (including rewriting it completely :-)
-hilmar
On Mar 19, 2009, at 7:03 PM, Steven Kembel wrote:
> Hi Hilmar,
>
> We could post an unused idea from last year up on the wiki. It's
> general enough to accommodate a wide range of actual projects and
> touches on several of the specific ideas we've talked about (NCL,
> optimizing pruning/etc., data import/export). But perhaps because
> of that generality it didn't attract any interest last year, and it
> would need to be made more specific by anyone applying for it, or
> split into a few separate ideas before posting. Also, I really would
> have to be a secondary mentor this year; I cannot commit enough time
> to be a primary mentor. Here's a slightly modified version of the
> text. This or a further modified version could be put up on the wiki
> shortly if it's useful. We'll need a list of potential mentors as well.
>
> Optimizing phylogenetic data representation in R
>
> Rationale
> One result of the recent NESCent Hackathon on Comparative Methods
> in R has been the development of the phylobase package, which seeks
> to provide a set of S4 classes and methods for representing and
> manipulating phylogenetic trees and associated data in R. Phylobase
> contains structures for representing phylogenetic trees and
> associated data, but methods for tree manipulation, representation
> of multiple trees and metadata, and interfaces with other data
> formats (e.g. Nexus, NeXML) remain incomplete or have not been
> optimized for use with the large, multi-tree datasets that are
> increasingly common in bioinformatics and comparative biology.
> While the R language is extremely powerful and provides a rich
> feature set, it is inefficient at handling very large objects and
> heavy computational lifting (such as recursion, for-loops).
>
> Approach
> The methods for tree/data manipulation and import in phylobase are
> currently a mixture of S3 and S4 methods and C/C++ extensions. The
> goal for this project will be to implement efficient algorithms for
> tree and data representation and manipulation using object-oriented
> S4 classes and methods, and C/C++ extensions where necessary for
> performance. We suggest focusing on methods such as tree pruning,
> subsetting, and manipulation of multiple tree objects that are
> currently incomplete and will have the greatest impact on the
> ability to work with very large trees and datasets.
> It would also be useful to improve interfaces with other data
> formats such as Nexus and NeXML that will be the likely source for
> import of trees, data and metadata. This would require knowledge of
> C++ or XML.
>
> Challenges
> The general challenge for this project will be to identify and
> implement optimized data structures for trees, multi-trees,
> associated data, and metadata, along with the methods that operate
> on them (e.g., pruning, subsetting of trees and data). Identifying and evaluating the
> critical bottlenecks will require profiling and testing of code.
> This project will likely require programming skills not only in R,
> but also in C/C++ or XML.
>
> Involved toolkits or projects
> R, phylobase, Nexus Class Library, NeXML, R XML package
>
>
> On Mar 19, 2009, at 3:47 PM, Hilmar Lapp wrote:
>
>> Guys - you have probably seen the announcement on the hackathon
>> list. How much more time do you want to have until I post this to
>> r-sig-phylo?
>>
>> If I do it now, a student coming to the site will think that R
>> projects are not supported this year, and will likely walk away.
>> I'll have to post it soon, though, and the message is also going
>> to come out on EvolDir tonight or tomorrow night (I've just sent it).
>>
>> Peter - I'm probably repeating myself here, but if you want to
>> volunteer as a mentor even though you're still a student, that's
>> commitment enough for me. If you have an idea for one (or
>> several) project(s) that you would enjoy mentoring, you're more
>> than welcome to put it up on the Ideas page.
>>
>> That goes for everyone else on this list too, BTW - we may not get
>> enough slots to fund a battery of R projects, but if having a few
>> more things to choose from sparks more interest and more
>> creativity, then that's all good.
>>
>> -hilmar
>>
>> On Mar 18, 2009, at 3:09 PM, Peter Cowan wrote:
>>
>>> Steve et al.,
>>>
>>> I agree that we shouldn't miss the opportunity to get a student on
>>> the project. I like Steve's ideas. My personal favorite is a testing
>>> framework, but I think that it's not really a good project for a
>>> student. I think a better project would be the metadata support. I
>>> would see this project involving writing and implementing a metadata
>>> spec, and also updating the NCL integration (hopefully updating to
>>> the latest version) to take advantage of metadata. The improved NCL
>>> integration would lead nicely into finishing the multiphylo stuff
>>> (not sure what needs to be done here).
>>>
>>> If we structure the project like that, we have 3 distinct landmarks
>>> that a student can work on. Thus if we only get one or two done,
>>> we've been successful.
>>>
>>> Cheers
>>>
>>> Peter
>>>
>>>
>>>
>>> On Mar 18, 2009, at 11:49 AM, Steven Kembel wrote:
>>>
>>>> Hi all,
>>>>
>>>> this reply is late but maybe just in time since NESCent was just
>>>> accepted into the GSoC. Since it sounds like a parser for
>>>> phylogenetic XML is going into NCL it might be useful to prepare
>>>> phylobase to accept metadata gracefully, although this alone might
>>>> not be a summer's worth of work, perhaps the project could be to
>>>> fully integrate NCL into phylobase as well as modify the phylo4
>>>> object to work with arbitrary metadata? Brian or others, is it
>>>> possible to use the tree-parsing parts of NCL to allow reading of
>>>> Newick strings into phylobase directly? This would be useful. I also
>>>> like the idea of writing some of the performance-sensitive methods
>>>> in C. I have, embarrassingly, not looked at the code in a while, but I
>>>> think some of the C code that we were using from ape (e.g. for
>>>> pruning) may not work with the changes we made to the tree structure?
>>>> Tree rearrangement could fall into this category as well.
>>>>
>>>> Peter, all the ideas you suggested sound useful. Did you have a
>>>> favorite from that list?
>>>>
>>>> Steve
>>>>
>>>> On Mar 10, 2009, at 9:30 PM, Peter Cowan wrote:
>>>>
>>>>> As a former student myself, I'd be willing to help mentor this time
>>>>> around.
>>>>>
>>>>> What types of projects would move phylobase forward? A parser for one
>>>>> of the XML phylogeny formats? Metadata support? Multi-phylo4?
>>>>> Tighter integration with NCL? Rewrites of performance-sensitive
>>>>> methods in C? A testing framework for the package?
>>>>>
>>>>> Other ideas?
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> On Mar 9, 2009, at 7:35 PM, Hilmar Lapp wrote:
>>>>>
>>>>>> I'd like to echo Brian's comment. You (phylobase) can have students
>>>>>> working over the summer for you; all you need to do is put up a
>>>>>> project idea and designate mentor(s).
>>>>>>
>>>>>> I know that mentoring is also work, but the results can greatly push
>>>>>> along a project.
>>>>>>
>>>>>> Let me know if there's anything I can do to help coordinate this.
>>>>>>
>>>>>> -hilmar
>>>>>>
>>>>>> On Mar 9, 2009, at 12:05 PM, Brian O'Meara wrote:
>>>>>>
>>>>>>> Good link, Ben. Google Summer of Code is starting up again, and
>>>>>>> there are no R projects yet on the NESCent page
>>>>>>> (<https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009>).
>>>>>>> Perhaps a way to charge the paddles?
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> On Mar 9, 2009, at 2:07 AM, Ben Bolker wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> http://www.stat.columbia.edu/~cook/movabletype/archives/2009/03/more_on_display.html
>>>>>>>>
>>>>>>>> Someday soon I hope to get out the defibrillator and see if we
>>>>>>>> can get phylobase going again ...
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> Ben
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ben Bolker
>>>>>>>> Associate professor, Biology Dep't, Univ. of Florida
>>>>>>>> bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>>>> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Phylobase-devl mailing list
>>>>>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>>>>>>>
>>>>>>>
>>>>>>> ________________________________
>>>>>>> Brian O'Meara
>>>>>>> NESCent
>>>>>>> Durham, NC
>>>>>>> http://www.brianomeara.info
>>>>>>> ________________________________
>>>>>>
>>>>>> --
>>>>>> ===========================================================
>>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
>>>>>> ===========================================================
>>>>>
>>>>
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
>> ===========================================================
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================