[Phylobase-devl] Phylobase GSoC idea

Steve Kembel skembel at berkeley.edu
Tue Mar 11 04:47:11 CET 2008


Hi Ben,

Sounds like the mentoring time commitment is around 5 hours a week for  
12 weeks, no time on site - all done remotely:
http://code.google.com/soc/2008/faqs.html

I've started to rework the idea I proposed earlier and am wondering if
this is something people think is worthwhile. Off the top of my head,
the specific things I was thinking need optimization/work would be
pruning, subsetting, finishing the multiPhylo4 implementation... none
very glamorous (no acronyms involved!) but necessary. Most are
currently either unfinished or rely on the ape code, so every time
we do it we're converting phylo4 -> ape -> phylo4.
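
As a rough illustration of the pruning case, here is a minimal sketch of
dropping tips directly on an edge list, with no round trip through another
tree class. It is written in Python for brevity rather than R, and it is
not phylobase or ape code. The function name prune_tips and the
(parent, child) edge-list representation are assumptions made for the
example, and branch lengths are ignored.

```python
def prune_tips(edges, drop):
    """Drop tips from a rooted tree given as (parent, child) pairs.

    Hypothetical helper for illustration only. Internal nodes left with
    no children are removed, and internal nodes left with exactly one
    child are collapsed (including the root).
    """
    children = {}
    parent = {}
    for p, c in edges:
        children.setdefault(p, []).append(c)
        children.setdefault(c, [])
        parent[c] = p

    # Step 1: delete the requested tips, then any node left childless.
    # A stack is used instead of recursion so deep trees are not a problem.
    stack = [n for n in set(drop) if n in children and not children[n]]
    while stack:
        node = stack.pop()
        if node not in children:
            continue  # already removed
        p = parent.pop(node, None)
        del children[node]
        if p is not None:
            children[p].remove(node)
            if not children[p]:
                stack.append(p)

    # Step 2: collapse nodes with a single child. Collapsing one node does
    # not change the child count of any other node, so one pass is enough.
    for node in list(children):
        if len(children[node]) != 1:
            continue
        (only,) = children[node]
        p = parent.get(node)
        if p is None:
            # Single-child root: its child becomes the new root.
            del parent[only]
        else:
            children[p][children[p].index(node)] = only
            parent[only] = p
            del parent[node]
        del children[node]

    return [(p, c) for c, p in parent.items()]


# Example tree: ((A,B)n1,(C,D)n2)root
edges = [("root", "n1"), ("root", "n2"),
         ("n1", "A"), ("n1", "B"),
         ("n2", "C"), ("n2", "D")]
print(sorted(prune_tips(edges, {"B"})))
# [('n2', 'C'), ('n2', 'D'), ('root', 'A'), ('root', 'n2')]
```

Dropping B leaves n1 with a single child, so n1 is collapsed and A attaches
directly to the root. A real implementation would also need to sum branch
lengths across collapsed nodes and keep any associated tip data in sync.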

Another option related to the ioNCL stuff and phylo4d would be  
finishing the pdata implementation - i.e. tree + data + metadata, or  
even linking the pdata to some data standard like nexml or phyloxml  
(which I know very little about). Don't know if all of this could be  
crammed into a single idea or not.

Steve

On Mar 10, 2008, at 8:39 PM, Ben Bolker wrote:

>  Hear, hear.
>
>  It would also be fun to figure out if this could be
> integrated into the "grammar of graphics" as developed
> for R by Hadley Wickham.  I wasn't paying attention -- how
> much work is mentoring, and how much time has to be
> spent on-site?
>
>  Ben
>
>
> Hilmar Lapp wrote:
>> This sounds great! -hilmar
>> On Mar 10, 2008, at 8:38 PM, Peter Cowan wrote:
>>> Here is another idea, this one in need of a mentor.  Thibaut, Ben?
>>>
>>> Peter
>>>
>>> Rationale
>>>
>>> Methods for visualizing phylogenetic trees and associated data have
>>> not kept pace with growth in tree or dataset size.  At a recent
>>> NESCent hackathon a new R package for comparative methods,
>>> phylobase, was designed.  The phylobase package uses modern R
>>> classes and methods to store and manipulate phylogenetic trees.
>>> Currently, plotting of phylogenetic trees in phylobase relies on R
>>> base graphics and on functions in other packages.  These
>>> implementations are not well suited to displaying either large
>>> phylogenetic trees or trees with large amounts of associated data.
>>>
>>> Approach
>>>
>>> The R programming language has two primary graphics systems, the
>>> base graphics system and the newer, more extensible grid system.
>>> The grid system is much more flexible and will allow for
>>> consistent scaling and resizing of trees and data.  Current plot
>>> methods in phylobase are based on base graphics and suffer from
>>> resizing and layout difficulties.  This project will develop new
>>> plot methods based on the grid system.
>>>
>>> Challenges
>>>
>>> The primary challenge will be writing algorithms for efficiently
>>> converting tree structures to the grid language.  Examples of
>>> similar algorithms exist for plotting using the old base graphics
>>> interface.
>>>
>>> Involved toolkits or projects
>>>
>>> phylobase, R, C/C++, grid
>>>
>>> Mentors
>>>
>>> Thibaut?
>>>
>>> On Mar 10, 2008, at 1:51 PM, Steve Kembel wrote:
>>>
>>>> Hi all,
>>>>
>>>> Here's a Google Summer of Code 'idea'. Deadline for getting these
>>>> up on the wiki is today. Thoughts? Edits? Anyone else want to sign
>>>> up to be a mentor? Any other ideas? People suggested plotting,
>>>> RUnit/testing, linking with nexml or phyloxml...?
>>>>
>>>> Rationale
>>>>
>>>> There is a need for efficient phylogenetic tree manipulation
>>>> methods in the R statistical package to take advantage of the
>>>> statistical computing ability of R for bioinformatics and
>>>> comparative phylogenetic analyses. NESCent sponsored a hackathon
>>>> focused on integration of comparative methods within the R
>>>> statistical package to promote interoperability, the support of
>>>> data exchange standards, and greater usability of tools and
>>>> methods in evolutionary bioinformatics. One result of this
>>>> hackathon has been the development of the phylobase package, which
>>>> seeks to provide a set of S4 classes and methods for representing
>>>> and manipulating phylogenetic trees and data in R. Currently
>>>> phylobase contains structures for representing phylogenetic trees
>>>> and associated data, but methods for tree manipulation remain
>>>> incomplete or have not been optimized. Current implementations of
>>>> phylogenetic tree storage and manipulation are inadequate for
>>>> working with the large tree and multiple-tree datasets that are
>>>> increasingly common in bioinformatics and comparative biology.
>>>>
>>>> Approach
>>>>
>>>> The R programming language, an object-oriented statistical
>>>> programming language, has recently introduced a new
>>>> object-oriented class system (S4). Phylogenetic trees in phylobase
>>>> are currently represented as S4 data objects. The methods for tree
>>>> manipulation are currently a mixture of S3 and S4 methods and
>>>> C/C++ extensions. The approach for this project will be to
>>>> identify obstacles to manipulating large trees and datasets, which
>>>> could include optimizing tree or data representation in memory,
>>>> and to develop and implement efficient algorithms for tree
>>>> representation and manipulation using object-oriented S4 classes
>>>> and methods or C/C++ extensions.
>>>>
>>>> Challenges
>>>>
>>>> While the R statistical programming language is extremely powerful
>>>> and provides a rich feature set, it is inefficient at handling
>>>> very large objects and heavy computational lifting (recursion,
>>>> for-loops). The general challenge for this project will be to
>>>> identify data structures and methods that have the greatest impact
>>>> on the ability to work with very large trees and datasets, and to
>>>> implement these structures and methods in a more efficient way.
>>>> This will require profiling and testing of existing code, the use
>>>> of S4 classes and methods, and possibly the R API and C/C++
>>>> extensions to the R language.
>>>>
>>>> Involved toolkits or projects
>>>>
>>>> phylobase, R, S4 classes
>>>>
>>>> Mentors
>>>>
>>>> Steven Kembel, ?
>>>> _______________________________________________
>>>> Phylobase-devl mailing list
>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl

______________________________________________
Dr. Steven Kembel - skembel at berkeley.edu
http://www.phylodiversity.net/skembel/