[Phylobase-devl] Phylobase GSoC idea

Peter Cowan pdc at berkeley.edu
Tue Mar 11 01:38:42 CET 2008


Here is another idea, this one in need of a mentor.  Thibaut, Ben?

Peter

Rationale

Methods for visualizing phylogenetic trees and associated data have  
not kept pace with growth in tree or dataset size.  At a recent  
NESCent hackathon, a new R package for comparative methods, phylobase,  
was designed.  The phylobase package uses modern R classes and methods  
to store and manipulate phylogenetic trees.  Currently, plotting of  
phylogenetic trees in phylobase relies on R base graphics and on  
functions in other packages.  These implementations are not well  
suited to displaying either large phylogenetic trees or trees with  
large amounts of associated data.

Approach

The R programming language has two primary graphics systems: the base  
graphics interface and the newer, more extensible grid system.  The  
grid system is much more flexible and will allow consistent scaling  
and resizing of trees and data.  Current plot methods in phylobase are  
based on base graphics and suffer from resizing and layout  
difficulties.  This project will develop new plot methods based on the  
grid system.

Challenges

The primary challenge will be writing algorithms that efficiently  
convert tree structures into grid graphics primitives.  Similar  
algorithms already exist for plotting with the older base graphics  
interface.
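The conversion step described above can be sketched as a layout pass that turns a tree into line segments.  This is a minimal illustration under stated assumptions, not phylobase code: the Edge, Layout and Segment types and the layoutTree/treeSegments functions are hypothetical, the tree is assumed to arrive as an edge list with branch lengths (loosely like an edge matrix), and the layout is a plain rectangular phylogram.  Coordinates are rescaled to [0, 1], comparable to grid's "npc" units, which is what lets a drawing follow its viewport when resized.

```cpp
#include <cassert>
#include <vector>

// Hypothetical edge-list tree: each edge runs from `parent` to `child`
// with branch `length`. Nodes are numbered 0..nNodes-1.
struct Edge {
    int parent;
    int child;
    double length;
};

// Node coordinates in normalised [0, 1] units.
struct Layout {
    std::vector<double> x;  // distance from the root, rescaled
    std::vector<double> y;  // vertical position, rescaled
};

// One line segment in x0/y0/x1/y1 form.
struct Segment {
    double x0, y0, x1, y1;
};

// Rectangular phylogram layout computed without recursion.
Layout layoutTree(const std::vector<Edge>& edges, int nNodes, int root) {
    assert(root >= 0 && root < nNodes);
    std::vector<std::vector<int>> kids(nNodes);
    std::vector<double> branch(nNodes, 0.0);
    for (const Edge& e : edges) {
        kids[e.parent].push_back(e.child);
        branch[e.child] = e.length;
    }

    // Preorder traversal with an explicit stack; children are pushed in
    // reverse so the first-listed child is visited first.
    std::vector<int> order;
    std::vector<int> stack{root};
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        order.push_back(n);
        for (auto it = kids[n].rbegin(); it != kids[n].rend(); ++it) stack.push_back(*it);
    }

    Layout L{std::vector<double>(nNodes, 0.0), std::vector<double>(nNodes, 0.0)};
    // x: parents precede children in preorder, so one forward pass suffices.
    for (int n : order)
        for (int c : kids[n]) L.x[c] = L.x[n] + branch[c];

    // y: tips get consecutive positions in preorder; each internal node sits
    // midway between its first and last child. Walking the preorder
    // backwards guarantees children are placed before their parent.
    int nTips = 0;
    for (int n : order)
        if (kids[n].empty()) L.y[n] = nTips++;
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        if (!kids[*it].empty())
            L.y[*it] = 0.5 * (L.y[kids[*it].front()] + L.y[kids[*it].back()]);

    // Rescale both axes to [0, 1].
    double maxX = 0.0;
    for (double v : L.x) maxX = v > maxX ? v : maxX;
    for (double& v : L.x) v = maxX > 0.0 ? v / maxX : 0.0;
    for (double& v : L.y) v = nTips > 1 ? v / (nTips - 1) : 0.5;
    return L;
}

// Two segments per edge: a vertical connector at the parent's x, then the
// horizontal branch out to the child.
std::vector<Segment> treeSegments(const std::vector<Edge>& edges, const Layout& L) {
    std::vector<Segment> segs;
    for (const Edge& e : edges) {
        double xp = L.x[e.parent], yp = L.y[e.parent], yc = L.y[e.child];
        segs.push_back({xp, yp, xp, yc});
        segs.push_back({xp, yc, L.x[e.child], yc});
    }
    return segs;
}
```

For the tree ((A:1,B:1):1,C:2), numbered root 0, inner node 1 and tips 2 to 4, layoutTree places the inner node at x = 0.5, y = 0.25 and the root at y = 0.625.  On the R side, the four coordinate vectors could be passed to grid::grid.segments() with default.units = "npc", so the tree rescales with whatever viewport it is drawn in.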

Involved toolkits or projects

phylobase, R, C/C++, grid

Mentors

Thibaut?

On Mar 10, 2008, at 1:51 PM, Steve Kembel wrote:

> Hi all,
>
> Here's a Google Summer of Code 'idea'. The deadline for getting these up
> on the wiki is today. Thoughts? Edits? Anyone else want to sign up to
> be a mentor? Any other ideas? People suggested plotting, RUnit/testing,
> linking with nexml or phyloxml...?
>
> Rationale
>
> There is a need for efficient phylogenetic tree manipulation methods
> in the R statistical package to take advantage of the statistical
> computing ability of R for bioinformatics and comparative phylogenetic
> analyses. NESCent sponsored a hackathon focused on integration of
> comparative methods within the R statistical package to promote
> interoperability, the support of data exchange standards, and greater
> usability of tools and methods in evolutionary bioinformatics. One
> result of this hackathon has been the development of the phylobase
> package, which seeks to provide a set of S4 classes and methods for
> representing and manipulating phylogenetic trees and data in R.
> Currently phylobase contains structures for representing phylogenetic
> trees and associated data, but methods for tree manipulation remain
> incomplete or have not been optimized. Current implementations of
> phylogenetic tree storage and manipulation are inadequate for working
> with the large-tree and multiple-tree datasets that are increasingly
> common in bioinformatics and comparative biology.
>
> Approach
>
> The R programming language, an object-oriented statistical programming
> language, has recently introduced a new object-oriented class system
> (S4). Phylogenetic trees in phylobase are currently represented as S4
> data objects. The methods for tree manipulation are currently a
> mixture of S3 and S4 methods and C/C++ extensions. The approach for
> this project will be to identify obstacles to manipulating large trees
> and datasets, and to develop and implement efficient algorithms for
> tree representation and manipulation using object-oriented S4 classes
> and methods or C/C++ extensions. This could include optimizing tree or
> data representation in memory.
>
> Challenges
>
> While the R statistical programming language is extremely powerful and
> provides a rich feature set, it is inefficient at handling very large
> objects and heavy computational lifting (recursion, for-loops). The
> general challenge for this project will be to identify data structures
> and methods that have the greatest impact on the ability to work with
> very large trees and datasets, and to implement these structures and
> methods in a more efficient way. This will require profiling and
> testing of existing code, the use of S4 classes and methods, and
> possibly the R API and C/C++ extensions to the R language.
>
> Involved toolkits or projects
>
> phylobase, R, S4 classes
>
> Mentors
>
> Steven Kembel, ?
> _______________________________________________
> Phylobase-devl mailing list
> Phylobase-devl at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
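The quoted Challenges section singles out recursion and for-loops as costly in R, and the Approach mentions optimizing tree representation in memory.  One illustrative option, which is an assumption and not something either proposal specifies, is to precompute a preorder index once: every subtree then occupies a contiguous slice of the preorder array, so descendant queries become array slices instead of recursive walks.  PreorderIndex, buildIndex and descendants are hypothetical names; the sketch assumes parallel parent/child vectors and that every node is reachable from the root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compact index built by a single preorder pass.
struct PreorderIndex {
    std::vector<int> order;  // nodes in preorder
    std::vector<int> start;  // start[n]: position of node n in `order`
    std::vector<int> size;   // size[n]: nodes in n's subtree, including n
};

PreorderIndex buildIndex(const std::vector<int>& parent, const std::vector<int>& child,
                         int nNodes, int root) {
    assert(parent.size() == child.size());
    assert(root >= 0 && root < nNodes);
    std::vector<std::vector<int>> kids(nNodes);
    for (std::size_t i = 0; i < parent.size(); ++i) kids[parent[i]].push_back(child[i]);

    PreorderIndex idx{{}, std::vector<int>(nNodes, -1), std::vector<int>(nNodes, 1)};
    // Iterative depth-first traversal: no recursion, one visit per node.
    std::vector<int> stack{root};
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        idx.start[n] = static_cast<int>(idx.order.size());
        idx.order.push_back(n);
        for (auto it = kids[n].rbegin(); it != kids[n].rend(); ++it) stack.push_back(*it);
    }
    // Reverse preorder visits every child before its parent, so subtree
    // sizes accumulate in one linear pass.
    for (auto it = idx.order.rbegin(); it != idx.order.rend(); ++it)
        for (int c : kids[*it]) idx.size[*it] += idx.size[c];
    return idx;
}

// All descendants of node n (excluding n itself), read off as a slice.
std::vector<int> descendants(const PreorderIndex& idx, int n) {
    auto first = idx.order.begin() + idx.start[n] + 1;
    return std::vector<int>(first, first + (idx.size[n] - 1));
}
```

The index costs one linear pass to build, after which each descendant query is proportional only to the size of the answer, rather than requiring a fresh traversal per node.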


