[Phylobase-devl] Phylobase GSoC idea
Peter Cowan
pdc at berkeley.edu
Tue Mar 11 01:38:42 CET 2008
Here is another idea, this one in need of a mentor. Thibaut, Ben?
Peter
Rationale
Methods for visualizing phylogenetic trees and associated data have
not kept pace with growth in tree or dataset size. At a recent
NESCent hackathon a new R package for comparative methods, phylobase,
was designed. The phylobase package uses modern R classes and methods
to store and manipulate phylogenetic trees. Currently, plotting of
phylogenetic trees in phylobase relies on R base graphics and on
functions in other packages. These implementations are not well
suited to displaying either large phylogenetic trees or trees with
large amounts of associated data.
Approach
The R programming language has two primary graphics systems: the
base graphics interface and the newer, more extensible grid system.
The grid system is considerably more flexible and will allow for
consistent scaling and resizing of trees and data. Current plot
methods in phylobase are based on base graphics and suffer from
resizing and layout difficulties. This project will develop new plot
methods based on the grid system.
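
To make the resizing point concrete, here is a minimal sketch of what a
grid-based drawing might look like. The function name draw_toy_tree and
the hard-coded coordinates are only illustrative assumptions, not
anything that exists in phylobase. Everything is positioned in "npc"
units relative to a viewport, so the picture should rescale when the
device is resized.

```r
library(grid)

# Hypothetical helper, not a phylobase function. Draws the toy tree
# ((A,B),C) with hand-picked coordinates in "npc" units (0 to 1 within
# the current viewport), so the drawing rescales with the device.
draw_toy_tree <- function() {
  grid.newpage()
  # Keep a margin around the tree for the tip labels.
  pushViewport(viewport(width = 0.7, height = 0.8))
  # Horizontal branches, one per edge, drawn at the child's height.
  grid.segments(x0 = c(0, 0.5, 0.5, 0), x1 = c(0.5, 1, 1, 1),
                y0 = c(0.25, 0, 0.5, 1), y1 = c(0.25, 0, 0.5, 1),
                default.units = "npc")
  # Vertical connectors at the two internal nodes.
  grid.segments(x0 = c(0.5, 0), x1 = c(0.5, 0),
                y0 = c(0, 0.25), y1 = c(0.5, 1),
                default.units = "npc")
  # Tip labels, left-justified just past the branch ends.
  grid.text(c("A", "B", "C"), x = 1.02, y = c(0, 0.5, 1),
            just = "left", default.units = "npc")
  popViewport()
  invisible(NULL)
}
```

Calling draw_toy_tree() on any open graphics device draws the tree. A
real plot method would compute the coordinates from the tree object
rather than hard-coding them.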
Challenges
The primary challenge will be writing algorithms for efficiently
converting tree structures to the grid language. Examples of similar
algorithms exist for plotting using the old base graphics interface.
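
As a rough illustration of the conversion step, the sketch below turns
an edge list into node coordinates scaled to the 0-1 range that grid's
"npc" units expect. The function tree_layout, its arguments, and the
node numbering (tips numbered first, as in ape's edge matrix) are
assumptions made for this example and are not part of phylobase. It
is a simple recursive layout, not an optimized one.

```r
# Hypothetical helper, not part of phylobase or grid.
# edge:        two-column matrix of (parent, child) node numbers.
# edge_length: branch length for each row of edge.
tree_layout <- function(edge, edge_length) {
  n_nodes <- max(edge)
  x <- numeric(n_nodes)
  y <- numeric(n_nodes)
  # The root is the only node that never appears as a child.
  root <- setdiff(edge[, 1], edge[, 2])
  next_tip <- 0

  place <- function(node, depth) {
    x[node] <<- depth
    rows <- which(edge[, 1] == node)
    if (length(rows) == 0) {
      # Tip: take the next free row of the plot.
      next_tip <<- next_tip + 1
      y[node] <<- next_tip
    } else {
      for (r in rows) place(edge[r, 2], depth + edge_length[r])
      # Internal node: centre it over its children.
      y[node] <<- mean(y[edge[rows, 2]])
    }
  }
  place(root, 0)

  # Rescale to [0, 1] so the result can be used as "npc" coordinates.
  data.frame(node = seq_len(n_nodes),
             x = x / max(x),
             y = (y - 1) / max(1, next_tip - 1))
}

# Toy tree ((A,B),C): tips are nodes 1-3, root is 4, inner node is 5.
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
toy_layout <- tree_layout(edge, edge_length = c(1, 1, 1, 2))
```

The lookup of each node's children scans the whole edge matrix, so this
version is quadratic in the number of nodes. Indexing children up front
(or moving the traversal to C/C++) is the kind of efficiency work the
project would need for large trees.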
Involved toolkits or projects
phylobase, R, C/C++, grid
Mentors
Thibaut?
On Mar 10, 2008, at 1:51 PM, Steve Kembel wrote:
> Hi all,
>
> Here's a Google Summer of Code 'idea'. Deadline for getting these up
> on the wiki is today. Thoughts? Edits? Anyone else want to sign up to
> be a mentor? Any other ideas? People suggested plotting, RUnit/
> testing, linking with nexml or phyloxml...?
>
> Rationale
>
> There is a need for efficient phylogenetic tree manipulation methods
> in the R statistical package to take advantage of the statistical
> computing ability of R for bioinformatics and comparative phylogenetic
> analyses. NESCent sponsored a hackathon focused on integration of
> comparative methods within the R statistical package to promote
> interoperability, the support of data exchange standards, and greater
> usability of tools and methods in evolutionary bioinformatics. One
> result of this hackathon has been the development of the phylobase
> package, which seeks to provide a set of S4 classes and methods for
> representing and manipulating phylogenetic trees and data in R.
> Currently phylobase contains structures for representing phylogenetic
> trees and associated data, but methods for tree manipulation remain
> incomplete or have not been optimized. Current implementations of
> phylogenetic tree storage and manipulation are inadequate for working
> with the large-tree and multiple-tree datasets that are increasingly
> common in bioinformatics and comparative biology.
>
> Approach
>
> The R programming language, an object-oriented statistical programming
> language, has recently introduced a new object-oriented class system
> (S4). Phylogenetic trees in phylobase are currently represented as S4
> data objects. The methods for tree manipulation are currently a
> mixture of S3 and S4 methods and C/C++ extensions. The approach for
> this project will be to identify obstacles to manipulating large trees
> and datasets, which could include optimizing tree or data
> representation in memory, and to develop and implement efficient
> algorithms for tree representation and manipulation using object-
> oriented S4 classes and methods or C/C++ extensions.
>
> Challenges
>
> While the R statistical programming language is extremely powerful and
> provides a rich feature set, it is inefficient at handling very large
> objects and heavy computational lifting (recursion, for-loops). The
> general challenge for this project will be to identify data structures
> and methods that have the greatest impact on the ability to work with
> very large trees and datasets, and to implement these structures and
> methods in a more efficient way. This will require profiling and
> testing of existing code, the use of S4 classes and methods, and
> possibly the R API and C/C++ extensions to the R language.
>
> Involved toolkits or projects
>
> phylobase, R, S4 classes
>
> Mentors
>
> Steven Kembel, ?