[Phylobase-devl] Phylobase GSoC idea

Ben Bolker bolker at zoo.ufl.edu
Tue Mar 11 04:39:33 CET 2008


   Hear, hear.

   It would also be fun to figure out if this could be
integrated into the "grammar of graphics" as developed
for R by Hadley Wickham.  I wasn't paying attention -- how
much work is mentoring, and how much time has to be
spent on-site?

   Ben


Hilmar Lapp wrote:
> This sounds great! -hilmar
> 
> On Mar 10, 2008, at 8:38 PM, Peter Cowan wrote:
> 
>> Here is another idea, this one in need of a mentor.  Thibaut, Ben?
>>
>> Peter
>>
>> Rationale
>>
>> Methods for visualizing phylogenetic trees and associated data have
>> not kept pace with growth in tree or dataset size.  At a recent
>> NESCent hackathon a new R package for comparative methods, phylobase,
>> was designed.  The phylobase package uses modern R classes and methods
>> to store and manipulate phylogenetic trees.  Currently, plotting of
>> phylogenetic trees in phylobase relies on R base graphics and on
>> functions in other packages.  These implementations are not well
>> suited to displaying either large phylogenetic trees or trees with
>> large amounts of associated data.
>>
>> Approach
>>
>> The R programming language has two primary plotting interfaces: the
>> base graphics interface and the newer, more extensible grid system.
>> The grid system is much more flexible and allows consistent scaling
>> and resizing of trees and data.  Current plot methods in phylobase
>> are based on base graphics and suffer from resizing and layout
>> difficulties.  This project will develop new plot methods based on
>> the grid system.
>>
>> Challenges
>>
>> The primary challenge will be writing algorithms for efficiently
>> converting tree structures to the grid language.  Examples of similar
>> algorithms exist for plotting using the old base graphics interface.
>>
>> Involved toolkits or projects
>>
>> phylobase, R, C/C++, grid
>>
>> Mentors
>>
>> Thibaut?
>>
>> On Mar 10, 2008, at 1:51 PM, Steve Kembel wrote:
>>
>>> Hi all,
>>>
>>> Here's a Google Summer of Code 'idea'. Deadline for getting these up
>>> on the wiki is today. Thoughts? Edits? Anyone else want to sign up to
>>> be a mentor? Any other ideas? People suggested plotting,
>>> RUnit/testing, linking with nexml or phyloxml...?
>>>
>>> Rationale
>>>
>>> There is a need for efficient phylogenetic tree manipulation methods
>>> in the R statistical package to take advantage of the statistical
>>> computing ability of R for bioinformatics and comparative
>>> phylogenetic analyses. NESCent sponsored a hackathon focused on
>>> integration of comparative methods within the R statistical package
>>> to promote interoperability, the support of data exchange standards,
>>> and greater usability of tools and methods in evolutionary
>>> bioinformatics. One result of this hackathon has been the
>>> development of the phylobase package, which seeks to provide a set
>>> of S4 classes and methods for representing and manipulating
>>> phylogenetic trees and data in R.
>>> Currently phylobase contains structures for representing phylogenetic
>>> trees and associated data, but methods for tree manipulation remain
>>> incomplete or have not been optimized. Current implementations of
>>> phylogenetic tree storage and manipulation are inadequate for working
>>> with the large-tree and multiple-tree datasets that are increasingly
>>> common in bioinformatics and comparative biology.
>>>
>>> Approach
>>>
>>> The R programming language, an object-oriented statistical
>>> programming language, has recently introduced a new object-oriented
>>> class system (S4). Phylogenetic trees in phylobase are currently
>>> represented as S4 data objects. The methods for tree manipulation are
>>> currently a mixture of S3 and S4 methods and C/C++ extensions. The
>>> approach for this project will be to identify obstacles to
>>> manipulating large trees and datasets, which could include optimizing
>>> tree or data representation in memory, and to develop and implement
>>> efficient algorithms for tree representation and manipulation using
>>> object-oriented S4 classes and methods or C/C++ extensions.
>>>
>>> Challenges
>>>
>>> While the R statistical programming language is extremely powerful
>>> and provides a rich feature set, it is inefficient at handling very
>>> large objects and heavy computational lifting (recursion, for-loops).
>>> The general challenge for this project will be to identify data
>>> structures and methods that have the greatest impact on the ability
>>> to work with very large trees and datasets, and to implement these
>>> structures and methods in a more efficient way. This will require
>>> profiling and testing of existing code, the use of S4 classes and
>>> methods, and possibly the R API and C/C++ extensions to the R
>>> language.
>>>
>>> Involved toolkits or projects
>>>
>>> phylobase, R, S4 classes
>>>
>>> Mentors
>>>
>>> Steven Kembel, ?
>>> _______________________________________________
>>> Phylobase-devl mailing list
>>> Phylobase-devl at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
> 

