[Phylobase-devl] Phylobase GSoC idea

Hilmar Lapp hlapp at duke.edu
Mon Mar 10 23:50:45 CET 2008


Hi guys,

great to see you come up with a project. My feedback would be to  
lower the emphasis on the 'identifying the obstacles' and  
'identifying the data structures that need optimization'; these sound  
much like a (possibly open) research project.

The Summer of Code is meant to be a software development project, not  
a research project. There may be some research components,  
particularly for algorithm-y projects, but those would center on  
*how* to implement something, not *what*. The what (or the choices to  
draw from or to propose a variation on) should be clear from the outset.

So, for the below I would suggest naming one or two or a few things  
to start with (make your best bets, and you can still fine-tune it  
after we are accepted - if we are). You could then say that as time  
permits, further targets for optimization will be determined (but say  
how, e.g., through profiling, where to find datasets, etc.).

Does this make sense?

	-hilmar

On Mar 10, 2008, at 4:51 PM, Steve Kembel wrote:

> Hi all,
>
> Here's a Google Summer of Code 'idea'. Deadline for getting these up
> on the wiki is today. Thoughts? Edits? Anyone else want to sign up to
> be a mentor? Any other ideas? People suggested plotting, RUnit/
> testing, linking with nexml or phyloxml...?
>
> Rationale
>
> There is a need for efficient phylogenetic tree manipulation methods
> in the R statistical package to take advantage of the statistical
> computing ability of R for bioinformatics and comparative phylogenetic
> analyses. NESCent sponsored a hackathon focused on integration of
> comparative methods within the R statistical package to promote
> interoperability, the support of data exchange standards, and greater
> usability of tools and methods in evolutionary bioinformatics. One
> result of this hackathon has been the development of the phylobase
> package, which seeks to provide a set of S4 classes and methods for
> representing and manipulating phylogenetic trees and data in R.
> Currently phylobase contains structures for representing phylogenetic
> trees and associated data, but methods for tree manipulation remain
> incomplete or have not been optimized. Current implementations of
> phylogenetic tree storage and manipulation are inadequate for working
> with the large trees and multiple-tree datasets that are increasingly
> common in bioinformatics and comparative biology.
>
> Approach
>
> The R programming language, an object-oriented statistical programming
> language, has recently introduced a new object-oriented class system
> (S4). Phylogenetic trees in phylobase are currently represented as S4
> data objects. The methods for tree manipulation are currently a
> mixture of S3 and S4 methods and C/C++ extensions. The approach for
> this project will be to identify obstacles to manipulating large trees
> and datasets (addressing them could include optimizing tree or data
> representation in memory), and to develop and implement efficient
> algorithms for tree representation and manipulation using
> object-oriented S4 classes and methods or C/C++ extensions.
>
> Challenges
>
> While the R statistical programming language is extremely powerful and
> provides a rich feature set, it is inefficient at handling very large
> objects and heavy computational lifting (recursion, for-loops). The
> general challenge for this project will be to identify data structures
> and methods that have the greatest impact on the ability to work with
> very large trees and datasets, and to implement these structures and
> methods in a more efficient way. This will require profiling and
> testing of existing code, the use of S4 classes and methods, and
> possibly the R API and C/C++ extensions to the R language.
>
> Involved toolkits or projects
>
> phylobase, R, S4 classes
>
> Mentors
>
> Steven Kembel, ?

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================