[Phylobase-devl] Phylobase GSoC idea

Tue Mar 11 05:47:07 CET 2008

If you will allow me to do some (mostly minor) editing once it's on  
the page, this sounds good. (since it's a wiki, you'll be able to  
revert my edits easily :-)

One thing I should note, though: we *must* have at least one mentor
for the project, and it can't be the same person as for the other one
that's up. It's fine if you reshuffle mentors once we are accepted, and
I'll then (after acceptance) also pester you to drum up at least one
co-mentor.

The amount of time a co-mentor should expect to be spending on  
mentoring the student depends almost entirely on the arrangement with  
the primary mentor. This can range from dividing the work evenly at  
all times, to the co-mentor only needing to step in (but then  
possibly fully) if the primary mentor gets sick or is on vacation (or  
gets overwhelmed due to deadlines or whatever). (This also tells you  
why having a co-mentor is highly desirable.) Co-mentors may be on  
more than one project.

	-hilmar

On Mar 11, 2008, at 12:29 AM, Steve Kembel wrote:

> Hi all,
>
>>   Hmmm.  Maybe I could co-mentor ... (5 hours/week sounds like
>> a lot ...)
>
> It does sound like a serious commitment but also potentially very
> valuable. And the shirt is tempting. :) Sharing mentoring
> responsibilities for a single project is more realistic than having
> several projects going. Peter was interested in mentoring but he'd
> also be eligible to apply to GSoC as a student and can't do both. I'd
> be happy to co-mentor if I had the right skill set for the project (R/
> C/C++ all ok).
>
>>  Brian did mention that there were some existing C++ libraries
>> for tree manipulation etc. ... patching into these might be
>> the (an?) answer?
>
> I rewrote the idea I proposed before to open it up to a wider range of
> potential projects, from tree manipulation to multi-tree, metadata or
> even building an interface with nexml or nexus (i.e. more work on the
> ioNCL code). See below. Too vague now?
>
> It did sound like there was more interest in the plotting idea if we
> were to go with just one phylobase-related project.
>
> Steve
>
> ---
> Rationale
>
> NESCent sponsored a hackathon focused on integration of comparative
> methods within the R statistical package to promote interoperability,
> the support of data exchange standards, and greater usability of tools
> and methods in evolutionary bioinformatics. One result of this
> hackathon has been the development of the phylobase package, which
> seeks to provide a set of S4 classes and methods for representing and
> manipulating phylogenetic trees and associated data in R. Phylobase
> contains structures for representing phylogenetic trees and associated
> data, but methods for tree manipulation, representation of multiple
> trees and metadata, and interfaces with other data formats (e.g.
> nexus, nexml) remain incomplete or have not been optimized for use
> with the large, multi-tree datasets that are increasingly common in
> bioinformatics and comparative biology.
>
> Approach
>
> Phylogenetic trees and associated data in phylobase are represented as
> S4 data objects. The methods for tree/data manipulation and import are
> currently a mixture of S3 and S4 methods and C/C++ extensions. The
> approach for this project will be to implement efficient algorithms
> for tree and data representation and manipulation using object-
> oriented S4 classes and methods, or C/C++ extensions where necessary
> for performance. We would suggest focusing on methods such as tree
> pruning, subsetting, and manipulation of multiple tree objects that
> are currently incomplete and will have the greatest impact on the
> ability to work with very large trees and datasets. It would also be
> useful to improve interfaces with other data formats such as nexus and
> nexml that will be the likely source for import of trees, data and
> metadata.
>
> Challenges
>
> While the R statistical programming language is extremely powerful and
> provides a rich feature set, it is inefficient at handling very large
> objects and heavy computational lifting (recursion, for-loops). The
> general challenge for this project will be to optimize the data
> structures (trees, multi-trees, associated data, metadata) and methods
> (pruning, subsetting of trees and data) that have the greatest impact
> on the ability to work with very large trees and datasets. This will
> require profiling and testing of existing code, implementing existing
> algorithms using S4 classes and methods and C/C++ extensions, or
> writing interfaces with data formats such as nexus and nexml.
>
> Involved toolkits or projects
>
> phylobase, R, C/C++, nexus class library, nexml
>
> Mentors
> ???

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================