[Phylobase-devl] Phylobase GSoC idea

Ben Bolker bolker at zoo.ufl.edu
Tue Mar 11 04:55:48 CET 2008


    Hmmm.  Maybe I could co-mentor ... (5 hours/week sounds like
  a lot ...)

   Brian did mention that there were some existing C++ libraries
for tree manipulation etc. ... patching into these might be
the (an?) answer?

   Ben

Steve Kembel wrote:
> Hi Ben,
> 
> Sounds like the mentoring time commitment is around 5 hours a week for 
> 12 weeks, no time on site - all done remotely:
> http://code.google.com/soc/2008/faqs.html
> 
> I've started to rework the idea I proposed earlier and am wondering if 
> this is something people think is worthwhile - off the top of my head 
> the specific things I was thinking need optimization/work would be 
> pruning, subsetting, finishing the multiPhylo4 implementation... none 
> very glamorous (no acronyms involved!) but necessary, and most are
> currently either unfinished or we're using the ape code, so every time 
> we do it we're converting phylo4 -> ape -> phylo4.
> 
> Another option related to the ioNCL stuff and phylo4d would be finishing 
> the pdata implementation - i.e. tree + data + metadata, or even linking 
> the pdata to some data standard like nexml or phyloxml (which I know 
> very little about). Don't know if all of this could be crammed into a 
> single idea or not.
> 
> Steve
> 
> On Mar 10, 2008, at 8:39 PM, Ben Bolker wrote:
> 
>>  Hear, hear.
>>
>>  It would also be fun to figure out if this could be
>> integrated into the "grammar of graphics" as developed
>> for R by Hadley Wickham.  I wasn't paying attention -- how
>> much work is mentoring, and how much time has to be
>> spent on-site?
>>
>>  Ben
>>
>>
>> Hilmar Lapp wrote:
>>> This sounds great! -hilmar
>>> On Mar 10, 2008, at 8:38 PM, Peter Cowan wrote:
>>>> Here is another idea, this one in need of a mentor.  Thibaut, Ben?
>>>>
>>>> Peter
>>>>
>>>> Rationale
>>>>
>>>> Methods for visualizing phylogenetic trees and associated data have
>>>> not kept pace with growth in tree or dataset size.  At a recent
>>>> NESCent hackathon a new R package for comparative methods, phylobase,
>>>> was designed.  The phylobase package uses modern R classes and methods
>>>> to store and manipulate phylogenetic trees.  Currently plotting of
>>>> phylogenetic trees in phylobase relies on the R base graphics, and
>>>> functions in other packages.  These implementations are not well
>>>> suited to displaying either large phylogenetic trees or trees with
>>>> large amounts of associated data.
>>>>
>>>> Approach
>>>>
>>>> The R programming language has two primary graphics systems: the base
>>>> graphics interface and the newer, more extensible grid system.  The
>>>> grid system is considerably more flexible and allows consistent
>>>> scaling and resizing of trees and data.  Current plot methods in
>>>> phylobase are based on base graphics and suffer from resizing and
>>>> layout difficulties.  This project will develop new plot methods
>>>> based on the grid system.
>>>>
>>>> Challenges
>>>>
>>>> The primary challenge will be writing algorithms for efficiently
>>>> converting tree structures to the grid language.  Examples of similar
>>>> algorithms exist for plotting using the old base graphics interface.
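
As a rough illustration of the conversion step described above, the sketch below computes plotting coordinates for every node of a tree and rescales them to the unit square, which is the kind of device-independent position that grid's normalised ("npc") units expect. It is written in Python only to keep it self-contained; it is not phylobase code, and the function name `layout_tree` and the edge-table input (a list of ancestor/descendant pairs plus branch lengths) are assumptions made for the example.

```python
from collections import defaultdict


def layout_tree(edges, lengths, root):
    """Return {node: (x, y)} for a rectangular tree layout.

    edges   -- list of (ancestor, descendant) pairs (assumed edge table)
    lengths -- branch length of each edge, in the same order as edges
    root    -- label of the root node
    """
    children = defaultdict(list)
    branch = {}
    for (anc, desc), length in zip(edges, lengths):
        children[anc].append(desc)
        branch[desc] = length

    # Pass 1 (preorder, explicit stack instead of recursion):
    # x is the summed branch length from the root.
    x = {root: 0.0}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so they are visited in input order.
        for child in reversed(children[node]):
            x[child] = x[node] + branch[child]
            stack.append(child)

    # Pass 2: tips get consecutive y positions in visiting order.
    y = {}
    next_tip = 0
    for node in order:
        if not children[node]:
            y[node] = float(next_tip)
            next_tip += 1

    # Reversed preorder reaches every child before its parent, so each
    # internal node can sit midway between its first and last child.
    for node in reversed(order):
        kids = children[node]
        if kids:
            y[node] = (y[kids[0]] + y[kids[-1]]) / 2.0

    # Rescale to [0, 1] so positions do not depend on the device size.
    max_x = max(x.values()) or 1.0
    max_y = max(next_tip - 1, 1)
    return {n: (x[n] / max_x, y[n] / max_y) for n in order}


coords = layout_tree(
    [("r", "a"), ("r", "n"), ("n", "b"), ("n", "c")],
    [2.0, 1.0, 1.0, 1.0],
    "r",
)
```

Both passes are linear in the number of nodes and avoid recursion, so the same idea should carry over to large trees; drawing the segments between the resulting points is left out here.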
>>>>
>>>> Involved toolkits or projects
>>>>
>>>> phylobase, R, C/C++, grid
>>>>
>>>> Mentors
>>>>
>>>> Thibaut?
>>>>
>>>> On Mar 10, 2008, at 1:51 PM, Steve Kembel wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Here's a Google Summer of Code 'idea'. Deadline for getting these up
>>>>> on the wiki is today. Thoughts? Edits? Anyone else want to sign up to
>>>>> be a mentor? Any other ideas? People suggested plotting,
>>>>> RUnit/testing, linking with nexml or phyloxml...?
>>>>>
>>>>> Rationale
>>>>>
>>>>> There is a need for efficient phylogenetic tree manipulation methods
>>>>> in the R statistical package to take advantage of the statistical
>>>>> computing ability of R for bioinformatics and comparative
>>>>> phylogenetic analyses. NESCent sponsored a hackathon focused on
>>>>> integration of comparative methods within the R statistical package
>>>>> to promote interoperability, the support of data exchange standards,
>>>>> and greater usability of tools and methods in evolutionary
>>>>> bioinformatics. One result of this hackathon has been the
>>>>> development of the phylobase package, which seeks to provide a set
>>>>> of S4 classes and methods for representing and manipulating
>>>>> phylogenetic trees and data in R. Currently phylobase contains
>>>>> structures for representing phylogenetic trees and associated data,
>>>>> but methods for tree manipulation remain incomplete or have not been
>>>>> optimized. Current implementations of phylogenetic tree storage and
>>>>> manipulation are inadequate for working with the large-tree and
>>>>> multiple-tree datasets that are increasingly common in
>>>>> bioinformatics and comparative biology.
>>>>>
>>>>> Approach
>>>>>
>>>>> The R programming language, an object-oriented statistical
>>>>> programming language, has recently introduced a new object-oriented
>>>>> class system (S4). Phylogenetic trees in phylobase are currently
>>>>> represented as S4 data objects. The methods for tree manipulation
>>>>> are currently a mixture of S3 and S4 methods and C/C++ extensions.
>>>>> The approach for this project will be to identify obstacles to
>>>>> manipulating large trees and datasets, which could include
>>>>> optimizing tree or data representation in memory, and to develop and
>>>>> implement efficient algorithms for tree representation and
>>>>> manipulation using object-oriented S4 classes and methods or C/C++
>>>>> extensions.
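
To make "efficient algorithms for tree representation and manipulation" a little more concrete, here is a minimal sketch of pruning done directly on an edge table, without converting to another tree format and back. It is in Python purely for illustration and is not the phylobase or ape implementation; the name `prune_tips` and the input layout (ancestor/descendant pairs with matching branch lengths) are assumptions.

```python
def prune_tips(edges, lengths, drop):
    """Drop the tips named in `drop`; return a new (edges, lengths) pair.

    edges   -- list of (ancestor, descendant) pairs (assumed edge table)
    lengths -- branch length of each edge, in the same order as edges
    drop    -- iterable of tip labels to remove
    """
    parent = {}    # node -> (ancestor, length of the branch above node)
    children = {}  # node -> list of its direct descendants
    for (anc, desc), length in zip(edges, lengths):
        parent[desc] = (anc, length)
        children.setdefault(anc, []).append(desc)
        children.setdefault(desc, [])

    # Remove each requested tip, then walk upwards removing any ancestor
    # that is left with no descendants at all. Labels that are not tips
    # of the tree are ignored.
    for tip in set(drop):
        node = tip
        while node in parent and not children[node]:
            anc, _ = parent.pop(node)
            children[anc].remove(node)
            del children[node]
            node = anc

    # Collapse non-root nodes left with exactly one child, summing the
    # two branch lengths so root-to-tip distances are unchanged.
    # The root is kept even if it ends up with a single child.
    for node in list(children):
        if node in parent and len(children[node]) == 1:
            (child,) = children[node]
            anc, above = parent.pop(node)
            _, below = parent[child]
            parent[child] = (anc, above + below)
            children[anc][children[anc].index(node)] = child
            del children[node]

    new_edges = [(anc, desc) for desc, (anc, _) in parent.items()]
    new_lengths = [length for _, length in parent.values()]
    return new_edges, new_lengths


tree_edges = [("r", "n"), ("n", "a"), ("n", "b"), ("r", "c")]
tree_lengths = [1.0, 2.0, 3.0, 4.0]
pruned_edges, pruned_lengths = prune_tips(tree_edges, tree_lengths, ["a"])
```

In the usage lines, dropping tip "a" leaves node "n" with a single child, so "n" is collapsed and "b" attaches to the root with the combined branch length. The work is a few dictionary operations per edge plus a list removal per dropped node, which is the sort of cost one would hope for on large trees.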
>>>>>
>>>>> Challenges
>>>>>
>>>>> While the R statistical programming language is extremely powerful
>>>>> and provides a rich feature set, it is inefficient at handling very
>>>>> large objects and heavy computational lifting (recursion,
>>>>> for-loops). The general challenge for this project will be to
>>>>> identify data structures and methods that have the greatest impact
>>>>> on the ability to work with very large trees and datasets, and to
>>>>> implement these structures and methods in a more efficient way. This
>>>>> will require profiling and testing of existing code, the use of S4
>>>>> classes and methods, and possibly the R API and C/C++ extensions to
>>>>> the R language.
>>>>>
>>>>> Involved toolkits or projects
>>>>>
>>>>> phylobase, R, S4 classes
>>>>>
>>>>> Mentors
>>>>>
>>>>> Steven Kembel, ?
>>>>> _______________________________________________
>>>>> Phylobase-devl mailing list
>>>>> Phylobase-devl at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>>
> 
> ______________________________________________
> Dr. Steven Kembel - skembel at berkeley.edu
> http://www.phylodiversity.net/skembel/
> 

