[Phylobase-devl] New phylobase build approach using static libncl (Was: Rcpp and OS X compiliation)

Peter Cowan pdc at berkeley.edu
Wed Mar 3 22:51:19 CET 2010


Sending again to all now.


On Mar 2, 2010, at 1:46 PM, Mark Holder wrote:

> Hi again,
> 
> Mark Holder and then Peter Cowan wrote:
>>> I should be able to fix that aspect of things tomorrow.  When I push those changes to NCL, I'll post to this list so that someone with write permissions to phylobase's svn repo can copy those changes into phylobase.
>> 
>> I'm happy to do this.
> 
> 
> At http://people.ku.edu/~mtholder/mth_diff_pkg.tar.gz you'll find a tar.gz archive.  If you unpack it, you'll see the contents of phylobase's pkg directory with the edits that I had to make to get things working.  Several files were touched (the archive has all of pkg).  If I run svn stat I see:
> 
> M       pkg/src/NCLInterface.h
> M       pkg/src/ReadWithNCL.cpp
> M       pkg/src/nxspublicblocks.cpp
> M       pkg/src/nxscharactersblock.cpp
> M       pkg/src/ncl/nxsreader.h
> M       pkg/src/nxsreader.cpp
> M       pkg/src/NCLInterface.cpp
> M       pkg/inst/nexusfiles/treepluscharV01.nex

Thanks a ton, this would have taken me ages to do!  And it looks like Francois has already gotten this checked in.

[snip]

> Also note that checkTree(object) in  checkdata.R is complaining about the data returned from pkg/inst/nexusfiles/co1.nex, but I don't know why.  It looks OK to me.  It is possible that I broke something, as I'm not too familiar with what was supposed to be returned by NCL (I tried to return the same syntax that was originally being returned).

As Francois mentioned, this is due to our restrictions on node labels.  The workaround is simple and I'll add it to the docs.


>>>> Lastly, is there a way to control the output through Rcpp or otherwise?  A fair bit of what appears to be stderr gets printed in the R console, and it'd be nice to control this.
>>> 
>>> The BASICCMDLINE implementation writes lots of status output to the standard error stream.  When I work with it tomorrow, I can make its chattiness controllable by an argument.
>> 
>> That would be wonderful!
> 
> I'm partially there. For the NCLInterface.cpp code used by phylobase I changed it so that it is quiet by default.  It is now possible to send a numeric argument into BASICCMDLINE.Initialize();
> 
> Currently (in ReadWithNCL.cpp) it says,
> 	reader.Initialize(const_cast < char* > (filename.c_str()));
> 
> If you change it to
> 	reader.Initialize(const_cast < char* > (filename.c_str()), 0);
> 
> you'll see more messages to stderr.  The numbers 0-7 generate less and less output.  8 or higher should be silent (which is the default).
> 
> 
> I did not add the hooks in ReadWithNCL.cpp to get a verbosity argument from R and pass that along to NCL.
> 

Great!  This might be within my C++ abilities so I'll see if I can at least add a T/F verbosity parameter to the function.
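
A minimal sketch of that T/F-to-level mapping, assuming the numeric levels behave as Mark describes (0 is the chattiest, 8 or higher is silent).  The helper name verbosityToNclLevel and the two constants are hypothetical; they are not part of NCL or phylobase.  Only the two-argument Initialize call comes from Mark's message.

```cpp
// Hypothetical helper: translate an R-style TRUE/FALSE verbosity flag
// into the numeric level that Mark describes for Initialize().
// Assumption taken from the message: 0 prints the most, 8 or higher nothing.
const int kMostVerboseLevel = 0;  // "0-7 generate less and less output"
const int kSilentLevel = 8;       // "8 or higher should be silent"

int verbosityToNclLevel(bool verbose) {
    return verbose ? kMostVerboseLevel : kSilentLevel;
}

// Intended use inside ReadWithNCL.cpp (shown as a comment, not compiled here):
//   reader.Initialize(const_cast<char*>(filename.c_str()),
//                     verbosityToNclLevel(verbose));
```

Keeping the mapping in one small function would let the R side stay a plain logical while leaving room to expose the full 0-8 range later.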

>> If you have time, I'd like to have a more in depth conversation of what NCL is capable of and how it is organized.  Wrapping NCL is, in my opinion, one of the key features of phylobase.  But, I also get the feeling that there is more functionality we aren't taking advantage of.
>> 
>> Perhaps a GSOC student project could come out of it.  Or at the very least I can expand the developer's guide section about NCL so we are better able to maintain it in the future.
> 
> 
> The NCL docs are terribly out of date. Completely my fault. Paul Lewis is great about documenting his code, but I have not kept up with the documenting tasks.  I'm happy to chat.
> 
> The main potential problems that I see with the ways that phylobase is using NCL now are:
> 	1. In NCLInterface.cpp there are lots of calls to RemoveUnderscoresAndSpaces to get rid of spaces and _ in names.  That makes names easier to deal with, but at some point will bite you (somebody will have a dataset with a taxon labelled "AB" and another with "A B"; after transformation there will be a name clash).

I agree that this is something to address.  Not only might there be clashes, but changing names will be annoying to users.  Brian or Derrick could answer better, but I assume this is because some of the code used to parse the tree string can't handle the underscores and spaces.
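
To make the clash concrete, here is a small sketch.  stripUnderscoresAndSpaces is a hypothetical stand-in for the kind of cleanup Mark describes, not NCL's actual RemoveUnderscoresAndSpaces code, and findClashes is likewise invented for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the cleanup discussed above:
// drop every space and underscore from a label. Not NCL's actual code.
std::string stripUnderscoresAndSpaces(const std::string& label) {
    std::string out;
    for (char c : label) {
        if (c != ' ' && c != '_') {
            out.push_back(c);
        }
    }
    return out;
}

// Return the original labels that collide with an earlier label once cleaned.
std::vector<std::string> findClashes(const std::vector<std::string>& labels) {
    std::map<std::string, std::string> firstSeen;  // cleaned -> first original
    std::vector<std::string> clashes;
    for (const std::string& label : labels) {
        const std::string cleaned = stripUnderscoresAndSpaces(label);
        const auto inserted = firstSeen.insert({cleaned, label});
        if (!inserted.second) {
            clashes.push_back(label);  // cleaned form was already taken
        }
    }
    return clashes;
}
```

With the labels "AB", "A B" and "Homo_sapiens", findClashes reports "A B", because both of the first two labels reduce to "AB".  A check like this could at least warn the user before names are silently merged.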

Which brings me to one of the questions I've had about NCL.  What are the export options for trees?  Does NCL parse the tree block and have an internal storage that we could convert more directly into our tree format?

Currently I think a tree string (essentially newick?) is passed back to the R code which parses it with regular expressions.  The RegEx code is lifted directly from APE and is complicated and somewhat fragile.
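
For comparison, here is a rough sketch of what a non-regex route could look like: a hand-written recursive-descent pass that turns a newick-like string into an edge list.  This is not how NCL or phylobase handle trees; ParsedTree and NewickParser are hypothetical names, and the sketch assumes no whitespace, no quoted labels and no comments, and throws branch lengths away.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: an edge list plus one label per node id.
struct ParsedTree {
    std::vector<std::pair<int, int>> edges;  // (parent id, child id)
    std::vector<std::string> labels;         // labels[id]; empty if unlabeled
};

class NewickParser {
public:
    explicit NewickParser(const std::string& text) : text_(text), pos_(0) {}

    ParsedTree parse() {
        parseNode(-1);  // the root has no parent
        if (peek() != ';') {
            throw std::runtime_error("expected ';' at end of tree");
        }
        return tree_;
    }

private:
    char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }

    // Read characters up to the next structural character.
    std::string readUntil(const std::string& stops) {
        const std::size_t start = pos_;
        while (pos_ < text_.size() && stops.find(text_[pos_]) == std::string::npos) {
            ++pos_;
        }
        return text_.substr(start, pos_ - start);
    }

    // Parse one subtree, record the edge from `parent`, return the new node id.
    int parseNode(int parent) {
        const int id = static_cast<int>(tree_.labels.size());
        tree_.labels.push_back("");
        if (parent >= 0) {
            tree_.edges.push_back({parent, id});
        }
        if (peek() == '(') {
            do {
                ++pos_;  // consume '(' on the first pass, ',' afterwards
                parseNode(id);
            } while (peek() == ',');
            if (peek() != ')') {
                throw std::runtime_error("expected ')'");
            }
            ++pos_;
        }
        const std::string label = readUntil("(),:;");
        tree_.labels[id] = label;
        if (peek() == ':') {  // branch length: skipped in this sketch
            ++pos_;
            readUntil("(),;");
        }
        return id;
    }

    std::string text_;
    std::size_t pos_;
    ParsedTree tree_;
};
```

Parsing "((A,B)ab:1.5,C:2);" this way yields five nodes numbered in pre-order (root 0, then "ab", "A", "B", "C") and the edges (0,1), (1,2), (1,3), (0,4).  If NCL already exposes its parsed tree, reading that structure directly would of course be preferable to re-parsing a string.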

> 	2. It was not clear to me how different character blocks should be separated.  I just return the union of all character matrices.  It seems like phylobase will need a richer interaction with NCL if phylobase wants to know about whether the data came in under different blocks.

If I understand correctly, this is multiple sets of data associated with the same tree?  E.g. a character block containing morphological data and another with DNA data?

I think the union is the best approach for the time being.  There is no good way to support multiple datasets in phylobase at the moment.  If this becomes a common feature request we'll figure something out.  I think the more likely case is that only certain character blocks are wanted, such as reading the morph data but not the DNA data.
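
A rough sketch of both ideas, taking the union and optionally keeping only one kind of block.  Everything here is hypothetical (the CharBlock struct, the unionBlocks function, the datatype strings), and it assumes that "union" means joining the columns of each block per taxon, padding taxa that are absent from a block with the NEXUS missing-data symbol '?'.  It does not reflect what NCLInterface.cpp actually returns.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one characters block.
struct CharBlock {
    std::string datatype;                     // e.g. "dna" or "standard"
    std::map<std::string, std::string> rows;  // taxon -> character states
};

// Join the columns of every selected block, taxon by taxon.
// An empty onlyType selects all blocks; otherwise only matching datatypes.
std::map<std::string, std::string> unionBlocks(
        const std::vector<CharBlock>& blocks,
        const std::string& onlyType = "") {
    std::map<std::string, std::string> merged;
    // First pass: collect every taxon that occurs in a selected block.
    for (const CharBlock& block : blocks) {
        if (!onlyType.empty() && block.datatype != onlyType) continue;
        for (const auto& row : block.rows) {
            merged[row.first];  // creates an empty entry if needed
        }
    }
    // Second pass: append each block's columns, padding short or absent rows.
    for (const CharBlock& block : blocks) {
        if (!onlyType.empty() && block.datatype != onlyType) continue;
        std::size_t width = 0;
        for (const auto& row : block.rows) {
            width = std::max(width, row.second.size());
        }
        for (auto& entry : merged) {
            const auto found = block.rows.find(entry.first);
            std::string cells =
                (found != block.rows.end()) ? found->second : std::string();
            cells.resize(width, '?');  // '?' marks missing data
            entry.second += cells;
        }
    }
    return merged;
}
```

Given a "standard" block with A=01, B=10 and a "dna" block with A=ACGT, C=TTTT, the full union gives A=01ACGT, B=10????, C=??TTTT, while unionBlocks(blocks, "standard") returns only the two morphological rows.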

Thanks!

Peter

> 
> 
> all the best,
> Mark
> 
> Mark Holder
> 
> mtholder at ku.edu
> http://phylo.bio.ku.edu/mark-holder
> 
> ==============================================
> Department of Ecology and Evolutionary Biology
> University of Kansas
> 6031 Haworth Hall
> 1200 Sunnyside Avenue
> Lawrence, Kansas 66045
> 
> lab phone:  785.864.5789
> 
> fax (shared): 785.864.5860
> ==============================================


