[Phylobase-devl] Issues with NCL and/or NCL interface

Peter Cowan pdc at berkeley.edu
Thu Mar 11 22:37:50 CET 2010


On Mar 11, 2010, at 8:36 AM, Brian O'Meara wrote:

> 
> On Mar 10, 2010, at 9:47 PM, François Michonneau wrote:
> 
>> Hello all,
>> 
>> While writing tests for readNexus I ran into a few bugs in the way data
>> included in NEXUS files are imported into phylobase. I am definitely
>> more familiar with trees than with data when it comes to NEXUS files
>> so I might have done something wrong.
>> 
>> I created another NEXUS file with Mesquite which includes
>> polymorphic characters and excluded characters (file
>> treeplucharV02.nex). I am not sure if the problems described below are
>> caused by NCL or by the interface, so it would be great if someone
>> with more knowledge could look into it.
>> 
>> Let me know if you want more details/clarifications about these  
>> issues.
>> 
>> Cheers,
>> -- François
> 
> Thanks for working on this, François.

Yes, thanks for pushing this forward.

>> 
>> 1. char.all=TRUE/FALSE (if TRUE includes even excluded characters in
>> the NEXUS file)
>> This doesn't seem to work. In the example file, the character Test3 is
>> supposed to be excluded (in the ASSUMPTIONS block), but the option has
>> no effect on the string returned by ReadCharsWithNCL. We could
>> temporarily remove this option.
> 
> 
> This is due to changes in the NCL. The way we got all vs some chars  
> (NCLInterface.cpp) is
> 
> 			if (allchar) {
> 				nchartoreturn=characters->GetNCharTotal();
> 			}
> 			else {
> 				nchartoreturn=characters->GetNChar();
> 			}
> 
> but
> 
> nxscharactersblock.h:|	The old GetNChar() function is now called  
> GetNumIncludedChars();
> 
> Changing GetNChar to GetNumIncludedChars should help (I haven't coded  
> in phylobase lately, so I don't want to start committing code, but  
> this is where I'd start looking).
> 
> 
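For anyone picking this up, here is a minimal, self-contained sketch of the branch above with the renamed accessor. MockCharactersBlock is a hypothetical stand-in for the NCL characters block so that this compiles without NCL; only the names GetNCharTotal and GetNumIncludedChars come from this thread, and I have not tested it against the real library.

```cpp
#include <vector>

// Hypothetical stand-in for the NCL characters block (not the real class).
// Each entry records whether that character is excluded.
struct MockCharactersBlock {
    std::vector<bool> excluded;

    // Every character in the matrix, excluded or not.
    unsigned GetNCharTotal() const {
        return static_cast<unsigned>(excluded.size());
    }

    // Only the characters that are not excluded.
    unsigned GetNumIncludedChars() const {
        unsigned n = 0;
        for (bool e : excluded) {
            if (!e) ++n;
        }
        return n;
    }
};

// Same branch as in the quoted snippet, with GetNChar() replaced by
// GetNumIncludedChars(). The function name is made up for this sketch.
unsigned ncharToReturn(const MockCharactersBlock& characters, bool allchar) {
    return allchar ? characters.GetNCharTotal()
                   : characters.GetNumIncludedChars();
}
```

With three characters of which one is excluded, allchar=true should give 3 and allchar=false should give 2, which is the behaviour char.all is meant to control.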
>> 
>> 2. polymorphic.convert=TRUE/FALSE (if TRUE converts polymorphic
>> characters to missing characters)
>> 2.1. polymorphic characters
>> In this case, the string returned by ReadCharsWithNCL differs depending
>> on the option. If polymorphic.convert=TRUE, NAs are returned for
>> polymorphic states. If polymorphic.convert=FALSE, then
>> ReadCharsWithNCL returns all the states using curly brackets (e.g.
>> {0,1}), which produces an error message when evaluated within R. I
>> wrote a workaround (in R) for this problem that I should be able to
>> commit tomorrow. So, at least for now, it's not a crucial issue.
> 
> Good. When writing this part of phylobase, I wanted to keep the option  
> of using polymorphic characters, though I don't think any R  
> phylogenetic packages could use this (but maybe I'm wrong). Coding  
> this to use whatever is standard in R for showing polymorphism would  
> be good.

How are you handling this on the R side?  I don't think we've really discussed polymorphic characters before, have we?  Are there any functions out there that can handle them?  Should we add some functions for checking for or removing polymorphic data?
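
To check my reading of the two behaviours François describes, here is a rough sketch. formatCell is a hypothetical helper, not the ReadCharsWithNCL code; treating an empty state set as missing (NA) is my own assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: format the states scored for one cell of the matrix.
// This only illustrates the behaviour described in the thread.
std::string formatCell(const std::vector<int>& states, bool polymorphicConvert) {
    if (states.empty()) {
        return "NA";  // assumption: no states means missing data
    }
    if (states.size() == 1) {
        return std::to_string(states[0]);  // ordinary single-state cell
    }
    if (polymorphicConvert) {
        return "NA";  // polymorphic cell converted to missing
    }
    // Otherwise list every state in curly brackets, e.g. {0,1}.
    std::string out = "{";
    for (std::size_t i = 0; i < states.size(); ++i) {
        if (i > 0) out += ",";
        out += std::to_string(states[i]);
    }
    out += "}";
    return out;
}
```

The {0,1} form is the one that fails when evaluated in R, so whatever we settle on for the R side would have to replace or post-process that string.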

>> 2.2. factor levels
>> Another somewhat related issue is the way the data frame based on the
>> data contained in the NEXUS file is created. Each character is treated
>> as a factor which is constructed using a call like:
>> Test1=factor(c(1,NA,1,1,0,1,0,NA,NA,1,0,1,0,1,1,NA,0,0),
>> levels=c(0,1,2,3),labels=c("test1A","test1B","",""))
>> However, this kind of call produces warning messages because
>> duplicated labels aren't allowed anymore. The string created by
>> ReadCharsWithNCL creates unnecessary levels. The number of levels is
>> the same for all the characters in the data set. From the few tests I
>> have run, it looks like this number matches the maximum number of
>> states for a given character +1 (in the example file, only the
>> character "Test3" has 3 states). I have also written a workaround for
>> this problem, but there is a risk that the warning will turn into an
>> error message in the next few releases of R.
> 
> It's good to fix the problem of duplicated labels. As for having the  
> number of levels the same for all characters, regardless of how many  
> states they have, this was deliberate. For example, you might have a  
> data matrix for colors of flower parts, and use the same state coding  
> (0=red, 1=white, 2=yellow) for three different flower parts (inner  
> whorl of petals, outer whorl, stamen).

Does the NEXUS format allow one state specification for multiple characters like that? Or does each character get its own "translation table", as in the file François uploaded, which has this line:

1 Test1 /  test1A test1B, 2 Test2 /  test2A test2B, 3 Test3 /  test3A test3B test3C ;
 
> If the first two parts are any  
> of the three colors, and stamens are only red (0) or yellow (2), you  
> don't want to recode it so that the 0 and 2 for the stamen in nexus  
> become a 0 and 1 in R. This would make plants that have yellow petals  
> and stamens (222) be recoded as having white stamens (221), which  
> could affect later analyses.

Do you mean later analyses in R, or outside of R?  Once they are imported into R, users shouldn't try to refer to the underlying factor levels, but should use the labels that we associate with each character.
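
To make the recoding concern concrete, here is a rough sketch of what renumbering the observed states per character would do. compactStates is a hypothetical helper written for illustration, not code from phylobase or NCL.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: renumber the states observed in one character
// so that they run 0..k-1 in ascending order of the original codes.
std::vector<int> compactStates(const std::vector<int>& states) {
    // Collect the distinct codes that actually occur (sorted by std::set).
    std::set<int> observed(states.begin(), states.end());

    // Map each observed code to its rank among the observed codes.
    std::map<int, int> recode;
    int next = 0;
    for (int s : observed) {
        recode[s] = next++;
    }

    std::vector<int> out;
    out.reserve(states.size());
    for (int s : states) {
        out.push_back(recode[s]);
    }
    return out;
}
```

With a stamen column coded only as 0 and 2, compactStates turns {0, 2, 2, 0} into {0, 1, 1, 0}, which is the 222 to 221 shift described above if the codes rather than the labels are compared.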

> If you do want this recoding so that  
> characters with two states only have two levels, you could use  
> levels.uniform=FALSE. levels.uniform=TRUE is the default because this  
> is how most people code traits.

Interesting, but I'm a bit confused (probably because I've never coded traits).  If I'm making a character matrix for a flower with two traits (pubescent leaves and flower color), both traits could have states TRUE, FALSE, RED, and WHITE? Even though the first two states are only ever associated with the first trait, and likewise for the second?

> Hope this helps,

It's helping me, thanks!

Peter

> Brian
> 
> 
>> _______________________________________________
>> Phylobase-devl mailing list
>> Phylobase-devl at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
> 
> ------------------------------------------------------
> Brian O'Meara
> http://www.brianomeara.info
> Assistant Prof.
> Dept. Ecology & Evolutionary Biology
> U. of Tennessee, Knoxville
> 


