[Phylobase-devl] Issues with NCL and/or NCL interface

Brian O'Meara omeara.brian at gmail.com
Thu Mar 11 17:36:09 CET 2010


On Mar 10, 2010, at 9:47 PM, François Michonneau wrote:

> Hello all,
>
>  While writing tests for readNexus I ran into a few bugs in the way data
> included in NEXUS files are imported into phylobase. I am definitely
> more familiar with trees than with data when it comes to NEXUS files,
> so I might have done something wrong.
>
>  I created another NEXUS file with Mesquite which includes
> polymorphic characters and excluded characters (file
> treeplucharV02.nex). I am not sure if the problems described below are
> caused by NCL or by the interface, so it would be great if someone
> with more knowledge could look into it.
>
>  Let me know if you want more details/clarifications about these  
> issues.
>
>  Cheers,
>  -- François

Thanks for working on this, François.

>
> 1. char.all=TRUE/FALSE (if TRUE includes even excluded characters in
> the NEXUS file)
> This doesn't seem to work. In the example file, the character Test3 is
> supposed to be excluded (in the ASSUMPTIONS block), but the option has
> no effect on the string returned by ReadCharsWithNCL. We could
> temporarily remove this option.


This is due to changes in NCL. The way we got all vs. only the included
characters (in NCLInterface.cpp) is

			if (allchar) {
				nchartoreturn=characters->GetNCharTotal();
			}
			else {
				nchartoreturn=characters->GetNChar();
			}

but

nxscharactersblock.h:|	The old GetNChar() function is now called GetNumIncludedChars();

Changing GetNChar to GetNumIncludedChars should help (I haven't coded  
in phylobase lately, so I don't want to start committing code, but  
this is where I'd start looking).
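A minimal sketch of the suggested change follows. Only GetNCharTotal() and GetNumIncludedChars() come from the discussion above; `MockCharactersBlock` is a hypothetical stand-in for NCL's characters block, and `CharsToReturn` is a hypothetical wrapper around the if/else from NCLInterface.cpp. It covers only the count of characters, not how the returned string is built.

```cpp
#include <cassert>

// Hypothetical stand-in for NCL's characters block: it models only the two
// counts discussed above, not the real NCL class.
struct MockCharactersBlock {
    unsigned total;     // every character in the matrix
    unsigned excluded;  // characters excluded, e.g. by an ASSUMPTIONS block

    unsigned GetNCharTotal() const { return total; }
    // Newer NCL name for what GetNChar() used to return.
    unsigned GetNumIncludedChars() const { return total - excluded; }
};

// Number of characters to return: all of them when allchar is true,
// otherwise only the included ones.
template <typename CharBlock>
unsigned CharsToReturn(const CharBlock* characters, bool allchar) {
    if (allchar) {
        return characters->GetNCharTotal();
    }
    return characters->GetNumIncludedChars();
}
```

With one of three characters excluded, allchar=TRUE gives 3 and allchar=FALSE gives 2.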


>
> 2. polymorphic.convert=TRUE/FALSE (if TRUE converts polymorphic
> characters to missing characters)
> 2.1. polymorphic characters
> In this case, the string returned by ReadCharsWithNCL differs depending
> on the option. If polymorphic.convert=TRUE, NA is returned for
> polymorphic states. If polymorphic.convert=FALSE, then
> ReadCharsWithNCL returns all the states in curly brackets (e.g.
> {0,1}), which produces an error message when evaluated within R. I
> wrote a workaround (in R) for this problem that I should be able to
> commit tomorrow. So, at least for now, it's not a crucial issue.

Good. When writing this part of phylobase, I wanted to keep the option
of using polymorphic characters, though I don't think any R
phylogenetic packages can use them (but maybe I'm wrong). Coding
this to use whatever is standard in R for representing polymorphism
would be good.
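As a starting point, here is a sketch of what polymorphic.convert=TRUE does to a single cell, assuming cells arrive as strings and polymorphic states are written in curly brackets separated by commas, as in the {0,1} example. `SplitStates` and `ConvertCell` are hypothetical helpers, not part of NCL or phylobase; the list of states from `SplitStates` is what an R-friendly representation of polymorphism would have to be built from.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a cell such as "{0,1}" into its states ("0" and "1"). A cell without
// braces is returned as a single state. Assumes the brace/comma format.
std::vector<std::string> SplitStates(const std::string& cell) {
    std::vector<std::string> states;
    if (cell.size() >= 2 && cell.front() == '{' && cell.back() == '}') {
        std::istringstream inner(cell.substr(1, cell.size() - 2));
        std::string state;
        while (std::getline(inner, state, ',')) {
            states.push_back(state);
        }
    } else {
        states.push_back(cell);
    }
    return states;
}

// Mirrors polymorphic.convert=TRUE: a cell with more than one state (or an
// empty set) becomes NA, so the generated R expression can be evaluated.
std::string ConvertCell(const std::string& cell) {
    const std::vector<std::string> states = SplitStates(cell);
    if (states.size() != 1) {
        return "NA";
    }
    return states.front();
}
```

A single-state set such as {2} is unwrapped to 2 rather than turned into NA.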

> 2.2. factor levels
> Another somewhat related issue is the way the data frame based on the
> data contained in the NEXUS file is created. Each character is treated
> as a factor which is constructed using a call like:
> Test1=factor(c(1,NA,1,1,0,1,0,NA,NA,1,0,1,0,1,1,NA,0,0),
> levels=c(0,1,2,3),labels=c("test1A","test1B","",""))
> However, this kind of call produces warning messages because
> duplicated labels aren't allowed anymore. The string created by
> ReadCharsWithNCL creates unnecessary levels: the number of levels is
> the same for all the characters in the data set. From the few tests I
> have run, it looks like this number matches the maximum number of
> states for a given character + 1 (in the example file, only the
> character "Test3" has 3 levels). I have also written a workaround for
> this problem, but there is a risk that it will turn into an
> error message in the next few releases of R.
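One way to avoid the duplicated labels while keeping the same number of levels for every character (an assumption, not what phylobase currently does) is to fall back to the level code whenever a state has no label. `FactorArgs` is a hypothetical helper that builds only the levels/labels part of the generated factor() call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the "levels=...,labels=..." arguments of a generated factor() call.
// Hypothetical fix: an empty label falls back to its level code, so every
// character keeps all levels while the labels stay distinct.
std::string FactorArgs(const std::vector<std::string>& labels) {
    std::string levels = "levels=c(";
    std::string labs = "labels=c(";
    for (std::size_t i = 0; i < labels.size(); ++i) {
        const std::string code = std::to_string(i);
        const std::string label = labels[i].empty() ? code : labels[i];
        if (i > 0) {
            levels += ",";
            labs += ",";
        }
        levels += code;
        labs += "\"" + label + "\"";
    }
    return levels + ")," + labs + ")";
}
```

For Test1 this gives labels=c("test1A","test1B","2","3"), which has no duplicates as long as no real state label equals a level code.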

It's good to fix the problem of duplicated labels. As for having the  
number of levels the same for all characters, regardless of how many  
states they have, this was deliberate. For example, you might have a  
data matrix for colors of flower parts, and use the same state coding  
(0=red, 1=white, 2=yellow) for three different flower parts (inner  
whorl of petals, outer whorl, stamen). If the first two parts are any  
of the three colors, and stamens are only red (0) or yellow (2), you
don't want to recode the stamen so that its 0 and 2 in the NEXUS file
become 0 and 1 in R. That would cause plants with yellow petals and
yellow stamens (222) to be recoded as having white stamens (221), which
could affect later analyses. If you do want this recoding so that
characters with two states only have two levels, you could use  
levels.uniform=FALSE. levels.uniform=TRUE is the default because this  
is how most people code traits.
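To make the flower-color example concrete, here is a sketch of the renumbering that levels.uniform=FALSE implies, assuming it maps the states observed in a character to 0..k-1 in increasing order and that there is no missing data. `RecodeObserved` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical illustration of levels.uniform=FALSE: the distinct states
// observed in one character are renumbered 0..k-1 in increasing order.
std::vector<int> RecodeObserved(const std::vector<int>& states) {
    std::vector<int> observed = states;
    std::sort(observed.begin(), observed.end());
    observed.erase(std::unique(observed.begin(), observed.end()), observed.end());

    std::vector<int> recoded;
    recoded.reserve(states.size());
    for (int s : states) {
        // Position of s among the sorted distinct states is its new code.
        auto pos = std::lower_bound(observed.begin(), observed.end(), s);
        recoded.push_back(static_cast<int>(pos - observed.begin()));
    }
    return recoded;
}
```

The stamen codes {0, 2, 2, 0} come back as {0, 1, 1, 0}, so a yellow stamen (2) ends up coded 1, which means white under the shared coding.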

Hope this helps,
Brian


> _______________________________________________
> Phylobase-devl mailing list
> Phylobase-devl at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl

------------------------------------------------------
Brian O'Meara
http://www.brianomeara.info
Assistant Prof.
Dept. Ecology & Evolutionary Biology
U. of Tennessee, Knoxville


