[Phylobase-devl] Issues with NCL and/or NCL interface

Thu Mar 11 17:38:35 CET 2010

Thanks for these comments Brian.

I'll look into it and I'll let you all know if this works.

  Cheers,
  -- François

On Thu, Mar 11, 2010 at 11:36, Brian O'Meara <omeara.brian at gmail.com> wrote:
>
> On Mar 10, 2010, at 9:47 PM, François Michonneau wrote:
>
>> Hello all,
>>
>>  While writing tests for readNexus I ran into a few bugs in the way data
>> included in NEXUS files are imported into phylobase. I am definitely
>> more familiar with trees than with data when it comes to NEXUS files
>> so I might have done something wrong.
>>
>>  I created another NEXUS file with Mesquite which includes
>> polymorphic characters and excluded characters (file
>> treeplucharV02.nex). I am not sure if the problems described below are
>> caused by NCL or by the interface, so it would be great if someone
>> with more knowledge could look into it.
>>
>>  Let me know if you want more details/clarifications about these issues.
>>
>>  Cheers,
>>  -- François
>
> Thanks for working on this, François.
>
>>
>> 1. char.all=TRUE/FALSE (if TRUE includes even excluded characters in
>> the NEXUS file)
>> This doesn't seem to work. In the example file, the character Test3 is
>> supposed to be excluded (in the ASSUMPTIONS block), but the option has
>> no effect on the string returned by ReadCharsWithNCL. We could
>> temporarily remove this option.
>
>
> This is due to changes in the NCL. The way we got all vs some chars
> (NCLInterface.cpp) is
>
>                        if (allchar) {
>                                nchartoreturn=characters->GetNCharTotal();
>                        }
>                        else {
>                                nchartoreturn=characters->GetNChar();
>                        }
>
> but
>
> nxscharactersblock.h:|  The old GetNChar() function is now called
> GetNumIncludedChars();
>
> Changing GetNChar to GetNumIncludedChars should help (I haven't coded in
> phylobase lately, so I don't want to start committing code, but this is
> where I'd start looking).
>
>
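A minimal sketch of the selection logic with the renamed accessor. CharactersBlockStub is a hypothetical stand-in so the snippet compiles without NCL; only the method names GetNCharTotal() and GetNumIncludedChars() come from the message above, and their real signatures in NCL are an assumption here.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for NCL's characters block, so the sketch builds
// without NCL. Only the two accessor names come from the thread; their
// real signatures are assumed.
struct CharactersBlockStub {
    unsigned total;               // all characters in the matrix
    std::set<unsigned> excluded;  // indices of excluded characters

    unsigned GetNCharTotal() const { return total; }
    unsigned GetNumIncludedChars() const {
        assert(excluded.size() <= total);
        return total - static_cast<unsigned>(excluded.size());
    }
};

// Number of characters to return: every character when allchar is true,
// otherwise only the ones that are not excluded.
unsigned NCharToReturn(const CharactersBlockStub &characters, bool allchar) {
    return allchar ? characters.GetNCharTotal()
                   : characters.GetNumIncludedChars();
}
```

Fixing the count may not be sufficient on its own: the code that writes out the states would presumably also have to skip the excluded characters, which this sketch does not cover.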
>>
>> 2. polymorphic.convert=TRUE/FALSE (if TRUE converts polymorphic
>> characters to missing characters)
>> 2.1. polymorphic characters
>> In this case, the string returned by ReadCharsWithNCL differs depending
>> on the option. If polymorphic.convert=TRUE, NAs are returned for
>> polymorphic states. If polymorphic.convert=FALSE, then
>> ReadCharsWithNCL returns all the states using curly brackets (e.g.
>> {0,1}), which produces an error message when evaluated within R. I
>> wrote a workaround (in R) for this problem that I should be able to
>> commit tomorrow. So, at least for now, it's not a crucial issue.
>
> Good. When writing this part of phylobase, I wanted to keep the option of
> using polymorphic characters, though I don't think any R phylogenetic
> packages could use this (but maybe I'm wrong). Coding this to use whatever
> is standard in R for showing polymorphism would be good.
>
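One possible way to keep polymorphic states without breaking evaluation in R is to emit them as quoted strings instead of a bare {0,1}. The sketch below only illustrates that idea: FormatState and its arguments are hypothetical names, not functions from phylobase or NCL, and it is not the R workaround mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: format the states observed for one taxon at one
// character as a token that R can parse.
//   no states      -> NA (missing data)
//   one state      -> the bare state code, e.g. 1
//   several states -> NA if convert is true, otherwise a quoted
//                     string such as "{0,1}"
std::string FormatState(const std::vector<int> &states, bool convert) {
    if (states.empty()) return "NA";
    if (states.size() == 1) return std::to_string(states[0]);
    if (convert) return "NA";

    std::string token = "\"{";
    for (std::size_t i = 0; i < states.size(); ++i) {
        assert(states[i] >= 0);  // state codes assumed non-negative
        if (i > 0) token += ",";
        token += std::to_string(states[i]);
    }
    token += "}\"";
    return token;
}
```

Note that mixing quoted tokens with numbers inside c() makes R coerce the whole vector to character, so the levels passed to factor() would have to include the polymorphic tokens as well.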
>> 2.2. factor levels
>> Another somewhat related issue is the way the data frame based on the
>> data contained in the NEXUS file is created. Each character is treated
>> as a factor which is constructed using a call like:
>>
>> Test1=factor(c(1,NA,1,1,0,1,0,NA,NA,1,0,1,0,1,1,NA,0,0),levels=c(0,1,2,3),labels=c("test1A","test1B","",""))
>> However, this kind of call produces warning messages because
>> duplicated labels aren't allowed anymore. The string created by
>> ReadCharsWithNCL creates unnecessary levels. The number of levels is
>> the same for all the characters in the data set. From the few tests I
>> have run, it looks like this number matches the maximum number of
>> states for a given character +1 (in the example file, only the
>> character "Test3" has 3 levels). I have also written a workaround for
>> this problem, but there is a risk that the warning will turn into an
>> error message in the next few releases of R.
>
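The duplicated "" labels in the factor() call above could be avoided by giving every unlabelled level a distinct placeholder. A minimal sketch, assuming the state code itself is an acceptable label; PadLabels is a hypothetical helper, not part of phylobase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: pad a character's state labels up to nlevels (the
// number of levels shared by all characters), using the state code itself
// as the label wherever the NEXUS file gives none, so that no two labels
// are the empty string.
std::vector<std::string> PadLabels(const std::vector<std::string> &given,
                                   std::size_t nlevels) {
    assert(given.size() <= nlevels);
    std::vector<std::string> labels;
    for (std::size_t code = 0; code < nlevels; ++code) {
        if (code < given.size() && !given[code].empty()) {
            labels.push_back(given[code]);
        } else {
            labels.push_back(std::to_string(code));
        }
    }
    return labels;
}
```

For Test1 this would give the labels test1A, test1B, 2, 3. A label that is itself a number (say a state named "2") could still collide with a placeholder, so a real fix would need a uniqueness check.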
> It's good to fix the problem of duplicated labels. As for having the number
> of levels the same for all characters, regardless of how many states they
> have, this was deliberate. For example, you might have a data matrix for
> colors of flower parts, and use the same state coding (0=red, 1=white,
> 2=yellow) for three different flower parts (inner whorl of petals, outer
> whorl, stamen). If the first two parts are any of the three colors, and
> stamens are only red (0) or yellow (2), you don't want to recode it so that
> the 0 and 2 for the stamen in nexus become a 0 and 1 in R. This would make
> plants that have yellow petals and stamens (222) be recoded as having white
> stamens (221), which could affect later analyses. If you do want this
> recoding so that characters with two states only have two levels, you could
> use levels.uniform=FALSE. levels.uniform=TRUE is the default because this is
> how most people code traits.
>
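The stamen example can be sketched as follows. LevelIndex is a hypothetical illustration of the two behaviours, not the phylobase implementation; the uniform flag stands in for the levels.uniform option.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Map a NEXUS state code to a 0-based level index.
//   uniform == true : every character shares the same levels, so the
//                     code is its own index.
//   uniform == false: the levels are only the codes observed for this
//                     character, so the index is the code's rank among
//                     the observed codes.
int LevelIndex(int code, const std::set<int> &observed, bool uniform) {
    if (uniform) return code;
    std::set<int>::const_iterator it = observed.find(code);
    assert(it != observed.end());  // code must occur in this character
    return static_cast<int>(std::distance(observed.begin(), it));
}
```

With stamens observed only as red (0) or yellow (2), a yellow stamen keeps index 2 under uniform levels but becomes index 1 without them, which is the 222 versus 221 recoding described above.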
> Hope this helps,
> Brian
>
>
>> _______________________________________________
>> Phylobase-devl mailing list
>> Phylobase-devl at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
>
> ------------------------------------------------------
> Brian O'Meara
> http://www.brianomeara.info
> Assistant Prof.
> Dept. Ecology & Evolutionary Biology
> U. of Tennessee, Knoxville
>
>