[Phylobase-devl] Issues with NCL and/or NCL interface
François Michonneau
francois.michonneau at gmail.com
Fri Mar 12 00:00:29 CET 2010
I realized that an example might illustrate the results of these
changes better. Below are two cases for the same data set: the first with
"polymorphic.convert=FALSE, return.labels=FALSE" and the second with
"polymorphic.convert=TRUE, return.labels=TRUE".
readNexus(file="treepluscharV02.nex", polymorphic.convert=FALSE,
          levels.uniform=FALSE, return.labels=FALSE)
   label                   node ancestor edge.length node.type Test1 Test2
 1 Myrmecocystussemirufus     1       27    1.724765       tip     0     0
 2 Myrmecocystusplacodops     2       27    1.724765       tip     0     0
 3 Myrmecocystusmendax        3       26    4.650818       tip     1     0
 4 Myrmecocystuskathjuli      4       28    1.083870       tip     1     0
 5 Myrmecocystuswheeleri      5       28    1.083870       tip     0     0
 6 Myrmecocystusmimicus       6       30    2.708942       tip  <NA>  <NA>
 7 Myrmecocystusdepilis       7       30    2.708942       tip     1     0
 8 Myrmecocystusromainei      8       32    2.193845       tip     1     1
 9 Myrmecocystusnequazcatl    9       32    2.193845       tip     1     0
10 Myrmecocystusyuma         10       31    4.451425       tip     0     1
11 Myrmecocystuskennedyi     11       23    6.044804       tip     0     1
12 Myrmecocystuscreightoni   12       22   10.569191       tip  <NA> {0,1}
13 Myrmecocystussnellingi    13       33    2.770378       tip     1  <NA>
14 Myrmecocystustenuinodis   14       33    2.770378       tip     1     0
15 Myrmecocystustestaceus    15       20   12.300701       tip  <NA>  <NA>
16 Myrmecocystusmexicanus    16       34    5.724923       tip     0     0
17 Myrmecocystuscfnavajo     17       35    2.869547       tip     1 {0,1}
18 Myrmecocystusnavajo       18       35    2.869547       tip  <NA>     1
readNexus(file="treepluscharV02.nex", polymorphic.convert=TRUE,
          levels.uniform=FALSE, return.labels=TRUE)
   label                   node ancestor edge.length node.type  Test1  Test2
 1 Myrmecocystussemirufus     1       27    1.724765       tip test1A test2A
 2 Myrmecocystusplacodops     2       27    1.724765       tip test1A test2A
 3 Myrmecocystusmendax        3       26    4.650818       tip test1B test2A
 4 Myrmecocystuskathjuli      4       28    1.083870       tip test1B test2A
 5 Myrmecocystuswheeleri      5       28    1.083870       tip test1A test2A
 6 Myrmecocystusmimicus       6       30    2.708942       tip   <NA>   <NA>
 7 Myrmecocystusdepilis       7       30    2.708942       tip test1B test2A
 8 Myrmecocystusromainei      8       32    2.193845       tip test1B test2B
 9 Myrmecocystusnequazcatl    9       32    2.193845       tip test1B test2A
10 Myrmecocystusyuma         10       31    4.451425       tip test1A test2B
11 Myrmecocystuskennedyi     11       23    6.044804       tip test1A test2B
12 Myrmecocystuscreightoni   12       22   10.569191       tip   <NA>   <NA>
13 Myrmecocystussnellingi    13       33    2.770378       tip test1B   <NA>
14 Myrmecocystustenuinodis   14       33    2.770378       tip test1B test2A
15 Myrmecocystustestaceus    15       20   12.300701       tip   <NA>   <NA>
16 Myrmecocystusmexicanus    16       34    5.724923       tip test1A test2A
17 Myrmecocystuscfnavajo     17       35    2.869547       tip test1B   <NA>
18 Myrmecocystusnavajo       18       35    2.869547       tip   <NA> test2B
On Thu, Mar 11, 2010 at 17:51, François Michonneau
<francois.michonneau at gmail.com> wrote:
>>> 1. char.all=TRUE/FALSE (if TRUE includes even excluded characters in
>>> the NEXUS file)
>>> This doesn't seem to work. In the example file, the character Test3 is
>>> supposed to be excluded (in the ASSUMPTIONS block), but the option has
>>> no effect on the string returned by ReadCharsWithNCL. We could
>>> temporarily remove this option.
>>
>>
>> This is due to changes in NCL. The way we got all vs. some chars
>> (in NCLInterface.cpp) is
>>
>> if (allchar) {
>>     nchartoreturn = characters->GetNCharTotal();
>> } else {
>>     nchartoreturn = characters->GetNChar();
>> }
>>
>> but
>>
>> nxscharactersblock.h:| The old GetNChar() function is now called GetNumIncludedChars();
>>
>> Changing GetNChar to GetNumIncludedChars should help (I haven't coded in
>> phylobase lately, so I don't want to start committing code, but this is
>> where I'd start looking).
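A minimal C++ sketch of the suggested one-line change follows. `MockCharactersBlock` is a hypothetical stand-in that models only the two counters named above, `GetNCharTotal()` and `GetNumIncludedChars()`; it is not NCL's real characters block, and the function name `nCharToReturn` is likewise illustrative. The branch is written as a template so the same logic would compile against any type exposing those two methods.

```cpp
#include <cassert>

// Hypothetical stand-in for NCL's characters block: only the two counters
// discussed in this thread are modelled.
class MockCharactersBlock {
public:
    MockCharactersBlock(unsigned total, unsigned excluded)
        : total_(total), excluded_(excluded) {
        assert(excluded <= total);
    }
    // Every character in the matrix, including excluded ones.
    unsigned GetNCharTotal() const { return total_; }
    // Characters left after exclusions (the old GetNChar()).
    unsigned GetNumIncludedChars() const { return total_ - excluded_; }

private:
    unsigned total_;
    unsigned excluded_;
};

// Same branch as in NCLInterface.cpp, with GetNChar() replaced by
// GetNumIncludedChars().
template <typename CharsBlock>
unsigned nCharToReturn(const CharsBlock* characters, bool allchar) {
    if (allchar) {
        return characters->GetNCharTotal();
    }
    return characters->GetNumIncludedChars();
}
```

For a hypothetical matrix of three characters with one of them excluded, `nCharToReturn(&block, true)` returns 3 and `nCharToReturn(&block, false)` returns 2, which is the difference char.all is meant to expose.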
>
>
>>> 2. polymorphic.convert=TRUE/FALSE (if TRUE converts polymorphic
>>> characters to missing characters)
>>> 2.1. polymorphic characters
>>> In this case, the string returned by ReadCharsWithNCL differs depending
>>> on the option. If polymorphic.convert=TRUE, NA is returned for
>>> polymorphic states. If polymorphic.convert=FALSE, then
>>> ReadCharsWithNCL returns all the states using curly brackets (e.g.
>>> {0,1}), which produces an error message when evaluated within R. I
>>> wrote a workaround (in R) for this problem that I should be able to
>>> commit tomorrow. So, at least for now, it's not a crucial issue.
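A hedged sketch of what the two settings do to a single cell, assuming the interface sees each state as a text token, with polymorphic cells written in curly brackets (as in `{0,1}`) and missing data as `?`. The helpers `isPolymorphic` and `convertState` are hypothetical; this is not the ReadCharsWithNCL source.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when a token lists several states, e.g. "{0,1}".
bool isPolymorphic(const std::string& token) {
    return token.size() >= 2 && token.front() == '{' && token.back() == '}';
}

// Token to emit for one cell of the matrix. Missing data ("?") always
// becomes NA; polymorphic cells become NA only when polymorphicConvert is
// true and are otherwise passed through unchanged.
std::string convertState(const std::string& token, bool polymorphicConvert) {
    if (token == "?") {
        return "NA";
    }
    if (polymorphicConvert && isPolymorphic(token)) {
        return "NA";
    }
    return token;
}
```

With polymorphicConvert set to false, the `{0,1}` token survives as is, which is exactly the text R then fails to evaluate.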
>>
>> Good. When writing this part of phylobase, I wanted to keep the option of
>> using polymorphic characters, though I don't think any R phylogenetic
>> packages could use this (but maybe I'm wrong). Coding this to use whatever
>> is standard in R for showing polymorphism would be good.
>>
>>> 2.2. factor levels
>>> Another somewhat related issue is the way the data frame based on the
>>> data contained in the NEXUS file is created. Each character is treated
>>> as a factor which is constructed using a call like:
>>>
>>> Test1=factor(c(1,NA,1,1,0,1,0,NA,NA,1,0,1,0,1,1,NA,0,0), levels=c(0,1,2,3), labels=c("test1A","test1B","",""))
>>> However, this kind of call produces warning messages because
>>> duplicated labels aren't allowed anymore. The string created by
>>> ReadCharsWithNCL creates unnecessary levels. The number of levels is
>>> the same for all the characters in the data set. From the few tests I
>>> have run, it looks like this number matches the maximum number of
>>> states for any character plus one (in the example file, only the
>>> character "Test3" has 3 levels). I have also written a workaround for
>>> this problem, but there is a risk that the warning will turn into an
>>> error in the next few releases of R.
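One possible way around the duplicated empty labels, sketched under the assumption that the interface builds the R call as a string and that label i belongs to state code i, is to emit only the levels that actually have a label. `buildFactorCall` is a hypothetical name; this is an illustration, not phylobase code.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds an R expression such as
//   Test1=factor(c(1,NA,0),levels=c(0,1),labels=c("test1A","test1B"))
// keeping only the state codes whose label is non-empty, so no label is
// repeated. Values are written as given ("NA" for missing data).
std::string buildFactorCall(const std::string& name,
                            const std::vector<std::string>& values,
                            const std::vector<std::string>& labels) {
    std::ostringstream out;
    out << name << "=factor(c(";
    for (std::size_t i = 0; i < values.size(); ++i) {
        out << (i ? "," : "") << values[i];
    }
    out << "),levels=c(";
    bool first = true;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (labels[i].empty()) continue;  // skip unlabelled codes
        out << (first ? "" : ",") << i;
        first = false;
    }
    out << "),labels=c(";
    first = true;
    for (const std::string& label : labels) {
        if (label.empty()) continue;
        out << (first ? "" : ",") << '"' << label << '"';
        first = false;
    }
    out << "))";
    return out.str();
}
```

For example, `buildFactorCall("Test1", {"1", "NA", "0"}, {"test1A", "test1B", "", ""})` produces `Test1=factor(c(1,NA,0),levels=c(0,1),labels=c("test1A","test1B"))`.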
>>
>> It's good to fix the problem of duplicated labels. As for having the number
>> of levels the same for all characters, regardless of how many states they
>> have, this was deliberate. For example, you might have a data matrix for
>> colors of flower parts, and use the same state coding (0=red, 1=white,
>> 2=yellow) for three different flower parts (inner whorl of petals, outer
>> whorl, stamen). If the first two parts are any of the three colors, and
>> stamens are only red (0) or yellow (2), you don't want to recode it so that
>> the 0 and 2 for the stamen in nexus become a 0 and 1 in R. This would make
>> plants that have yellow petals and stamens (222) be recoded as having white
>> stamens (221), which could affect later analyses. If you do want this
>> recoding so that characters with two states only have two levels, you could
>> use levels.uniform=FALSE. levels.uniform=TRUE is the default because this is
>> how most people code traits.
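The recoding hazard in the flower example can be sketched as follows. The sketch assumes that per-character levels amount to the sorted distinct states observed for that character, and it uses 0-based positions so that the numbers match the 0/1/2 codes in the example; `observedLevels` and `levelIndex` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Sorted, distinct states observed for one character, ignoring missing
// data ("NA"): roughly what per-character levels (levels.uniform=FALSE)
// amount to. std::set orders strings lexicographically, which is fine for
// single-digit states.
std::vector<std::string> observedLevels(const std::vector<std::string>& states) {
    std::set<std::string> seen;
    for (const std::string& s : states) {
        if (s != "NA") seen.insert(s);
    }
    return std::vector<std::string>(seen.begin(), seen.end());
}

// 0-based position of a state among the levels, i.e. the code it ends up
// with once levels are renumbered consecutively; -1 if it is not a level.
int levelIndex(const std::string& state, const std::vector<std::string>& levels) {
    auto it = std::find(levels.begin(), levels.end(), state);
    return it == levels.end() ? -1 : static_cast<int>(it - levels.begin());
}
```

With the stamen states `0` and `2`, per-character levels put yellow (`2`) at position 1, the position white has in the uniform set `{0, 1, 2}`; with uniform levels, yellow stays at position 2 for every flower part.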
>
> I changed the way the characters are returned to R by the NCL
> interface and it should behave as it was originally intended (I hope).
>
> I put quotes around the levels (i.e. states) of the characters. This
> makes the 'levels' argument unnecessary. Indeed, if
> levels.uniform is FALSE, then R does what it's supposed to do and
> creates a separate set of levels for each character. If levels.uniform
> is TRUE, then I force all characters to have the same levels a posteriori
> (the code for this part isn't the most elegant but it seems to do the job).
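Forcing the same levels on every character after the fact can be sketched as taking the union of the states seen in any character and assigning that single set to all of them. This is an assumption about the general approach, not the committed R code; `uniformLevels` is a hypothetical name, and because `std::set` orders strings lexicographically, a state like `10` would sort before `2`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Union of the states observed in any character, ignoring "NA".
// Giving every character this one level set reproduces the
// levels.uniform=TRUE behaviour after each character was read separately.
std::vector<std::string> uniformLevels(
        const std::vector<std::vector<std::string>>& characters) {
    std::set<std::string> all;
    for (const auto& states : characters) {
        for (const std::string& s : states) {
            if (s != "NA") all.insert(s);
        }
    }
    return std::vector<std::string>(all.begin(), all.end());
}
```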
>
> Using quotes also makes it possible to return polymorphic characters
> "as is" (i.e. with the curly brackets); these polymorphic characters are
> then treated as additional levels of the factors. It seems to me that
> users should be able to deal with them if they want to use the
> polymorphism in their analyses.
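A minimal sketch of the quoting idea, assuming the interface writes each character as an R vector expression and marks missing data as `NA`. The function name `quotedStates` is hypothetical. Because every state is a quoted string, a polymorphic token is just another string instead of an R syntax error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Writes one character as an R character vector, e.g. c("0","1",NA,"{0,1}").
// Every state is quoted, so no 'levels' argument is needed and "{0,1}"
// parses as an ordinary string. Missing data stays unquoted so that R
// reads it as a real missing value rather than the string "NA".
std::string quotedStates(const std::vector<std::string>& states) {
    std::string out = "c(";
    for (std::size_t i = 0; i < states.size(); ++i) {
        if (i > 0) out += ",";
        if (states[i] == "NA") {
            out += "NA";
        } else {
            out += "\"" + states[i] + "\"";
        }
    }
    out += ")";
    return out;
}
```

Evaluated in R, `factor(c("0","1",NA,"{0,1}"))` would give a factor in which `"{0,1}"` is a level alongside `"0"` and `"1"`.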
>
> It seems that levels.uniform wasn't really doing what it was supposed
> to do before the changes I committed today: instead, it was returning
> the labels associated with the character states. I have thus added a new
> option, 'return.labels', to readNexus to do this. Instead of returning
> the code for the state (e.g. 1), it returns its label (e.g.
> "nocturnal").
>
> Obviously, this feature doesn't play nicely with polymorphic characters,
> so if you try to use 'return.labels' with a dataset that includes
> polymorphic characters, you get an error message saying that this is
> not implemented.
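A sketch of the return.labels behaviour, assuming the state labels from the NEXUS file are available as a code-to-label map and that polymorphic cells start with `{`. `stateLabel` is a hypothetical helper, and the error text is illustrative rather than the message readNexus actually prints. Codes without a label are returned unchanged, which is a design choice of this sketch.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Replaces a state code with its label. Missing data passes through;
// polymorphic cells are rejected because a set of states has no single
// label (the message below is illustrative only).
std::string stateLabel(const std::string& code,
                       const std::map<std::string, std::string>& labels) {
    if (code == "NA") return code;
    if (!code.empty() && code.front() == '{') {
        throw std::runtime_error(
            "return.labels is not implemented for polymorphic characters");
    }
    auto it = labels.find(code);
    return it == labels.end() ? code : it->second;
}
```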
>
> I still have to make a few changes to my unit tests, which I'll commit tomorrow.
>
> Cheers,
> -- François
>