<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Mar 11, 2010, at 5:51 PM, François Michonneau wrote:</div><blockquote type="cite"><div><blockquote type="cite"><font class="Apple-style-span" color="#000000">&lt;snip&gt;</font></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">2.2. factor levels<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Another somewhat related issue is the way the data frame based on the<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">data contained in the NEXUS file is created. Each character is treated<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">as a factor, which is constructed using a call like:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Test1=factor(c(1,NA,1,1,0,1,0,NA,NA,1,0,1,0,1,1,NA,0,0),levels=c(0,1,2,3),labels=c("test1A","test1B","",""))<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">However, this kind of call produces warning messages because<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">duplicated labels aren't allowed anymore. The string created by<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">ReadCharsWithNCL creates unnecessary levels. The number of levels is<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">the same for all the characters in the data set. 
From the few tests I<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">have run, it looks like this number matches the maximum number of<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">states for a given character +1 (in the example file, only the<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">character "Test3" has 3 levels). I have also written a workaround for this<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">problem, but there is a risk that this warning will turn into an<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">error in the next few releases of R.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">It's good to fix the problem of duplicated labels. As for having the number<br></blockquote><blockquote type="cite">of levels the same for all characters, regardless of how many states they<br></blockquote><blockquote type="cite">have, this was deliberate. For example, you might have a data matrix for<br></blockquote><blockquote type="cite">colors of flower parts, and use the same state coding (0=red, 1=white,<br></blockquote><blockquote type="cite">2=yellow) for three different flower parts (inner whorl of petals, outer<br></blockquote><blockquote type="cite">whorl, stamen). If the first two parts can be any of the three colors, and<br></blockquote><blockquote type="cite">stamens are only red (0) or yellow (2), you don't want to recode them so that<br></blockquote><blockquote type="cite">the 0 and 2 for the stamen in NEXUS become a 0 and 1 in R. This would cause<br></blockquote><blockquote type="cite">plants that have yellow petals and stamens (222) to be recoded as having white<br></blockquote><blockquote type="cite">stamens (221), which could affect later analyses. 
If you do want this<br></blockquote><blockquote type="cite">recoding so that characters with two states only have two levels, you could<br></blockquote><blockquote type="cite">use levels.uniform=FALSE. levels.uniform=TRUE is the default because this is<br></blockquote><blockquote type="cite">how most people code traits.<br></blockquote><br>I changed the way the characters are returned to R by the NCL<br>interface, and it should now behave as originally intended (I hope).<br></div></blockquote><div><br></div><div>Phylobase is still new enough that, in cases like this where people probably haven't used a feature much, I think it's worth going with whatever will be most useful for users. Hopefully this matches the originally intended behavior, but I designed much of the original behavior in this particular area during the original hackathon, and I'm more than open to changing it if that makes it more useful.</div><div><br></div><br><blockquote type="cite"><div><br>I put quotes around the levels (i.e. states) of the characters. That way,<br>it becomes unnecessary to use the argument 'levels'. Indeed, if<br>levels.uniform is FALSE, then R does what it's supposed to do and<br>creates unique levels for each character. If levels.uniform is TRUE,<br>then I force all characters a posteriori to have the same levels (the<br>code for this part isn't the most elegant, but it seems to do the job).<br><br>Using the quotes also allows polymorphic characters to be returned "as is"<br>(i.e. with the curly brackets), and these polymorphic characters are<br>thus treated as additional levels of the factors. It seems to me that<br>the user should be able to deal with them if s/he wants to use the<br>polymorphism in the analysis.<br></div></blockquote><div><br></div><div>Nice.</div><br><blockquote type="cite"><div><br>It seemed that levels.uniform wasn't really doing what it was supposed<br>to do before the changes I committed today. 
Instead, it was returning<br>the labels associated with the character states. I thus added the new<br>option 'return.labels' to readNexus to do this. Instead of returning<br>the code for the state (e.g. 1), it returns the state's label (e.g.<br>"nocturnal").<br></div></blockquote><div><br></div><div><br></div><div>Good idea.</div><br><blockquote type="cite"><div><br>Obviously, this feature doesn't play nicely with polymorphic characters.<br></div></blockquote><div><br></div><div>Why not? Is there a difference between '{0, 1}' and '{nocturnal, diurnal}'? The latter would only be an issue if some state names had commas in them, but that's such an infrequent use case that we could just issue a warning when a state name contains a comma and the data include a polymorphic character.</div><div><br></div><br><blockquote type="cite"><div>So, if you try to use 'return.labels' with a dataset that includes<br>polymorphic characters, you get an error message saying that it's<br>not implemented.<br><br>I need to make a few changes to my unit tests, which I'll commit tomorrow.<br></div></blockquote><div><br></div><div>Thanks again for your work on this.</div><div><br></div><div>Best,</div><div>Brian</div><div><br></div><br><blockquote type="cite"><div><br> Cheers,<br> -- François<br></div></blockquote></div><br><div apple-content-edited="true"> <span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><span 
class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div>------------------------------------------------------</div><div>Brian O'Meara</div><div><a href="http://www.brianomeara.info">http://www.brianomeara.info</a></div><div>Assistant Prof.</div><div>Dept. Ecology & Evolutionary Biology</div><div>U. of Tennessee, Knoxville</div></span></div></span> </div><br></body></html>