[Phylobase-devl] New phylobase build approach using static libncl (Was: Rcpp and OS X compilation)

Orme, David d.orme at imperial.ac.uk
Thu Apr 29 17:43:08 CEST 2010


Hi,

I know this isn't the bible - but the PAUP manual specifies the following:

"Identifiers" are simply names given to taxa, characters, and other PAUP input elements such as character-sets, taxon-sets, and exclusion-sets. They may include any combination of upper- and lower-case alphabetic characters, digits, and punctuation. If the identifier contains any of the following characters:
( ) [ ] { } / \ , ; : = * ' " ` + - < >
or a blank, the entire identifier must be enclosed in single quotes.

Identifiers like that are going to be rare, but they will happen. None of those characters is allowed in a syntactically valid R name, although all of them are fine inside a character string. The taxon identifiers could therefore come into R as character vectors exactly as they appear in the NEXUS file (possibly with the enclosing single quotes stripped). The next question is whether we need them to be valid R names rather than plain character strings, in which case make.names() could be invoked on them.
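
To make the quoting rule concrete, here is a minimal C++ sketch of the two checks discussed above: whether an identifier contains any of the characters listed in the PAUP manual passage (or a blank), and how the enclosing single quotes could be stripped before the name is handed to R. The names needsQuoting and stripEnclosingQuotes are hypothetical and are not part of NCL or NCLInterface.cpp; the sketch also does nothing about quote characters inside a quoted identifier.

```cpp
#include <string>

// Characters that, per the PAUP manual passage quoted above, force an
// identifier to be enclosed in single quotes. A blank does so as well.
static const std::string kSpecialChars = "()[]{}/\\,;:=*'\"`+-<>";

// Hypothetical helper: true if the raw identifier contains a blank or any
// of the special characters, i.e. it would have to be quoted in the file.
bool needsQuoting(const std::string& id) {
    return id.find_first_of(kSpecialChars + " ") != std::string::npos;
}

// Hypothetical helper: removes one pair of enclosing single quotes if both
// are present; any other token is returned unchanged.
std::string stripEnclosingQuotes(const std::string& token) {
    if (token.size() >= 2 && token.front() == '\'' && token.back() == '\'') {
        return token.substr(1, token.size() - 2);
    }
    return token;
}
```

With something like this, 'Homo sapiens' would reach R as the character string Homo sapiens, and whether to run make.names() on it stays a separate decision on the R side.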

Cheers,
David


On 28 Apr 2010, at 22:30, François Michonneau wrote:


Hi all,

 Sorry if this is a dumb question, but why do we need to remove spaces
and underscores from the species names when building the data frame? The
only character that I can think of that could be an issue is ", and I
don't think that it's allowed or used by software that handles NEXUS files.

 In other words, do we really need to use RemoveUnderscoresAndSpaces in
NCLInterface.cpp?

 Thanks,
 -- François

On Mon, 2010-04-26 at 12:11 +0100, Orme, David wrote:
I'd guess we want the names to be syntactically valid R names, ideally by running make.names() across them. The problem is that NCLInterface can easily pass the raw PAUP identifiers for the data (which we can then run through make.names()), whereas the tree is currently passed as a text string. It would probably be easy enough to keep the raw PAUP names in that string, but they would be horrible to extract with a regex. Is there any way that NCLInterface can pass the tree using numeric symbols and then pass the translate block as a vector? Then make.names() could be run easily on both the data names and the tree names...
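
As a rough illustration of the numeric-symbols idea, the C++ sketch below writes a tree with 1-based taxon numbers at the tips and carries the raw labels in a separate vector, which plays the role of the translate block. Everything here (Node, TreePayload, leaf, internal, toNumericNewick, makePayload) is hypothetical and only stands in for whatever NCL and NCLInterface actually provide; it is a sketch of the data layout, not of the real interface.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a tip carries a 1-based taxon number,
// an internal node carries only children.
struct Node {
    std::size_t taxon = 0;       // 0 for internal nodes
    std::vector<Node> children;  // empty for tips
};

Node leaf(std::size_t taxon) {
    Node n;
    n.taxon = taxon;
    return n;
}

Node internal(std::vector<Node> kids) {
    Node n;
    n.children = std::move(kids);
    return n;
}

// Writes the topology with taxon numbers instead of names, so no raw
// identifier (with its blanks, quotes or brackets) ends up in the string.
std::string toNumericNewick(const Node& n) {
    if (n.children.empty()) return std::to_string(n.taxon);
    std::string out = "(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i > 0) out += ",";
        out += toNumericNewick(n.children[i]);
    }
    return out + ")";
}

// What would be handed over: the numeric tree string plus the labels,
// where labels[i - 1] is the raw name of taxon number i.
struct TreePayload {
    std::string newick;
    std::vector<std::string> labels;
};

TreePayload makePayload(const Node& root, const std::vector<std::string>& labels) {
    return TreePayload{toNumericNewick(root) + ";", labels};
}
```

The label vector can then be cleaned up on the R side (for example with make.names()) in exactly the same way as the data names, without ever parsing names out of the tree string.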

Cheers,
David



On 23 Apr 2010, at 14:30, François Michonneau wrote:


Hi,

Ouch! We need to fix this.

There might be some hope if we use Rcpp to build the data frame
instead of building and parsing a string.

Let me talk to Dirk about it and see what we can do.

Cheers,
-- François

On Fri, 2010-04-23 at 13:59 +0100, Orme, David wrote:
Hi all,
