No subject


Sun Feb 21 22:45:25 CET 2010


Mark then Peter

>> The main potential problems that I see with the ways that phylobase is using NCL now are:
>> 	1. In NCLInterface.cpp there are lots of calls to RemoveUnderscoresAndSpaces to get rid of spaces and _ in names.  That makes names easier to deal with, but at some point will bite you (somebody will have a dataset with a taxon labelled "AB" and another with "A B"; after transformation there will be a name clash).
>
> I agree that this is something to address.  Not only might there be clashes, but changing names will be annoying to users.  Brian or Derrick could answer better, but I assume this is because some of the code used to parse the tree string can't handle the underscores and spaces.

It has just bitten me! There is a deeper problem here: readNexus uses the NCLInterface code to get the data frame as parsable R code, with spaces and underscores stripped, but the tree block is passed over as a block of raw text from the file. The names in that raw text _aren't_ then stripped of underscores and spaces by read.nexustreestring(), so the name checking throws an error.
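
To make the two failure modes concrete, here is a minimal sketch in Python rather than the actual phylobase/NCL code. The function strip_underscores_and_spaces is a hypothetical stand-in for what RemoveUnderscoresAndSpaces appears to do, and the taxon labels are made up for illustration.

```python
def strip_underscores_and_spaces(label):
    """Hypothetical stand-in for the stripping applied on the data side."""
    return label.replace("_", "").replace(" ", "")


# Made-up taxon labels as they might appear in a NEXUS file.
raw_labels = ["Homo_sapiens", "Pan troglodytes", "AB", "A B"]

# Data side: labels come back with underscores and spaces removed.
data_labels = [strip_underscores_and_spaces(x) for x in raw_labels]

# Tree side: the tree block is raw text, so labels are left untouched.
tree_labels = list(raw_labels)

# Problem 1: "AB" and "A B" collapse to the same name after stripping.
clashes = len(set(data_labels)) < len(data_labels)

# Problem 2: tree labels that no longer match any data label.
mismatched = sorted(set(tree_labels) - set(data_labels))

print(data_labels)
print(clashes)
print(mismatched)
```

In this toy case every tree label containing an underscore or a space fails to match, which is the kind of mismatch the name checking complains about.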

Obviously there is an ongoing, deeper discussion about how to handle passing the tree from NCL, and how to handle the dismayingly wide range of officially valid PAUP identifiers using regex, but currently we've got a simpler problem: the two code paths handle names differently. Underscores in names are very commonly used to avoid the quoting problem with spaces, so I think this current problem will come up a lot.
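
One possible way to remove the inconsistency, sketched in Python under stated assumptions and not a proposal for the actual implementation: apply a single normalisation to labels from both the data block and the tree block. As I understand the NEXUS convention, an underscore in an unquoted label stands for a space, and a single-quoted label is taken literally with doubled quotes standing for one quote. The names normalize_label and find_collisions are hypothetical.

```python
def normalize_label(label):
    """Map a raw label to one canonical form (assumed NEXUS-style rules)."""
    label = label.strip()
    if len(label) >= 2 and label[0] == "'" and label[-1] == "'":
        # Quoted label: keep the content literally, un-double embedded quotes.
        return label[1:-1].replace("''", "'")
    # Unquoted label: an underscore is read as a space.
    return label.replace("_", " ")


def find_collisions(labels):
    """Group raw labels by canonical form; return groups with more than one."""
    seen = {}
    for raw in labels:
        seen.setdefault(normalize_label(raw), []).append(raw)
    return {name: raws for name, raws in seen.items() if len(raws) > 1}


# Same rule on both sides, so the data and tree labels agree.
print(normalize_label("Homo_sapiens") == normalize_label("'Homo sapiens'"))

# "AB" and "A B" stay distinct under this rule, unlike with stripping.
print(normalize_label("AB") == normalize_label("A B"))

# Genuine duplicates can still be reported instead of silently merged.
print(find_collisions(["A_B", "'A B'", "AB"]))
```

The point of the sketch is only that whichever rule is chosen, using the same one for the data frame and the tree avoids the mismatch, and keeping spaces distinct from nothing avoids the "AB" versus "A B" clash.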

Cheers,
David

More information about the Phylobase-devl mailing list