No subject
Sun Feb 21 22:45:25 CET 2010
Mark then Peter
>> The main potential problems that I see with the ways that phylobase is
>> using NCL now are:
>> 1. in NCLInterface.cpp there are lots of calls to
>> RemoveUnderscoresAndSpaces to get rid of spaces and _ in names. That
>> makes names easier to deal with, but at some point will bite you
>> (somebody will have a dataset with a taxon labelled "AB" and another
>> with "A B"; after transformation there will be a name clash).
>
> I agree that this is something to address. Not only might there be
> clashes, but changing names will be annoying to users. Brian or Derrick
> could answer better, but I assume this is because some of the code used
> to parse the tree string can't handle the underscores and spaces.
Has just bitten me! There is a deeper problem here in that readNexus uses
the NCLInterface code to get the data frame as parsable R code - with
stripped spaces and underscores - but the tree block is passed over as a
block of raw text from the file. These names _aren't_ then stripped of
underscores and spaces by read.nexustreestring(), and so the name checking
throws an error.
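
To make the mismatch concrete, here is a minimal Python sketch. It is not
phylobase or NCL code: remove_underscores_and_spaces is a hypothetical
stand-in for what RemoveUnderscoresAndSpaces appears to do, and the labels
are made up for illustration.

```python
def remove_underscores_and_spaces(name):
    """Hypothetical stand-in for the stripping done on the data-block side."""
    return name.replace("_", "").replace(" ", "")


# Made-up labels: the same three taxa appear in the data block and the tree.
data_labels = ["Homo_sapiens", "A B", "AB"]
tree_labels = ["Homo_sapiens", "A B", "AB"]  # taken from the raw tree text

# Only the data-block side gets stripped.
stripped = [remove_underscores_and_spaces(n) for n in data_labels]

# Problem 1: distinct labels collapse to the same stripped name.
clashes = sorted({n for n in stripped if stripped.count(n) > 1})

# Problem 2: tree labels that no longer have a partner in the stripped data.
unmatched = sorted(set(tree_labels) - set(stripped))

print("clashes:", clashes)
print("unmatched tree labels:", unmatched)
```

In this toy case "A B" and "AB" collide after stripping, and the tree
labels that still contain a space or underscore have nothing to match
against.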
Obviously there is an ongoing deeper discussion about how to handle
passing the tree from NCL and how to handle the dismayingly wide range of
officially valid PAUP identifiers using regex, but currently we've got a
simpler problem: the data block and the tree block are handled
differently. Underscores in names are very commonly used to avoid the
quoting problem with spaces, so I think this current problem will come up
a lot.
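
One possible direction, sketched in Python purely as an illustration: run
the same normalisation over the labels from both blocks, following the
NEXUS convention that an underscore in an unquoted label stands for a
space, while a single-quoted label keeps its contents. normalize_label is
a hypothetical helper, not an existing phylobase or NCL function, and the
labels are invented.

```python
def normalize_label(token):
    """Map a NEXUS-style label token to one canonical name (illustrative)."""
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        # Quoted token: drop the outer quotes and un-double embedded quotes.
        return token[1:-1].replace("''", "'")
    # Unquoted token: an underscore stands for a space.
    return token.replace("_", " ")


# The same three taxa, written differently in the two blocks.
data_names = [normalize_label(t) for t in ["Homo_sapiens", "'A B'", "AB"]]
tree_names = [normalize_label(t) for t in ["'Homo sapiens'", "A_B", "AB"]]

print(data_names == tree_names)        # both sides agree after normalising
print(len(set(data_names)) == 3)       # "A B" and "AB" stay distinct
```

Because nothing is deleted from the names, "A B" and "AB" remain separate
taxa, and both blocks end up with identical labels.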
Cheers,
David
More information about the Phylobase-devl mailing list