[Phylobase-devl] New phylobase build approach using static libncl (Was: Rcpp and OS X compiliation)
François Michonneau
francois.michonneau at gmail.com
Wed Apr 28 23:30:09 CEST 2010
Hi all,
Sorry if this is a dumb question, but why do we need to remove spaces
and underscores from the species names when building the data frame? The
only character that I can think of that could be an issue is ", and I
don't think that it's allowed by software that uses NEXUS.
In other words, do we really need to use RemoveUnderscoresAndSpaces in
NCLInterface.cpp?
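
To make the risk concrete, here is a minimal sketch of the name clash Mark describes in the quoted text below. It is written in Python, not the actual C++ from NCLInterface.cpp: `strip_underscores_and_spaces` is a hypothetical stand-in that assumes RemoveUnderscoresAndSpaces simply deletes those two characters, and `find_clashes` is likewise an illustrative name.

```python
from collections import defaultdict


def strip_underscores_and_spaces(label: str) -> str:
    """Hypothetical stand-in: assumes the C++ helper just deletes ' ' and '_'."""
    return label.replace("_", "").replace(" ", "")


def find_clashes(labels):
    """Group original labels by stripped form; keep groups with more than one member."""
    groups = defaultdict(list)
    for label in labels:
        groups[strip_underscores_and_spaces(label)].append(label)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up taxon labels, chosen only to trigger the collision.
taxa = ["AB", "A B", "Homo_sapiens", "Pan troglodytes"]
clashes = find_clashes(taxa)
print(clashes)  # {'AB': ['AB', 'A B']}
```

Two labels that are distinct in the file end up identical once stripped, so at least one of them can no longer be matched back to its tip in the tree.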
Thanks,
-- François
On Mon, 2010-04-26 at 12:11 +0100, Orme, David wrote:
> I'd guess we want the names to be syntactically valid R names, ideally by running make.names() across them. The problem is that NCLInterface can easily pass the raw PAUP identifiers for the data (which we can then make.names()), but the tree is currently passed as a text string. Again, it is probably easy enough to have the raw PAUP names in the string, but these would be horrible to extract with regex. Is there any way that NCLInterface can pass the tree using numeric symbols and then pass a translate block as a vector? Then make.names() could be run easily on both the data names and the tree names...
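
A minimal sketch of the idea above, in Python rather than R or C++. It assumes the interface could return a Newick string with 1-based numeric tip tokens plus a vector of raw labels. `sanitize` and `relabel_tree` are hypothetical names, and `sanitize` is only a rough stand-in for make.names(), not a faithful port of it.

```python
import re


def sanitize(label: str) -> str:
    """Rough stand-in for R's make.names(); NOT a faithful port."""
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", label)
    # Prepend "X" when the result does not start with a letter or a dot.
    if not re.match(r"[A-Za-z.]", cleaned):
        cleaned = "X" + cleaned
    return cleaned


def relabel_tree(newick: str, translate: list) -> str:
    """Replace 1-based numeric tip tokens with sanitized labels."""
    def lookup(match):
        return sanitize(translate[int(match.group(0)) - 1])
    # Tip tokens directly follow "(" or ","; branch lengths follow ":" and are left alone.
    return re.sub(r"(?<=[(,])\d+", lookup, newick)


# Made-up translate vector and tree, for illustration only.
translate = ["Homo sapiens", "Pan_troglodytes", "A B"]
tree = "((1:0.1,2:0.1):0.2,3:0.3);"

data_names = [sanitize(name) for name in translate]
tree_string = relabel_tree(tree, translate)
print(data_names)   # ['Homo.sapiens', 'Pan_troglodytes', 'A.B']
print(tree_string)  # ((Homo.sapiens:0.1,Pan_troglodytes:0.1):0.2,A.B:0.3);
```

Because both the data names and the tip labels pass through the same function, they cannot drift apart, and no regex ever has to parse a raw PAUP identifier out of the tree string.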
>
> Cheers,
> David
>
>
>
> On 23 Apr 2010, at 14:30, François Michonneau wrote:
>
> >
> > Hi,
> >
> > Ouch! We need to fix this.
> >
> > There might be some hope if we use Rcpp to build the data frame
> > instead of building and parsing a string.
> >
> > Let me talk to Dirk about it and see what we can do.
> >
> > Cheers,
> > -- François
> >
> > On Fri, 2010-04-23 at 13:59 +0100, Orme, David wrote:
> >> Hi all,
> >>
> >> From an e-mail on 03/03/10:
> >>
> >> Mark then Peter
> >>
> >>>> The main potential problems that I see with the ways that phylobase is using NCL now are:
> >>>> 1. In NCLInterface.cpp there are lots of calls to RemoveUnderscoresAndSpaces to get rid of spaces and _ in names. That makes names easier to deal with, but at some point it will bite you (somebody will have a dataset with one taxon labelled "AB" and another labelled "A B"; after transformation there will be a name clash).
> >>>
> >>> I agree that this is something to address. Not only might there be clashes, but changing names will be annoying to users. Brian or Derrick could answer better, but I assume this is because some of the code used to parse the tree string can't handle the underscores and spaces.
> >>
> >> Has just bitten me! There is a deeper problem here in that readNexus uses the NCLInterface code to get the data frame as parsable R code - with stripped spaces and underscores - but the tree block is passed over as a block of raw text from the file. These names _aren't_ then stripped of underscores and spaces by read.nexustreestring() and so the name checking throws an error.
> >>
> >> Obviously there is an ongoing deeper discussion about how to handle passing the tree from NCL, and how to handle the dismayingly wide range of officially valid PAUP identifiers using regex, but for now we've got a simpler problem: the data names and the tree names are handled differently. Underscores in names are very commonly used to avoid the quoting problem with spaces, so I think this current problem will come up a lot.
> >>
> >> Cheers,
> >> David
> >>
> >> _______________________________________________
> >> Phylobase-devl mailing list
> >> Phylobase-devl at lists.r-forge.r-project.org
> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
> >