[Rcpp-devel] String encoding (UTF-8 conversion)

Romain Francois romain at r-enthusiasts.com
Tue Dec 16 07:22:16 CET 2014


That is similar to the path I've followed in Rcpp11/Rcpp14.

What's really missing in R is API access to strings, e.g. testing two CHARSXP for equality, comparing them, ...

This gap causes all sorts of problems with dplyr.

Romain

> On 16 Dec 2014, at 06:00, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
> 
>> On Thu, Dec 11, 2014 at 12:24 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
>> I'm interfacing with a C++ library which assumes strings are UTF-8.
>> However, strings from R can have various encodings. It's not clear to
>> me how I need to account for that in Rcpp.
> 
> Follow-up on this: from what I have found, there is currently no
> string type that is unambiguous across platforms and locales (other
> than the actual STRSXP). If the native locale uses UTF-8, then all is
> fine, but we cannot assume that in R. Here is a little script that
> illustrates the various combinations I tried and the results on
> Windows: https://gist.github.com/jeroenooms/9edf97f873f17a4ce5d3.
> 
> Assuming that each of these cases is intended behavior, perhaps we
> could introduce an additional string type, e.g. Rcpp::UTF8String. The
> mapping from STRSXP to Rcpp::UTF8String would use
> translateCharUTF8(STRING_ELT(x, 0)) and the mapping from
> Rcpp::UTF8String back to STRSXP would use SET_STRING_ELT(out, 0,
> mkCharCE(olds, CE_UTF8)). That would allow for defining C++ functions
> operating on UTF-8 strings which will work as expected across
> platforms and locales.
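
To make concrete what re-encoding to UTF-8 involves in one common case,
here is a minimal standalone sketch for a Latin-1 string. This is not how
R's translateCharUTF8 is implemented, and latin1_to_utf8 is a hypothetical
helper invented for illustration; it assumes the input bytes really are
Latin-1, which is exactly the kind of assumption the proposed type would
let C++ code stop making.

```cpp
#include <string>

// Hypothetical helper (not part of R or Rcpp): re-encode Latin-1 bytes as UTF-8.
// Assumes every input byte is one Latin-1 code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // a Latin-1 byte needs at most two UTF-8 bytes
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8
            utf8.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

For example, the Latin-1 byte 0xE9 (e with acute accent) becomes the two
UTF-8 bytes 0xC3 0xA9, while plain ASCII text passes through unchanged.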