[Rcpp-devel] String encoding (UTF-8 conversion)

Dirk Eddelbuettel edd at debian.org
Thu Dec 11 23:16:20 CET 2014


On 11 December 2014 at 12:24, Jeroen Ooms wrote:
| I'm interfacing a c++ library which assumes strings are UTF-8. However
| strings from R can have various encodings. It's not clear to me how I
| need to account for that in Rcpp. For example:
| 
| // [[Rcpp::export]]
| std::string echo(std::string src){
|   return src;
| }
| 
| This program does not work on windows for non-ascii strings:
| 
| > test = "東京"
| > echo(test)
| [1] "æ ±äº¬
| 
| In C programs I always use translateCharUTF8 on all input to make sure
| it is UTF8 before I start working with it:
| 
|   translateCharUTF8(STRING_ELT(x, i));
| 
| Similarly on the output, I explicitly set the encoding to let R know
| that it is UTF8:
| 
|   SET_STRING_ELT(out, 0, mkCharCE(olds, CE_UTF8));
| 
| This ensures that code works across platforms and locales. How do we
| go about this in Rcpp?

Maybe the same way?  ;-) 

A valid C expression is almost always a valid C++ expression, so the same
translateCharUTF8() and mkCharCE() calls should be usable from Rcpp code. I
haven't needed this myself.  But as I recall, Romain did work with wchar for
some project so he may have a hint or two for you.
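
For what it is worth, the garbled output in your example looks like the
UTF-8 bytes of the string being displayed as if each byte were one Latin-1
character. Below is a minimal standalone sketch of that effect, in plain C++
with no R or Rcpp involved. The function name latin1_to_utf8 is made up for
this illustration, and the assumption that the console treats the bytes as
Latin-1 (rather than some other single-byte code page) is mine.

```cpp
#include <string>

// Hypothetical helper, for illustration only: treat every byte of `bytes`
// as a Latin-1 code point (U+0000..U+00FF) and encode that code point as
// UTF-8. Feeding it text that is already UTF-8 reproduces the classic
// "double encoding" mojibake.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

The two characters in your test string are the six UTF-8 bytes
E6 9D B1 E4 BA AC. Passing those bytes through latin1_to_utf8() gives
æ, an unprintable control character (0x9D), ±, ä, º and ¬, which lines up
with what echo() printed. So the bytes themselves survive the round trip;
what gets lost is the information that they are UTF-8, which is what
mkCharCE(..., CE_UTF8) in your C code supplies.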

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

