[Rcpp-devel] String encoding (UTF-8 conversion)
Dirk Eddelbuettel
edd at debian.org
Thu Dec 11 23:16:20 CET 2014
On 11 December 2014 at 12:24, Jeroen Ooms wrote:
| I'm interfacing a C++ library which assumes strings are UTF-8. However
| strings from R can have various encodings. It's not clear to me how I
| need to account for that in Rcpp. For example:
|
| // [[Rcpp::export]]
| std::string echo(std::string src){
| return src;
| }
|
| This program does not work on Windows for non-ASCII strings:
|
| > test = "東京"
| > echo(test)
| [1] "æ ±äº¬
|
| In C programs I always use translateCharUTF8 on all input to make sure
| it is UTF8 before I start working with it:
|
| translateCharUTF8(STRING_ELT(x, i));
|
| Similarly on the output, I explicitly set the encoding to let R know
| that this is UTF8:
|
| SET_STRING_ELT(out, 0, mkCharCE(olds, CE_UTF8));
|
| This ensures that code works across platforms and locales. How do we
| go about this in Rcpp?
Maybe the same way? ;-)
A valid C expression is almost always a valid C++ expression, so the same R
API calls should be usable from Rcpp code. I haven't needed this myself. But
as I recall, Romain worked with wchar for some project, so he may have a hint
or two for you.
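
As a side note on the garbled output quoted above: it is consistent with the
six UTF-8 bytes of "東京" being read back as a single-byte encoding such as
Latin-1. The following is a minimal standalone C++ sketch of that effect, not
Rcpp code. The names latin1_to_utf8, utf8_code_points and tokyo_utf8 are made
up for illustration, and the sketch assumes the string reached C++ as UTF-8
bytes and came back to R without an encoding mark.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of R or Rcpp): re-encode a byte string that
// is assumed to be Latin-1 as UTF-8. Each Latin-1 byte maps to the Unicode
// code point with the same value, so bytes >= 0x80 become two UTF-8 bytes.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                 // plain ASCII
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation
        }
    }
    return out;
}

// Hypothetical helper: count code points in a well-formed UTF-8 string by
// counting every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// UTF-8 bytes of "東京" (U+6771 U+4EAC), spelled out as escapes so the
// example does not depend on the encoding of the source file.
const std::string tokyo_utf8 = "\xE6\x9D\xB1\xE4\xBA\xAC";
```

Calling latin1_to_utf8(tokyo_utf8) gives six code points (æ, U+009D, ±, ä,
º, ¬) instead of two, which matches the echo() output in the question. In
Rcpp code itself, the C API functions from the question should be reachable
under their Rf_-prefixed names (Rf_translateCharUTF8, Rf_mkCharCE), assuming
Rcpp still compiles with R_NO_REMAP defined.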
Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org