[Rcpp-devel] Unicode on windows
Ned Harding
ned at alteryx.com
Wed Jun 26 19:54:21 CEST 2013
I am having issues with wide string conversion to and from Rcpp. When taking in a string from R that is encoded as UTF-8, I would expect as<wstring> to convert the UTF-8 to a wide string. Instead, it just widens each character individually and leaves the UTF-8 encoding in place. I have no issue with UTF-8 itself; my issue is that Rcpp doesn't seem to be able to tell me what encoding the source is in, so I don't know whether I should convert or not.
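To make the expected conversion concrete, here is a minimal sketch of decoding UTF-8 bytes into Unicode code points in plain C++. The helper name utf8_to_utf32 is hypothetical and not part of Rcpp; it rejects truncated or malformed sequences but, to stay short, does not reject overlong forms or surrogates. On Windows, where wchar_t is 16 bits, code points above U+FFFF would additionally need to be split into UTF-16 surrogate pairs before being stored in a std::wstring. On the R side, the C API functions Rf_getCharCE() and Rf_translateCharUTF8() in Rinternals.h may be a way to obtain the declared encoding or UTF-8 bytes of a CHARSXP; whether they fit this use case is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an Rcpp API): decode UTF-8 bytes to code points.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that must follow
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the low 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With this helper, the bytes 61 CA A5 62 decode to the three code points U+0061, U+02A5, U+0062, which is what I would expect as<wstring> to hand back for a UTF-8 input.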
Similarly, I would expect wrap<wstring> to produce a UTF-8 encoded SEXP, but instead the encoding in R comes back as "unknown" and the data can't be printed. See the C++ and R sources below, along with the output.
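For the opposite direction, the following is a minimal sketch of encoding code points as UTF-8 bytes. The helper name utf32_to_utf8 is hypothetical and not part of Rcpp. The resulting bytes could then presumably be turned into an R string marked as UTF-8 with Rf_mkCharCE(bytes, CE_UTF8) from R's C API; that last step is an assumption about how one might work around the problem, not a description of what wrap does.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not an Rcpp API): encode code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x110000) {            // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::runtime_error("code point out of Unicode range");
        }
    }
    return out;
}
```

The sketch takes UTF-32 input for simplicity; a 16-bit Windows std::wstring holds UTF-16, so surrogate pairs would have to be combined into code points first.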
C++ function
----------------------------------------
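RcppExport SEXP TestWide(SEXP _strIn)
{
    // Convert the incoming R string to a wide string and print each element in hex.
    std::wstring strIn = Rcpp::as<std::wstring>(_strIn);
    for (const wchar_t *p = strIn.c_str(); *p; ++p)
        Rprintf("%x\n", static_cast<unsigned int>(*p));
    // Note: \x consumes every following hex digit, so this literal is
    // 'a' followed by the single character U+2A5C.
    std::wstring str = L"a\x02a5c";
    return Rcpp::wrap(str);
}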
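The loop in the function above prints each wchar_t in hex. As a sketch of what "just widening" can look like, the following assumes that each UTF-8 byte is treated as a signed char and converted to a 16-bit unsigned value, as wchar_t is on Windows. Whether Rcpp does exactly this internally is an assumption; the helper name widen_bytes_signed is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration: widen each byte as a signed char into a
// 16-bit unsigned value, without any UTF-8 decoding.
std::vector<unsigned int> widen_bytes_signed(const std::string& in) {
    std::vector<unsigned int> out;
    for (char c : in) {
        signed char sc = static_cast<signed char>(c);
        // Bytes >= 0x80 are negative as signed char, so the conversion
        // sign-extends them: 0xCA becomes 0xFFCA.
        unsigned short w = static_cast<unsigned short>(sc);
        out.push_back(w);
    }
    return out;
}
```

For an input such as "a\u02a5b", whose UTF-8 bytes are 61 CA A5 62, this yields 61, ffca, ffa5, 62 rather than the code points 61, 2a5, 62.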
R Script
----------------------------------------
test <- "a\u02a5b"
a <- .Call("TestWide", test, PACKAGE = "AlteryxRDataX")
print(Encoding(a))
print(a)
R Output
----------------------------------------
R version 3.0.0 (2013-04-03) - x86_64
rgeos version: 0.2-16, (SVN revision 389)
GEOS runtime version: 3.3.6-CAPI-1.7.6
Polygon checking: TRUE
61
ffca
ffa5
62
"unknown"
"a?"
Thanks,
Ned Harding
Alteryx
CTO
3825 Iris Avenue, Suite 150
Boulder, CO 80301
Phone: 720-259-0541
eMail: ned at alteryx.com