[Rcpp-devel] Improving the speed of a custom function that fixes broken encoding

Casper Crause ccrause07 at gmail.com
Sun Jul 19 16:45:22 CEST 2020


Hi all!
I've written a function that uses a Python module called ftfy, which I
made available in R through reticulate.
The aim is to fix broken encoding. Unfortunately, the function is scalar:
it handles one string at a time.
I attempted to vectorise it by means of a for loop, but the speed is a
real concern once the datasets exceed 500 000 rows.

I've adapted the function to conditionally modify only broken text with
ifelse statements.
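
To make the idea of conditional fixing concrete, here is a minimal Python
sketch of it, not my actual script. The names looks_broken, fix_mojibake
and fix_column are hypothetical, and fix_mojibake is only a stand-in for
ftfy.fix_text so that the example runs without ftfy installed. It skips
strings that are pure ASCII (assuming those cannot be broken) and calls
the fixer only once per distinct broken value.

```python
def looks_broken(text):
    # Heuristic (assumption): a pure-ASCII string cannot contain mojibake,
    # so only strings with non-ASCII characters are candidates for fixing.
    return not text.isascii()


def fix_mojibake(text):
    # Stand-in for ftfy.fix_text (assumption): undo the common case of
    # UTF-8 bytes that were wrongly decoded as Latin-1.
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not that kind of mojibake; leave it unchanged.
        return text


def fix_column(values, fixer=fix_mojibake):
    # Apply `fixer` only to suspicious strings, and only once per
    # distinct value, reusing cached results for repeats.
    cache = {}
    fixed = []
    for value in values:
        if value is None or not looks_broken(value):
            fixed.append(value)  # missing or clean: pass through untouched
            continue
        if value not in cache:
            cache[value] = fixer(value)
        fixed.append(cache[value])
    return fixed
```

If this lived on the Python side, the whole column could in principle be
handed over in a single call (for example fixer=ftfy.fix_text), rather
than calling Python once per row from an R loop. Whether that helps
depends on how many rows are actually broken and how many are repeated.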

I *really* want to speed up this function using Rcpp, but there are two
problems:

1. I tried researching how to call Python functions from R through C++
scripts, but none of my attempts have been successful. (
https://gallery.rcpp.org/articles/rcpp-python/    )

2. I'm having trouble modifying all elements of a StringVector using Rcpp.

Any advice would be highly appreciated!

Attached are the vectorised function script, which conditionally fixes
broken encoding, and my *flawed* Rcpp script.


-- 

*Casper Crause*

*Cell:     072 475 8969*
*Email: ccrause07 at gmail.com <ccrause07 at gmail.com>*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fixing encoding.R
Type: application/octet-stream
Size: 2058 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20200719/120f661a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rcpp_fix_encoding.cpp
Type: application/octet-stream
Size: 566 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20200719/120f661a/attachment-0001.obj>

