[Rcpp-devel] Joining each row of CharacterMatrix to return a CharacterVector?

Dirk Eddelbuettel edd at debian.org
Tue Dec 11 00:09:05 CET 2012


Hi Pete,

On 11 December 2012 at 09:43, hickey at wehi.EDU.AU wrote:
| I preface this by stating that I'm very much an Rcpp beginner who is comfortable
| in R but I've never before used C++. I'm working through the Rcpp documentation
| but haven't been able to answer my question.
| 
| I've written an Rcpp (v0.10.1) function f that takes as input a CharacterMatrix
| X. X has 20 million rows and 100 columns. For each row of X the function alters
| certain entries of that row according to rules governed by some other input
| variables. f returns the updated version of X. This function works as I'd like
| it to: 
| # a toy example with nrow = 2, ncol = 2
| > X <- matrix('A', ncol = 2, nrow = 2)
| > X
|      [,1] [,2]
| [1,] "A"  "A" 
| [2,] "A"  "A" 
| > X <- f(X, other_input_variables)
| > X
|      [,1] [,2]
| [1,] "Z"  "A" 
| [2,] "z"  "A" 
| 
| However, instead of f returning a CharacterMatrix as it currently does, I'd
| like to return a CharacterVector Y, where each element of Y is a "collapsed"
| row of the updated X.
| 
| I can achieve the desired result in R by using: 
| Y <- apply(X=X, MARGIN = 1, FUN = function(x){paste0(x, collapse = '')}) 
| > Y
| [1] "ZA" "zA"
| 
| but I wondered whether this "joining" is likely to be more efficiently
| performed within my function f? If so, how do I join the 100 individual
| character entries of a row of the CharacterMatrix X into a single string that
| will then comprise an element of the returned CharacterVector Y?

Ah, the joy of working with character strings/vectors/pointers :)  

You certainly can. And there will be a lot of old, bad, ... tutorials out
there.  I can't right now think of a good tutorial to point you to -- other
than the perennial "C++ Annotations" by Brokken which is at the same time
good, up-to-date and free (!!) -- so maybe you should continue with
the little 2 x 2 and 3 x 3 examples:

 i)   for each row, first init the target string to be "", then loop over
      the columns of that row
 ii)  assign each element of the matrix to a string
 iii) append it to the target, which can be as easy as using   +   for two
      strings
 iv)  accumulate the result strings (one per row) in a vector of strings

That should work, does not require pointers, free, malloc, ...  You can
optimize later.
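
As a minimal sketch of steps i) to iv), here is plain C++ with no Rcpp types,
so it compiles on its own. The helper name collapse_rows and the flat,
column-major std::vector<std::string> input (mimicking how R lays out a
matrix in memory) are assumptions for illustration only, not anything taken
from your function f. Inside an Rcpp function you would presumably read each
cell from the CharacterMatrix instead, e.g. via Rcpp::as<std::string>(X(i, j)),
and hand the result back as a CharacterVector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): collapse each row of a character
// matrix into a single string. The matrix is assumed to be stored
// column-major, as R does, so element (i, j) lives at index i + j * nrow.
std::vector<std::string> collapse_rows(const std::vector<std::string>& cells,
                                       std::size_t nrow, std::size_t ncol) {
    assert(cells.size() == nrow * ncol);  // guard against a size mismatch

    std::vector<std::string> out;
    out.reserve(nrow);                    // one result string per row

    for (std::size_t i = 0; i < nrow; ++i) {
        std::string row;                  // i)   target string starts as ""
        for (std::size_t j = 0; j < ncol; ++j) {
            const std::string& cell = cells[i + j * nrow];  // ii) one element
            row += cell;                  // iii) append to the target
        }
        out.push_back(row);               // iv)  accumulate the row results
    }
    return out;
}
```

With the toy 2 x 2 matrix from your mail stored column-major as
{"Z", "z", "A", "A"}, calling collapse_rows(cells, 2, 2) should give "ZA" and
"zA", matching the apply() / paste0() result.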

Hope this helps,  Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  

