Rcpp "version" of R's match function
Romain Francois
romain at r-enthusiasts.com
Thu Nov 15 00:49:46 CET 2012
Ah. Things are particularly interesting if you want to deal with strings.
The code below implements a C++ version of match for character vectors.
Beware: it relies on a trick that bypasses R's write barrier (not an
issue here, since we only read the strings and never write to them).
sourceCpp( code = '
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
IntegerVector match_( const CharacterVector& x, const CharacterVector& table ){
    typedef std::tr1::unordered_map<SEXP,int> MAP ;
    typedef MAP::value_type VALUE ;

    // populate the hash: CHARSXP pointer -> 1-based position in table
    MAP hash ;
    int n = table.size() ;
    SEXP* ptr = get_string_ptr(table) ;
    for( int i=0 ; i<n; i++){
        // insert() keeps the first position of a repeated string, as match() does
        hash.insert( VALUE(ptr[i], i + 1) ) ;
    }

    // look up each element of x by its CHARSXP pointer
    n = x.size() ;
    IntegerVector result(n) ;
    ptr = get_string_ptr(x) ;
    MAP::const_iterator end=hash.end() ;
    for( int i=0; i<n; i++){
        MAP::const_iterator it = hash.find(ptr[i]) ;
        if( it == end ){
            // no match
            result[i] = NA_INTEGER ;
        } else {
            result[i] = it->second ;
        }
    }
    return result ;
}
' )
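The key to the trick is to use the pointer to each CHARSXP as the key
of the hash map. Strings are cached in R, so two equal strings (with the
same encoding) share a single CHARSXP, and comparing pointers is enough:
the characters themselves are never hashed or compared. See the CHARSXP
cache in R internals:
http://cran.r-project.org/doc/manuals/R-ints.html#The-CHARSXP-cache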
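To make the idea concrete outside of R, here is a rough standalone sketch
of the same technique. It is not the Rcpp code above: StringPool and
match_interned are hypothetical names made up for this illustration, the
pool only stands in for the CHARSXP cache, and -1 stands in for NA.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the CHARSXP cache: each distinct string is
// stored once, and intern() always returns the address of that one copy.
class StringPool {
public:
    const std::string* intern(const std::string& s) {
        // insert() leaves an existing element in place; pointers to
        // elements of std::unordered_set stay valid across rehashing.
        return &*pool_.insert(s).first;
    }
private:
    std::unordered_set<std::string> pool_;
};

// For each element of x, returns the 1-based position of its first
// occurrence in table, or -1 when there is none (assumed NA stand-in).
std::vector<int> match_interned(const std::vector<const std::string*>& x,
                                const std::vector<const std::string*>& table) {
    // The key is the pointer itself: hashing and equality never read
    // the characters of the strings.
    std::unordered_map<const std::string*, int> index;
    for (std::size_t i = 0; i < table.size(); ++i) {
        // insert() keeps the first entry for a repeated key.
        index.insert(std::make_pair(table[i], static_cast<int>(i) + 1));
    }
    std::vector<int> result;
    result.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::unordered_map<const std::string*, int>::const_iterator it =
            index.find(x[i]);
        result.push_back(it == index.end() ? -1 : it->second);
    }
    return result;
}
```

The sketch only works because every string goes through the same pool
first. That is what the CHARSXP cache provides in R, so match_ gets
the interning step for free.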
> match_( c( "b", "k") , letters )
[1] 2 11
But also (fasten your seatbelt):
> xx <- sample( letters, 1000000, replace = TRUE )
> system.time( match( xx, letters ) )
   user  system elapsed
  0.063   0.002   0.065
> system.time( match_( xx, letters ) )
   user  system elapsed
  0.015   0.000   0.015
