[Rcpp-devel] Rcpp "version" of R's match function

Romain Francois romain at r-enthusiasts.com
Fri Nov 16 22:51:27 CET 2012


Le 16/11/12 18:03, Hadley Wickham a écrit :
>> Ouch. Simon wins:
>>
>>                    expr      min        lq    median        uq      max
>> 1  fmatch(xx, letters)  59.4727  60.29989  74.18049  77.94288 112.2938
>> 2 match__(xx, letters) 137.3878 137.77486 138.33766 152.14018 193.3748
>> 3  match_(xx, letters) 147.7115 148.36442 149.20221 162.82343 171.3181
>> 4   match(xx, letters) 288.4345 293.10380 294.58833 296.26125 333.1210
>
> But only because he builds up the hash table once - the others have to
> do it every time.
>
> Hadley

I don't think that's it. here, we have to create a table for 26 entries. 
this takes no time at all:

sourceCpp( code = '
#include <Rcpp.h>
using namespace Rcpp ;

// [[Rcpp::export]]
void match_create_hash( const CharacterVector& x, const CharacterVector& 
table){
     typedef std::tr1::unordered_map<SEXP,int> MAP ;
     typedef MAP::value_type VALUE ;

     // populate the hash
     MAP hash ;
     int n = table.size() ;
     SEXP* ptr = get_string_ptr(table) ;
     for( int i=0 ; i<n; i++){
         hash.insert( std::make_pair<SEXP,int>(ptr[i], i + 1) ) ;
     }
}

' )

 > xx <- sample( letters, 10000000, replace = TRUE )
 > system.time( match_create_hash( xx, letters ) )
utilisateur     système      écoulé
       0.000       0.000       0.001

Also, in this example, there is no difference between the first and 
second time fmatch is used:

 > system.time( fmatch( xx, letters ) )
utilisateur     système      écoulé
       0.060       0.000       0.061
 > system.time( fmatch( xx, letters ) )
utilisateur     système      écoulé
       0.059       0.000       0.060
 > system.time( fmatch( xx, letters ) )
utilisateur     système      écoulé
       0.060       0.000       0.059
...


This is about looking up in the hash table. We need to do it 10 million 
times in this example, so I guess his hashing function performs better.

...

Not giving up

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible



More information about the Rcpp-devel mailing list