[Rcpp-devel] Rcpp version of %in%

Romain Francois romain at r-enthusiasts.com
Thu Nov 15 17:52:10 CET 2012


Hello,

I've committed an Rcpp version of %in%.

For example:

require(Rcpp)
require(microbenchmark)

sourceCpp( code = '
#include <Rcpp.h>
using namespace Rcpp ;

// [[Rcpp::export]]
LogicalVector in_( CharacterVector x, CharacterVector table){
     return in( x, table ) ;
}
' )

`%in++%` <- in_


 > c("a", "ad") %in++% letters
[1]  TRUE FALSE

In terms of performance:

 > xx <- sample( sample(letters, 15 ), 1000000, replace = TRUE )
 > microbenchmark(
+     xx %in% letters,
+     xx %in++%  letters,
+     in_( xx, letters )
+ )
Unit: milliseconds
                expr      min       lq   median       uq      max
1  in_(xx, letters) 12.79488 12.85228 12.88214 15.33067 44.65161
2   xx %in% letters 31.96431 34.43951 34.90381 35.37460 65.68226
3 xx %in++% letters 12.81114 12.86457 12.91557 15.06667 16.20493



The tool here is unordered_set: we don't care where the data is in the 
table, we just want to know whether it is there.
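
To illustrate the idea, here is a minimal sketch in plain C++ (no Rcpp 
types). It is not the code that was committed, only the same 
hash-set-membership technique; the name in_sketch and the use of 
std::vector<std::string> are assumptions made for the example.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Rcpp): for each element of x, report
// whether it occurs anywhere in table.
std::vector<bool> in_sketch(const std::vector<std::string>& x,
                            const std::vector<std::string>& table) {
    // Build the hash set once; duplicates in table collapse to one entry.
    const std::unordered_set<std::string> lookup(table.begin(), table.end());

    std::vector<bool> out;
    out.reserve(x.size());
    for (const std::string& s : x) {
        // count() returns 0 or 1 for a set; lookup is O(1) on average.
        out.push_back(lookup.count(s) > 0);
    }
    return out;
}
```

The set is built once from table and then queried once per element of x, 
so the position of a match in table never matters.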

It might be interesting at some point to check alternatives to the standard 
hashing functions, e.g. play with sparsehash: 
http://code.google.com/p/sparsehash/
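
As a rough sketch of what such an experiment could look like with the 
standard container, std::unordered_set accepts a custom hasher as its 
second template parameter. The example below plugs in FNV-1a purely as an 
illustration; it is not sparsehash, it is not what Rcpp uses, and the names 
Fnv1aHash, FnvStringSet and contains are hypothetical. Whether it is faster 
than the default hasher would have to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// FNV-1a (64-bit), used here only as an example of a replacement hasher.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const noexcept {
        std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
        for (unsigned char c : s) {
            h ^= c;                                  // mix in one byte
            h *= 1099511628211ULL;                   // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Same container as before, with the hash function swapped out.
using FnvStringSet = std::unordered_set<std::string, Fnv1aHash>;

// Membership test against a set that uses the custom hasher.
bool contains(const FnvStringSet& set, const std::string& key) {
    return set.find(key) != set.end();
}
```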

Romain

