[Rcpp-devel] Logical subsetting example

Dirk Eddelbuettel edd at debian.org
Sun Dec 30 17:38:15 CET 2012


The topic of how to do the equivalent of the last step here

    set.seed(42)
    a <- sort(round(runif(10)*100))

    a[ a %% 2 == 0] 

ie how to subset by some boolean vector had come up before, both here and on
StackOverflow.

A while back, I got hold of a used copy of Austern's classic "Generic
Programming and the STL" in which I found (by random luck, I guess) the
remove_copy() function which takes a vector and copies all values different
from a given 'flagged' value.  Very nice -- all is missing then is a
preceding call to transform() in its two vector form to flag values given a
second boolean vector.  Combine the two and you get this:

   #include <Rcpp.h>
   
   using namespace Rcpp;
   using namespace std;
   
   const double flagval = __DBL_MIN__; // works
   //const double flagval = NA_REAL;   // does not
   
   // simple double value 'flagging' function
   inline double flag(double a, bool b) { return b ? a : flagval; }
   
   // [[Rcpp::export]]
   NumericVector subsetter(NumericVector a, LogicalVector b) {
       // We use the flag() function to mark values of 'a' 
       // for which 'b' is false with the 'flagval'
       transform(a.begin(), a.end(), b.begin(), a.begin(), flag);
   
       // We use sugar's sum to compute how many true values to expect
       NumericVector res = NumericVector(sum(b));
   
       // And then copy the ones different from flagval from a into
       // res using the remove_copy function from the STL
       remove_copy(a.begin(), a.end(), res.begin(), flagval);
       return res;    
   }

and with that can do

    subsetter(a, a %% 2 == 0)

which (of course) passes all.equal() with the R solution.  I also did some
tests for 1e6 elements in the vector and the logical vector --- and this is a
tad quicker than R itself.

This would need some more work for a properly templated solution using traits
to the sentinel values right. The NA_REAL or R_NaN values do unfortunately
not work as they cannot be used with equality tests, so we waste another
value (which picked as __DBL_MIN__).

Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  


More information about the Rcpp-devel mailing list