[Rcpp-devel] Testing for existence of named components

Ulrich Bodenhofer bodenhofer at bioinf.jku.at
Thu Mar 29 09:14:19 CEST 2012


Thanks so much, Jonas and Dirk! I just tried it and it works great. 
Jonas' benchmarking results are awesome too. I hope our discussion will 
help many other people who will come across this issue some time.

Cheers, Ulrich


On 03/28/2012 05:07 PM, Dirk Eddelbuettel wrote:
> On 28 March 2012 at 16:36, Jonas Rauch wrote:
> | If I remember correctly, environments use hash tables to lookup names.
> |
> | In pure R to check if an element L[["key"]] exists I would usually use
> |>  "key" %in% names(L)
> | which seems to be reasonably fast. `%in%` works in vectors, too
> |>  keys %in% L
> | returning exactly what Dirks suggestion was supposed to do (you missed an "if",
> | or alternatively an "or")
> |
> | `%in%` uses .Internal(match(...)), which computes a temporary hash table as far
> | as I can see in the source (src/main/unique.c). I am not sure about the
> | implementation of the Hash. In particular I don't know if some kind of caching
> | is done, because the lookup is much faster than Dirks function (see below) and
> | computing a hash table for one lookup seems inefficient. I think it would be
> | good to use the internal `match` function of R to do what you want. In
> | Rinternals.h match is exposed as
> |   SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)
> | where nmatch is the value to be returned if there is no match.
>
> Excellent suggestion!
>
> I think we can even return just the index vector, with NA or another filler
> for non-matches:
>
> R>  g<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
> +   Rcpp::List lst(ls);
> +   Rcpp::CharacterVector nam = lst.names();
> +
> +   Rcpp::IntegerVector ind(Rf_match(nam,ts,R_NaInt));
> +   return ind;
> + ')
> R>  g(ll, c("aa", "b", "c"))
> [1] NA  2 NA
> R>
>
> Dirk
>
> | Below are the results of my little benchmark (see below). First parameter of
> | `lookup` is the number elements in L, second is the length of keys, third one
> | specifies the probability that a key is in present in the list.
> |
> | Regards,
> | Jonas
> |
> | ##Benchmark results
> |>  lookup(10000, 1000, 0.1)
> |                 test elapsed relative
> | 2         f(L, keys)   0.641   160.25
> | 3         g(L, keys)   0.005     1.25
> | 1 keys %in% names(L)   0.004     1.00
> |>  lookup(10000, 1000, 1.0)
> |                 test elapsed relative
> | 2         f(L, keys)   0.643   160.75
> | 3         g(L, keys)   0.004     1.00
> | 1 keys %in% names(L)   0.004     1.00
> |>  lookup(100000, 100, 0.1)
> |                 test elapsed relative
> | 2         f(L, keys)   0.656      164
> | 3         g(L, keys)   0.004        1
> | 1 keys %in% names(L)   0.004        1
> |>  lookup(100000, 100, 1.0)
> |                 test elapsed relative
> | 2         f(L, keys)   0.654    163.5
> | 3         g(L, keys)   0.004      1.0
> | 1 keys %in% names(L)   0.004      1.0
> |>  lookup(100000, 1, 0.1)
> |                 test elapsed relative
> | 2         f(L, keys)   0.647   161.75
> | 3         g(L, keys)   0.004     1.00
> | 1 keys %in% names(L)   0.005     1.25
> |>  lookup(100000, 1, 1.0)
> |                 test elapsed relative
> | 2         f(L, keys)   0.647   161.75
> | 3         g(L, keys)   0.004     1.00
> | 1 keys %in% names(L)   0.005     1.25
> |
> |
> | ## Code for benchmarks
> | library(inline)
> | library(rbenchmark)
> | f<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
> |     Rcpp::List lst(ls);
> |     Rcpp::CharacterVector nam = lst.names();
> |     std::vector<std::string>  nm = Rcpp::as<  std::vector<  std::string>  >
> | (nam);
> |     int m = nam.size();
> |
> |     Rcpp::CharacterVector tok(ts);
> |     std::vector<std::string>  tk = Rcpp::as<  std::vector<  std::string>  >
> | (tok);
> |     int n = tok.size();
> |
> |     Rcpp::LogicalVector   log(n);
> |
> |     for (int i=0; i<n; i++) {  // look at all tokens
> |        log[i] = false;                   // assume
> | false, but compare to all names
> |        for (int j=0; j<m; j++) {
> |                     if(tk[i] == nm[j])
> |               log[i] = 1;     // and note equality if we find
> | it
> |        }
> |     }
> |     return log;
> |  ')
> |
> | g<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
> |     Rcpp::List lst(ls);
> |     Rcpp::CharacterVector nam = lst.names();
> |
> |         Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
> |     Rcpp::LogicalVector   log(ind);
> |         return log;
> |  ')
> |
> |
> | lookup<-function(nList, nKeys, P) {
> |     nList<-10000
> |     nKeys<-1000
> |     P<-0.1
> |     keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
> |     L<-as.list(1:nList)
> |     names(L)<-as.character(1:nList)
> |     benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
> | columns=c("test", "elapsed", "relative"))
> | }
> |
> | lookup(10000, 1000, 0.1)
> | lookup(10000, 1000, 1.0)
> | lookup(100000, 100, 0.1)
> | lookup(100000, 100, 1.0)
> | lookup(100000, 1, 0.1)
> | lookup(100000, 1, 1.0)
> |
> |


More information about the Rcpp-devel mailing list