[Rcpp-devel] Testing for existence of named components
Ulrich Bodenhofer
bodenhofer at bioinf.jku.at
Thu Mar 29 09:14:19 CEST 2012
Thanks so much, Jonas and Dirk! I just tried it and it works great.
Jonas' benchmarking results are awesome too. I hope our discussion will
help many other people who will come across this issue some time.
Cheers, Ulrich
On 03/28/2012 05:07 PM, Dirk Eddelbuettel wrote:
> On 28 March 2012 at 16:36, Jonas Rauch wrote:
> | If I remember correctly, environments use hash tables to lookup names.
> |
> | In pure R to check if an element L[["key"]] exists I would usually use
> |> "key" %in% names(L)
> | which seems to be reasonably fast. `%in%` works in vectors, too
> |> keys %in% L
> | returning exactly what Dirks suggestion was supposed to do (you missed an "if",
> | or alternatively an "or")
> |
> | `%in%` uses .Internal(match(...)), which computes a temporary hash table as far
> | as I can see in the source (src/main/unique.c). I am not sure about the
> | implementation of the Hash. In particular I don't know if some kind of caching
> | is done, because the lookup is much faster than Dirks function (see below) and
> | computing a hash table for one lookup seems inefficient. I think it would be
> | good to use the internal `match` function of R to do what you want. In
> | Rinternals.h match is exposed as
> | SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)
> | where nmatch is the value to be returned if there is no match.
>
> Excellent suggestion!
>
> I think we can even return just the index vector, with NA or another filler
> for non-matches:
>
> R> g<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
> + Rcpp::List lst(ls);
> + Rcpp::CharacterVector nam = lst.names();
> +
> + Rcpp::IntegerVector ind(Rf_match(nam,ts,R_NaInt));
> + return ind;
> + ')
> R> g(ll, c("aa", "b", "c"))
> [1] NA 2 NA
> R>
>
> Dirk
>
> | Below are the results of my little benchmark (see below). First parameter of
> | `lookup` is the number elements in L, second is the length of keys, third one
> | specifies the probability that a key is in present in the list.
> |
> | Regards,
> | Jonas
> |
> | ##Benchmark results
> |> lookup(10000, 1000, 0.1)
> | test elapsed relative
> | 2 f(L, keys) 0.641 160.25
> | 3 g(L, keys) 0.005 1.25
> | 1 keys %in% names(L) 0.004 1.00
> |> lookup(10000, 1000, 1.0)
> | test elapsed relative
> | 2 f(L, keys) 0.643 160.75
> | 3 g(L, keys) 0.004 1.00
> | 1 keys %in% names(L) 0.004 1.00
> |> lookup(100000, 100, 0.1)
> | test elapsed relative
> | 2 f(L, keys) 0.656 164
> | 3 g(L, keys) 0.004 1
> | 1 keys %in% names(L) 0.004 1
> |> lookup(100000, 100, 1.0)
> | test elapsed relative
> | 2 f(L, keys) 0.654 163.5
> | 3 g(L, keys) 0.004 1.0
> | 1 keys %in% names(L) 0.004 1.0
> |> lookup(100000, 1, 0.1)
> | test elapsed relative
> | 2 f(L, keys) 0.647 161.75
> | 3 g(L, keys) 0.004 1.00
> | 1 keys %in% names(L) 0.005 1.25
> |> lookup(100000, 1, 1.0)
> | test elapsed relative
> | 2 f(L, keys) 0.647 161.75
> | 3 g(L, keys) 0.004 1.00
> | 1 keys %in% names(L) 0.005 1.25
> |
> |
> | ## Code for benchmarks
> | library(inline)
> | library(rbenchmark)
> | f<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
> | Rcpp::List lst(ls);
> | Rcpp::CharacterVector nam = lst.names();
> | std::vector<std::string> nm = Rcpp::as< std::vector< std::string> >
> | (nam);
> | int m = nam.size();
> |
> | Rcpp::CharacterVector tok(ts);
> | std::vector<std::string> tk = Rcpp::as< std::vector< std::string> >
> | (tok);
> | int n = tok.size();
> |
> | Rcpp::LogicalVector log(n);
> |
> | for (int i=0; i<n; i++) { // look at all tokens
> | log[i] = false; // assume
> | false, but compare to all names
> | for (int j=0; j<m; j++) {
> | if(tk[i] == nm[j])
> | log[i] = 1; // and note equality if we find
> | it
> | }
> | }
> | return log;
> | ')
> |
> | g<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
> | Rcpp::List lst(ls);
> | Rcpp::CharacterVector nam = lst.names();
> |
> | Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
> | Rcpp::LogicalVector log(ind);
> | return log;
> | ')
> |
> |
> | lookup<-function(nList, nKeys, P) {
> | nList<-10000
> | nKeys<-1000
> | P<-0.1
> | keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
> | L<-as.list(1:nList)
> | names(L)<-as.character(1:nList)
> | benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
> | columns=c("test", "elapsed", "relative"))
> | }
> |
> | lookup(10000, 1000, 0.1)
> | lookup(10000, 1000, 1.0)
> | lookup(100000, 100, 0.1)
> | lookup(100000, 100, 1.0)
> | lookup(100000, 1, 0.1)
> | lookup(100000, 1, 1.0)
> |
> |
More information about the Rcpp-devel
mailing list