[Rcpp-devel] Testing for existence of named components

Jonas Rauch jonas.rauch at googlemail.com
Wed Mar 28 16:36:44 CEST 2012


If I remember correctly, environments use hash tables to lookup names.

In pure R to check if an element L[["key"]] exists I would usually use
> "key" %in% names(L)
which seems to be reasonably fast. `%in%` works in vectors, too
> keys %in% L
returning exactly what Dirks suggestion was supposed to do (you missed an
"if", or alternatively an "or")

`%in%` uses .Internal(match(...)), which computes a temporary hash table as
far as I can see in the source (src/main/unique.c). I am not sure about the
implementation of the Hash. In particular I don't know if some kind of
caching is done, because the lookup is much faster than Dirks function (see
below) and computing a hash table for one lookup seems inefficient. I think
it would be good to use the internal `match` function of R to do what you
want. In Rinternals.h match is exposed as
  SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)
where nmatch is the value to be returned if there is no match.

Below are the results of my little benchmark (see below). First parameter
of `lookup` is the number elements in L, second is the length of keys,
third one specifies the probability that a key is in present in the list.

Regards,
Jonas

##Benchmark results
> lookup(10000, 1000, 0.1)
                test elapsed relative
2         f(L, keys)   0.641   160.25
3         g(L, keys)   0.005     1.25
1 keys %in% names(L)   0.004     1.00
> lookup(10000, 1000, 1.0)
                test elapsed relative
2         f(L, keys)   0.643   160.75
3         g(L, keys)   0.004     1.00
1 keys %in% names(L)   0.004     1.00
> lookup(100000, 100, 0.1)
                test elapsed relative
2         f(L, keys)   0.656      164
3         g(L, keys)   0.004        1
1 keys %in% names(L)   0.004        1
> lookup(100000, 100, 1.0)
                test elapsed relative
2         f(L, keys)   0.654    163.5
3         g(L, keys)   0.004      1.0
1 keys %in% names(L)   0.004      1.0
> lookup(100000, 1, 0.1)
                test elapsed relative
2         f(L, keys)   0.647   161.75
3         g(L, keys)   0.004     1.00
1 keys %in% names(L)   0.005     1.25
> lookup(100000, 1, 1.0)
                test elapsed relative
2         f(L, keys)   0.647   161.75
3         g(L, keys)   0.004     1.00
1 keys %in% names(L)   0.005     1.25


## Code for benchmarks
library(inline)
library(rbenchmark)
f <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
    Rcpp::List lst(ls);
    Rcpp::CharacterVector nam = lst.names();
    std::vector<std::string> nm = Rcpp::as< std::vector< std::string >
>(nam);
    int m = nam.size();

    Rcpp::CharacterVector tok(ts);
    std::vector<std::string> tk = Rcpp::as< std::vector< std::string >
>(tok);
    int n = tok.size();

    Rcpp::LogicalVector   log(n);

    for (int i=0; i<n; i++) {  // look at all tokens
       log[i] = false;                   // assume false, but compare to
all names
       for (int j=0; j<m; j++) {
                    if(tk[i] == nm[j])
              log[i] = 1;     // and note equality if we find it
       }
    }
    return log;
 ')

g <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
    Rcpp::List lst(ls);
    Rcpp::CharacterVector nam = lst.names();

        Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
    Rcpp::LogicalVector   log(ind);
        return log;
 ')


lookup<-function(nList, nKeys, P) {
    nList<-10000
    nKeys<-1000
    P<-0.1
    keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
    L<-as.list(1:nList)
    names(L)<-as.character(1:nList)
    benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
columns=c("test", "elapsed", "relative"))
}

lookup(10000, 1000, 0.1)
lookup(10000, 1000, 1.0)
lookup(100000, 100, 0.1)
lookup(100000, 100, 1.0)
lookup(100000, 1, 0.1)
lookup(100000, 1, 1.0)


On Wed, Mar 28, 2012 at 3:34 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

>
> On 28 March 2012 at 15:17, Ulrich Bodenhofer wrote:
> | Thanks for your swift reply, Dirk! Wow, to be frank, that is not what I
> | was expecting. In the meantime, I read Section 5.9.6 of "Writing R
> | extensions" and I was stunned to see a solution there that is similar to
> | the one you propose. I do not know R internals very well, but I cannot
> | believe that accesses by names are implemented by tediously searching
> | through all names sequentially. There must be some hash table behind,
> | right? Otherwise, list accesses would be terribly slow. If all my
>
> It's a trade-off: Inserting into a hash table has cost, you would enforce
> that on every (!!) vector use, whether names are looked up or not.
>
> Also, vectors are a natural representation as R's SEXP are just linear
> collections.
>
> | assumptions are right, I wonder why the R API does not make this
> | mechanism available.
> |
> | May I add one more question: where can I find the definition of the List
> | class and the implementation of its methods in the Rcpp package source
> | code? I was looking for this, but got lost. Are you using the method
>
> The only answer I can give is to ... keep looking.  It's messy, because
> there
> is so much code---but that is because Rcpp does so much stuff.
>
> But some header files are more important than others -- and the Vector.h
> file
> as well as the Vector/ subdirectory are important.
>
> As mentioned, Rcpp::List() is a vector too.
>
> | described in Section 5.9.6 of "Writing R extensions" (which I doubt) or
> | something else. I would actually be quite curious to learn more about
> | how lists are implemented in Rcpp.
>
> They are just vectors, hence the linear traversal.
>
> Dirk
>
> | Thanks and best regards,
> | Ulrich
> |
> |
> | If so, I wonder why this mechanism has not been
> |
> | On 03/28/2012 02:56 PM, Dirk Eddelbuettel wrote:
> | > On 28 March 2012 at 13:56, Ulrich Bodenhofer wrote:
> | > | My question is the following: is there any way of checking in
> whether a
> | > | component of an Rcpp list (or vector) with a given name exists in
> this list. If
> | > | I simply try accessing a non-existing component, I get an "index out
> of bounds"
> | > | error. Trying to catch a possible exception did not work either. I
> also browsed
> | > | the Rcpp package source code, but unfortunately I got lost. Sorry if
> this has
> | > | been addressed on this list before. At least I googled in many
> different ways,
> | > | but could not find anything. Any ideas? Any help is gratefully
> appreciated!
> | >
> | > Good question, and I have the suspicion that we answered that years
> ago on
> | > the list before.
> | >
> | > Here is a super-pedestrian version. It takes a list, extracts its
> names() --
> | > and should probably test whether names is empty ? -- and then compares
> these
> | > against a vector of tokens, returning a bool for each token:
> | >
> | > R>
> | > R>  suppressMessages(library(inline))
> | > R>
> | > R>  f<- cxxfunction(signature(ls="list", ts="character"),
> plugin="Rcpp", body='
> | > +    Rcpp::List lst(ls);
> | > +    Rcpp::CharacterVector nam = lst.names();
> | > +    std::vector<std::string>  nm = Rcpp::as<  std::vector<
>  std::string>  >(nam);
> | > +    int m = nam.size();
> | > +
> | > +    Rcpp::CharacterVector tok(ts);
> | > +    std::vector<std::string>  tk = Rcpp::as<  std::vector<
>  std::string>  >(tok);
> | > +    int n = tok.size();
> | > +
> | > +    Rcpp::LogicalVector   log(n);
> | > +
> | > +    for (int i=0; i<n; i++) {  // look at all tokens
> | > +       log[i] = false;                   // assume false, but compare
> to all names
> | > +       for (int j=0; j<m; j++) {
> | > +          log[i] = (tk[i] == nm[j]);     // and note equality if we
> find it
> | > +       }
> | > +    }
> | > +    return log;
> | > + ')
> | > R>  ll<- list(aa=1,b="b")       ## list with names 'aa' and 'b'
> | > R>  f(ll, c("aa", "b", "c"))         ## does it contain 'a' or 'b' or
> 'c' ?
> | > [1] FALSE  TRUE FALSE
> | > R>  f(ll, "c")                       ## works with scalars too
> | > [1] FALSE
> | > R>
> | >
> | > I am sure there is a much cleverer solution one could write...
> | >
> | > Dirk
> | >
> |
>
> --
> R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL
> See agenda, registration details and more at http://www.RinFinance.com
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20120328/2cd3908d/attachment-0001.html>


More information about the Rcpp-devel mailing list