[Rcpp-devel] Testing for existence of named components
Dirk Eddelbuettel
edd at debian.org
Wed Mar 28 17:07:56 CEST 2012
On 28 March 2012 at 16:36, Jonas Rauch wrote:
| If I remember correctly, environments use hash tables to lookup names.
|
| In pure R to check if an element L[["key"]] exists I would usually use
| > "key" %in% names(L)
| which seems to be reasonably fast. `%in%` works in vectors, too
| > keys %in% L
| returning exactly what Dirks suggestion was supposed to do (you missed an "if",
| or alternatively an "or")
|
| `%in%` uses .Internal(match(...)), which computes a temporary hash table as far
| as I can see in the source (src/main/unique.c). I am not sure about the
| implementation of the Hash. In particular I don't know if some kind of caching
| is done, because the lookup is much faster than Dirks function (see below) and
| computing a hash table for one lookup seems inefficient. I think it would be
| good to use the internal `match` function of R to do what you want. In
| Rinternals.h match is exposed as
| SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)
| where nmatch is the value to be returned if there is no match.
Excellent suggestion!
I think we can even return just the index vector, with NA or another filler
for non-matches:
R> g <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
+ Rcpp::List lst(ls);
+ Rcpp::CharacterVector nam = lst.names();
+
+ Rcpp::IntegerVector ind(Rf_match(nam,ts,R_NaInt));
+ return ind;
+ ')
R> g(ll, c("aa", "b", "c"))
[1] NA 2 NA
R>
Dirk
| Below are the results of my little benchmark (see below). First parameter of
| `lookup` is the number elements in L, second is the length of keys, third one
| specifies the probability that a key is in present in the list.
|
| Regards,
| Jonas
|
| ##Benchmark results
| > lookup(10000, 1000, 0.1)
| test elapsed relative
| 2 f(L, keys) 0.641 160.25
| 3 g(L, keys) 0.005 1.25
| 1 keys %in% names(L) 0.004 1.00
| > lookup(10000, 1000, 1.0)
| test elapsed relative
| 2 f(L, keys) 0.643 160.75
| 3 g(L, keys) 0.004 1.00
| 1 keys %in% names(L) 0.004 1.00
| > lookup(100000, 100, 0.1)
| test elapsed relative
| 2 f(L, keys) 0.656 164
| 3 g(L, keys) 0.004 1
| 1 keys %in% names(L) 0.004 1
| > lookup(100000, 100, 1.0)
| test elapsed relative
| 2 f(L, keys) 0.654 163.5
| 3 g(L, keys) 0.004 1.0
| 1 keys %in% names(L) 0.004 1.0
| > lookup(100000, 1, 0.1)
| test elapsed relative
| 2 f(L, keys) 0.647 161.75
| 3 g(L, keys) 0.004 1.00
| 1 keys %in% names(L) 0.005 1.25
| > lookup(100000, 1, 1.0)
| test elapsed relative
| 2 f(L, keys) 0.647 161.75
| 3 g(L, keys) 0.004 1.00
| 1 keys %in% names(L) 0.005 1.25
|
|
| ## Code for benchmarks
| library(inline)
| library(rbenchmark)
| f <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
| Rcpp::List lst(ls);
| Rcpp::CharacterVector nam = lst.names();
| std::vector<std::string> nm = Rcpp::as< std::vector< std::string > >
| (nam);
| int m = nam.size();
|
| Rcpp::CharacterVector tok(ts);
| std::vector<std::string> tk = Rcpp::as< std::vector< std::string > >
| (tok);
| int n = tok.size();
|
| Rcpp::LogicalVector log(n);
|
| for (int i=0; i<n; i++) { // look at all tokens
| log[i] = false; // assume
| false, but compare to all names
| for (int j=0; j<m; j++) {
| if(tk[i] == nm[j])
| log[i] = 1; // and note equality if we find
| it
| }
| }
| return log;
| ')
|
| g <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='
| Rcpp::List lst(ls);
| Rcpp::CharacterVector nam = lst.names();
|
| Rcpp::IntegerVector ind(Rf_match(nam,ts,0));
| Rcpp::LogicalVector log(ind);
| return log;
| ')
|
|
| lookup<-function(nList, nKeys, P) {
| nList<-10000
| nKeys<-1000
| P<-0.1
| keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))
| L<-as.list(1:nList)
| names(L)<-as.character(1:nList)
| benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10,
| columns=c("test", "elapsed", "relative"))
| }
|
| lookup(10000, 1000, 0.1)
| lookup(10000, 1000, 1.0)
| lookup(100000, 100, 0.1)
| lookup(100000, 100, 1.0)
| lookup(100000, 1, 0.1)
| lookup(100000, 1, 1.0)
|
|
| On Wed, Mar 28, 2012 at 3:34 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
|
|
| On 28 March 2012 at 15:17, Ulrich Bodenhofer wrote:
| | Thanks for your swift reply, Dirk! Wow, to be frank, that is not what I
| | was expecting. In the meantime, I read Section 5.9.6 of "Writing R
| | extensions" and I was stunned to see a solution there that is similar to
| | the one you propose. I do not know R internals very well, but I cannot
| | believe that accesses by names are implemented by tediously searching
| | through all names sequentially. There must be some hash table behind,
| | right? Otherwise, list accesses would be terribly slow. If all my
|
| It's a trade-off: Inserting into a hash table has cost, you would enforce
| that on every (!!) vector use, whether names are looked up or not.
|
| Also, vectors are a natural representation as R's SEXP are just linear
| collections.
|
| | assumptions are right, I wonder why the R API does not make this
| | mechanism available.
| |
| | May I add one more question: where can I find the definition of the List
| | class and the implementation of its methods in the Rcpp package source
| | code? I was looking for this, but got lost. Are you using the method
|
| The only answer I can give is to ... keep looking. It's messy, because
| there
| is so much code---but that is because Rcpp does so much stuff.
|
| But some header files are more important than others -- and the Vector.h
| file
| as well as the Vector/ subdirectory are important.
|
| As mentioned, Rcpp::List() is a vector too.
|
| | described in Section 5.9.6 of "Writing R extensions" (which I doubt) or
| | something else. I would actually be quite curious to learn more about
| | how lists are implemented in Rcpp.
|
| They are just vectors, hence the linear traversal.
|
| Dirk
|
| | Thanks and best regards,
| | Ulrich
| |
| |
| | If so, I wonder why this mechanism has not been
| |
| | On 03/28/2012 02:56 PM, Dirk Eddelbuettel wrote:
| | > On 28 March 2012 at 13:56, Ulrich Bodenhofer wrote:
| | > | My question is the following: is there any way of checking in whether
| a
| | > | component of an Rcpp list (or vector) with a given name exists in
| this list. If
| | > | I simply try accessing a non-existing component, I get an "index out
| of bounds"
| | > | error. Trying to catch a possible exception did not work either. I
| also browsed
| | > | the Rcpp package source code, but unfortunately I got lost. Sorry if
| this has
| | > | been addressed on this list before. At least I googled in many
| different ways,
| | > | but could not find anything. Any ideas? Any help is gratefully
| appreciated!
| | >
| | > Good question, and I have the suspicion that we answered that years ago
| on
| | > the list before.
| | >
| | > Here is a super-pedestrian version. It takes a list, extracts its names
| () --
| | > and should probably test whether names is empty ? -- and then compares
| these
| | > against a vector of tokens, returning a bool for each token:
| | >
| | > R>
| | > R> suppressMessages(library(inline))
| | > R>
| | > R> f<- cxxfunction(signature(ls="list", ts="character"), plugin=
| "Rcpp", body='
| | > + Rcpp::List lst(ls);
| | > + Rcpp::CharacterVector nam = lst.names();
| | > + std::vector<std::string> nm = Rcpp::as< std::vector<
| std::string> >(nam);
| | > + int m = nam.size();
| | > +
| | > + Rcpp::CharacterVector tok(ts);
| | > + std::vector<std::string> tk = Rcpp::as< std::vector<
| std::string> >(tok);
| | > + int n = tok.size();
| | > +
| | > + Rcpp::LogicalVector log(n);
| | > +
| | > + for (int i=0; i<n; i++) { // look at all tokens
| | > + log[i] = false; // assume false,
| but compare to all names
| | > + for (int j=0; j<m; j++) {
| | > + log[i] = (tk[i] == nm[j]); // and note equality
| if we find it
| | > + }
| | > + }
| | > + return log;
| | > + ')
| | > R> ll<- list(aa=1,b="b") ## list with names 'aa' and 'b'
| | > R> f(ll, c("aa", "b", "c")) ## does it contain 'a' or 'b'
| or 'c' ?
| | > [1] FALSE TRUE FALSE
| | > R> f(ll, "c") ## works with scalars
| too
| | > [1] FALSE
| | > R>
| | >
| | > I am sure there is a much cleverer solution one could write...
| | >
| | > Dirk
| | >
| |
|
| --
| R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL
| See agenda, registration details and more at http://www.RinFinance.com
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
|
|
|
| ----------------------------------------------------------------------
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL
See agenda, registration details and more at http://www.RinFinance.com
More information about the Rcpp-devel
mailing list