If I remember correctly, environments use hash tables to lookup names. <br><br>In pure R to check if an element L[["key"]] exists I would usually use <br>> "key" %in% names(L)<br>which seems to be reasonably fast. `%in%` works in vectors, too<br>
> keys %in% L<br>returning exactly what Dirks suggestion was supposed to do (you missed an "if", or alternatively an "or")<br><br>`%in%`
uses .Internal(match(...)), which computes a temporary hash table as
far as I can see in the source (src/main/unique.c). I am not sure about
the implementation of the Hash. In particular I don't know if some kind
of caching is done, because the lookup is much faster than Dirks
function (see below) and computing a hash table for one lookup seems
inefficient. I think it would be good to use the internal `match`
function of R to do what you want. In Rinternals.h match is exposed as <br> SEXP Rf_match(SEXP itable, SEXP ix, int nmatch)<br>where nmatch is the value to be returned if there is no match. <br><br>
Below are the results of my little benchmark (see below). First parameter
of `lookup` is the number elements in L, second is the length of keys,
third one specifies the probability that a key is in present in the list.<br>
<br>
Regards,<br>
Jonas<br>
<br>##Benchmark results<br>> lookup(10000, 1000, 0.1)<br> test elapsed relative<br>2 f(L, keys) 0.641 160.25<br>3 g(L, keys) 0.005 1.25<br>1 keys %in% names(L) 0.004 1.00<br>
> lookup(10000, 1000, 1.0)<br> test elapsed relative<br>2 f(L, keys) 0.643 160.75<br>3 g(L, keys) 0.004 1.00<br>1 keys %in% names(L) 0.004 1.00<br>> lookup(100000, 100, 0.1)<br>
test elapsed relative<br>2 f(L, keys) 0.656 164<br>3 g(L, keys) 0.004 1<br>1 keys %in% names(L) 0.004 1<br>> lookup(100000, 100, 1.0)<br> test elapsed relative<br>
2 f(L, keys) 0.654 163.5<br>3 g(L, keys) 0.004 1.0<br>1 keys %in% names(L) 0.004 1.0<br>> lookup(100000, 1, 0.1)<br> test elapsed relative<br>2 f(L, keys) 0.647 161.75<br>
3 g(L, keys) 0.004 1.00<br>1 keys %in% names(L) 0.005 1.25<br>> lookup(100000, 1, 1.0)<br> test elapsed relative<br>2 f(L, keys) 0.647 161.75<br>3 g(L, keys) 0.004 1.00<br>
1 keys %in% names(L) 0.005 1.25<br><br>
<br>## Code for benchmarks<br>library(inline)<br>library(rbenchmark)<br>f <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='<br> Rcpp::List lst(ls);<br> Rcpp::CharacterVector nam = lst.names();<br>
std::vector<std::string> nm = Rcpp::as< std::vector< std::string > >(nam);<br> int m = nam.size();<br><br> Rcpp::CharacterVector tok(ts);<br> std::vector<std::string> tk = Rcpp::as< std::vector< std::string > >(tok);<br>
int n = tok.size();<br><br> Rcpp::LogicalVector log(n);<br><br> for (int i=0; i<n; i++) { // look at all tokens<br> log[i] = false; // assume false, but compare to all names<br> for (int j=0; j<m; j++) {<br>
if(tk[i] == nm[j])<br> log[i] = 1; // and note equality if we find it<br> }<br> }<br> return log;<br> ')<br><br>g <- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='<br>
Rcpp::List lst(ls);<br> Rcpp::CharacterVector nam = lst.names();<br><br> Rcpp::IntegerVector ind(Rf_match(nam,ts,0));<br> Rcpp::LogicalVector log(ind);<br> return log;<br> ')<br><br><br>lookup<-function(nList, nKeys, P) {<br>
nList<-10000<br> nKeys<-1000<br> P<-0.1<br> keys<-as.character(floor(runif(nKeys, 1, nList/P+1)))<br> L<-as.list(1:nList)<br> names(L)<-as.character(1:nList)<br> benchmark(keys %in% names(L), f(L, keys), g(L, keys), replications=10, columns=c("test", "elapsed", "relative"))<br>
}<br><br>lookup(10000, 1000, 0.1)<br>lookup(10000, 1000, 1.0)<br>lookup(100000, 100, 0.1)<br>lookup(100000, 100, 1.0)<br>lookup(100000, 1, 0.1)<br>lookup(100000, 1, 1.0)<br><br><br><div class="gmail_quote">On Wed, Mar 28, 2012 at 3:34 PM, Dirk Eddelbuettel <span dir="ltr"><<a href="mailto:edd@debian.org">edd@debian.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br>
On 28 March 2012 at 15:17, Ulrich Bodenhofer wrote:<br>
| Thanks for your swift reply, Dirk! Wow, to be frank, that is not what I<br>
| was expecting. In the meantime, I read Section 5.9.6 of "Writing R<br>
| extensions" and I was stunned to see a solution there that is similar to<br>
| the one you propose. I do not know R internals very well, but I cannot<br>
| believe that accesses by names are implemented by tediously searching<br>
| through all names sequentially. There must be some hash table behind,<br>
| right? Otherwise, list accesses would be terribly slow. If all my<br>
<br>
</div>It's a trade-off: Inserting into a hash table has cost, you would enforce<br>
that on every (!!) vector use, whether names are looked up or not.<br>
<br>
Also, vectors are a natural representation as R's SEXP are just linear<br>
collections.<br>
<div class="im"><br>
| assumptions are right, I wonder why the R API does not make this<br>
| mechanism available.<br>
|<br>
| May I add one more question: where can I find the definition of the List<br>
| class and the implementation of its methods in the Rcpp package source<br>
| code? I was looking for this, but got lost. Are you using the method<br>
<br>
</div>The only answer I can give is to ... keep looking. It's messy, because there<br>
is so much code---but that is because Rcpp does so much stuff.<br>
<br>
But some header files are more important than others -- and the Vector.h file<br>
as well as the Vector/ subdirectory are important.<br>
<br>
As mentioned, Rcpp::List() is a vector too.<br>
<div class="im"><br>
| described in Section 5.9.6 of "Writing R extensions" (which I doubt) or<br>
| something else. I would actually be quite curious to learn more about<br>
| how lists are implemented in Rcpp.<br>
<br>
</div>They are just vectors, hence the linear traversal.<br>
<span class="HOEnZb"><font color="#888888"><br>
Dirk<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
| Thanks and best regards,<br>
| Ulrich<br>
|<br>
|<br>
| If so, I wonder why this mechanism has not been<br>
|<br>
| On 03/28/2012 02:56 PM, Dirk Eddelbuettel wrote:<br>
| > On 28 March 2012 at 13:56, Ulrich Bodenhofer wrote:<br>
| > | My question is the following: is there any way of checking in whether a<br>
| > | component of an Rcpp list (or vector) with a given name exists in this list. If<br>
| > | I simply try accessing a non-existing component, I get an "index out of bounds"<br>
| > | error. Trying to catch a possible exception did not work either. I also browsed<br>
| > | the Rcpp package source code, but unfortunately I got lost. Sorry if this has<br>
| > | been addressed on this list before. At least I googled in many different ways,<br>
| > | but could not find anything. Any ideas? Any help is gratefully appreciated!<br>
| ><br>
| > Good question, and I have the suspicion that we answered that years ago on<br>
| > the list before.<br>
| ><br>
| > Here is a super-pedestrian version. It takes a list, extracts its names() --<br>
| > and should probably test whether names is empty ? -- and then compares these<br>
| > against a vector of tokens, returning a bool for each token:<br>
| ><br>
| > R><br>
| > R> suppressMessages(library(inline))<br>
| > R><br>
| > R> f<- cxxfunction(signature(ls="list", ts="character"), plugin="Rcpp", body='<br>
| > + Rcpp::List lst(ls);<br>
| > + Rcpp::CharacterVector nam = lst.names();<br>
| > + std::vector<std::string> nm = Rcpp::as< std::vector< std::string> >(nam);<br>
| > + int m = nam.size();<br>
| > +<br>
| > + Rcpp::CharacterVector tok(ts);<br>
| > + std::vector<std::string> tk = Rcpp::as< std::vector< std::string> >(tok);<br>
| > + int n = tok.size();<br>
| > +<br>
| > + Rcpp::LogicalVector log(n);<br>
| > +<br>
| > + for (int i=0; i<n; i++) { // look at all tokens<br>
| > + log[i] = false; // assume false, but compare to all names<br>
| > + for (int j=0; j<m; j++) {<br>
| > + log[i] = (tk[i] == nm[j]); // and note equality if we find it<br>
| > + }<br>
| > + }<br>
| > + return log;<br>
| > + ')<br>
| > R> ll<- list(aa=1,b="b") ## list with names 'aa' and 'b'<br>
| > R> f(ll, c("aa", "b", "c")) ## does it contain 'a' or 'b' or 'c' ?<br>
| > [1] FALSE TRUE FALSE<br>
| > R> f(ll, "c") ## works with scalars too<br>
| > [1] FALSE<br>
| > R><br>
| ><br>
| > I am sure there is a much cleverer solution one could write...<br>
| ><br>
| > Dirk<br>
| ><br>
|<br>
<br>
</div></div><div class="im HOEnZb">--<br>
R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL<br>
See agenda, registration details and more at <a href="http://www.RinFinance.com" target="_blank">http://www.RinFinance.com</a><br>
</div><div class="HOEnZb"><div class="h5">_______________________________________________<br>
Rcpp-devel mailing list<br>
<a href="mailto:Rcpp-devel@lists.r-forge.r-project.org">Rcpp-devel@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br>
</div></div></blockquote></div><br>