[Rcpp-devel] Efficient DataFrame access by row & column

Ken Williams Ken.Williams at windlogics.com
Wed Feb 20 00:24:26 CET 2013



> -----Original Message-----
> From: Dirk Eddelbuettel [mailto:edd at debian.org]
> Sent: Tuesday, February 19, 2013 5:02 PM
> To: Ken Williams
> Cc: rcpp-devel at lists.r-forge.r-project.org
> Subject: Re: [Rcpp-devel] Efficient DataFrame access by row & column
>
>
> Ken,
>
> On 19 February 2013 at 22:35, Ken Williams wrote:
> | I have a need to loop through all the entries of a DataFrame by row,
> | then column.  I know two different ways:
>
> There have been prior discussions of this topic, as well as example posts --
> even leading to an Rcpp Gallery article. Did you read any of these?  It wasn't
> clear from your post.

I looked, but I didn't find anything directly addressing it.  Most of what I found at http://search.gmane.org/?query=dataframe&group=gmane.comp.lang.r.rcpp seems to deal with creating DataFrame objects, not indexing into them.

In the Rcpp Gallery, I also see two articles on creating/modifying DataFrame objects, but nothing that demonstrates indexing any differently from what I wrote.

The other place I looked was inst/unitTests/cpp/DataFrame.cpp in the repository.

If I missed something relevant, I'd be happy to be pointed to it.


>
> | I'm also curious why it's a syntax error in Case A to just write
> | `df[j][i]` or
>
> Eeeek.  I prefer the more C++-y way of writing df(j,i).

Attempting to do so, I get a compile-time error:

window.cpp:68:34: error: ambiguous overload for 'operator-' in 'Rcpp::Vector<RTYPE>::operator()(const size_t&, const size_t&) [with int RTYPE = 19, Rcpp::Vector<RTYPE>::Proxy = Rcpp::internal::generic_proxy<19>, size_t = long long unsigned int]((* &((size_t)j)), (* &((size_t)i))) - Rcpp::Vector<RTYPE>::operator()(const size_t&, const size_t&) [with int RTYPE = 19, Rcpp::Vector<RTYPE>::Proxy = Rcpp::internal::generic_proxy<19>, size_t = long long unsigned int]((* &((size_t)j)), (* &((size_t)last_i)))'
window.cpp:68:34: note: candidates are:
window.cpp:68:34: note: operator-(SEXP, SEXP) <built-in>
window.cpp:68:34: note: operator-(SEXP, long long int) <built-in>
window.cpp:68:34: note: operator-(int, int) <built-in>

For context, the line that's failing is:

     if(fabs(df(j,i)-df(j,last_i))>thresh) {
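
Going by the candidate list in the error, each operand of the subtraction is a proxy object that can convert to several types, so the compiler has no single best `operator-` to pick. The sketch below reproduces that situation in plain C++ without Rcpp. `ElemProxy` and `exceeds` are hypothetical names made up for illustration, not Rcpp classes, and the real proxy may offer a different set of conversions. The point is only that converting each operand explicitly before subtracting removes the ambiguity.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a list-element proxy (not an Rcpp type).
// It converts implicitly to more than one arithmetic type.
struct ElemProxy {
    double value;
    operator int() const { return static_cast<int>(value); }
    operator double() const { return value; }
};

// Compare two proxied values against a non-negative threshold.
inline bool exceeds(const ElemProxy& a, const ElemProxy& b, double thresh) {
    assert(thresh >= 0.0);
    // return std::fabs(a - b) > thresh;
    //   ^ does not compile: operator-(int, int) and operator-(double, double)
    //     are equally good matches, so the overload is ambiguous.

    // Converting each operand first leaves only operator-(double, double).
    const double x = static_cast<double>(a);
    const double y = static_cast<double>(b);
    return std::fabs(x - y) > thresh;
}
```

With these definitions, `exceeds(ElemProxy{5.0}, ElemProxy{1.0}, 3.0)` is true and `exceeds(ElemProxy{1.5}, ElemProxy{1.0}, 3.0)` is false.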


> Overall, your premise may be wrong too.  "We all know" that a data.frame is
> not the fastest data structure in R, so by forcing ourselves to the same access
> are we not handicapping ourselves?

I was operating under the premise that there "must be" a constant-time accessor for a List element (DataFrame column), and once I have that, a constant-time accessor for an element of that vector.  I know the latter is true, but is the former not true?  I assumed it was but that I just couldn't find it.
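
To make the access pattern I have in mind concrete, here is a minimal sketch in plain C++ that models the data frame as a list of equal-length numeric columns. `Columns` and `count_jumps` are hypothetical names, and `std::vector` stands in for the Rcpp types, so this says nothing about what Rcpp's accessors actually cost. It only shows the two-step lookup: fetch the column in constant time, then index into it in constant time. As far as I know, the analogous step in Rcpp would be to convert each `df[j]` to a `NumericVector` once, outside the row loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data frame: a list of equal-length columns.
using Columns = std::vector<std::vector<double>>;

// Loop by row, then by column, and count the entries that differ from the
// previous row in the same column by more than thresh.
std::size_t count_jumps(const Columns& cols, double thresh) {
    if (cols.empty()) return 0;
    const std::size_t nrow = cols[0].size();
    for (const auto& c : cols) assert(c.size() == nrow);  // rectangular table

    std::size_t jumps = 0;
    for (std::size_t i = 1; i < nrow; ++i) {               // rows
        for (std::size_t j = 0; j < cols.size(); ++j) {    // columns
            const std::vector<double>& col = cols[j];      // O(1) column lookup
            if (std::fabs(col[i] - col[i - 1]) > thresh) { // O(1) element lookup
                ++jumps;
            }
        }
    }
    return jumps;
}
```

For two columns `{1.0, 1.5, 5.0}` and `{0.0, 4.0, 4.1}` with a threshold of 2.0, this counts 2 jumps: one in the second column at the second row, and one in the first column at the third row.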

 -Ken
