[Rcpp-devel] Efficient DataFrame access by row & column

Ken Williams Ken.Williams at windlogics.com
Tue Feb 19 23:35:22 CET 2013


Hi,

I have a need to loop through all the entries of a DataFrame by row, then column.  I know two different ways:

  // Case A: When df.length() is unknown at coding time:
  int n = df.nrows();
  int m = df.length();
  for(int i=0; i<n; i++) {
    for(int j=0; j<m; j++) {
      NumericVector v = df[j];
      // ... do stuff with v[i] ...
    }
  }

  // Case B: If I know the number of columns while writing the C++ code:
  int n = df.nrows();
  NumericVector xs = df[0];
  NumericVector ys = df[1];
  for(int i=0; i<n; i++) {
    // ... do stuff with xs[i] and ys[i] ...
  }

The second way is less flexible, but it is also considerably faster in practice. I presume this means each "NumericVector v = df[j]" expression does a non-trivial amount of work (perhaps even copying the whole vector?), and Case A repeats that work for every single element.
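To make the suspected cost concrete, here is a plain C++ sketch (not Rcpp) in which a hypothetical `Frame` type counts how often a column is fetched. It only models the loop shapes: each `column(j)` call stands in for one `NumericVector v = df[j];`, and the sketch makes no claim about what that Rcpp conversion actually does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data frame: equal-length double columns.
// It counts column fetches so the two loop shapes can be compared;
// this is NOT how Rcpp's DataFrame is implemented.
struct Frame {
    std::vector<std::vector<double>> cols;
    mutable std::size_t fetches = 0;

    std::size_t nrows() const { return cols.empty() ? 0 : cols[0].size(); }
    std::size_t ncols() const { return cols.size(); }

    // Each call models the work done by `NumericVector v = df[j];`.
    const std::vector<double>& column(std::size_t j) const {
        ++fetches;
        return cols[j];
    }
};

// Case A shape: the column is fetched again for every element (n * m fetches).
double sum_case_a(const Frame& df) {
    double total = 0.0;
    for (std::size_t i = 0; i < df.nrows(); ++i)
        for (std::size_t j = 0; j < df.ncols(); ++j)
            total += df.column(j)[i];
    return total;
}

// Case B shape: assumes exactly two known columns, each fetched once up front.
double sum_case_b(const Frame& df) {
    const std::vector<double>& xs = df.column(0);
    const std::vector<double>& ys = df.column(1);
    double total = 0.0;
    for (std::size_t i = 0; i < df.nrows(); ++i)
        total += xs[i] + ys[i];
    return total;
}
```

For a 3-row, 2-column frame both functions return the same sum, but the Case A shape performs 6 column fetches against 2 for the Case B shape; the gap grows with the number of rows.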

Is there a way to have my cake & eat it? Can I efficiently (O(1)) index into a DataFrame by numeric row index and numeric column index?
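One general way to get both, sketched here in plain C++ under the assumption that the column count is only known at run time: resolve every column once before the row loop, then index by (row, column) through the cached handles. `ColumnCache` and `sum_by_row` are hypothetical names; in Rcpp the analogous step would presumably be extracting each column into a NumericVector once, outside the row loop, which this sketch does not test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: resolve every column once (O(m) setup), then
// index by (row, column) in O(1) per access.
// The stored pointers stay valid only while the source columns are
// neither resized nor destroyed.
class ColumnCache {
public:
    explicit ColumnCache(const std::vector<std::vector<double>>& cols)
        : nrow_(cols.empty() ? 0 : cols[0].size()) {
        ptrs_.reserve(cols.size());
        for (const auto& c : cols)
            ptrs_.push_back(c.data());
    }

    std::size_t nrows() const { return nrow_; }
    std::size_t ncols() const { return ptrs_.size(); }

    // One pointer lookup plus one offset; no per-access column extraction.
    double operator()(std::size_t i, std::size_t j) const { return ptrs_[j][i]; }

private:
    std::size_t nrow_;
    std::vector<const double*> ptrs_;
};

// Row-then-column traversal, as in Case A, but through the cache.
double sum_by_row(const ColumnCache& df) {
    double total = 0.0;
    for (std::size_t i = 0; i < df.nrows(); ++i)
        for (std::size_t j = 0; j < df.ncols(); ++j)
            total += df(i, j);
    return total;
}
```

The design keeps Case A's flexibility (any number of columns) while paying the column-resolution cost m times instead of n * m times.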

I'm also curious why it's a compile error in Case A to just write `df[j][i]` or even `((NumericVector) df[j])[i]`; clearly there's some magic in the `NumericVector v = df[j]` conversion that I don't understand.
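One plausible reading of the `df[j][i]` error, assuming `df[j]` yields a proxy object rather than a NumericVector (list-style element access in Rcpp returns a proxy): the proxy can convert itself into a vector when the target type is named, but it has no `operator[]` of its own. The plain C++ sketch below uses hypothetical `Vec` and `ElementProxy` types to show that mechanism; it does not model the C-style cast case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector type standing in for NumericVector.
struct Vec {
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical proxy standing in for whatever `df[j]` returns: it is not
// a Vec itself, it only knows how to produce one on request.
struct ElementProxy {
    const std::vector<double>* target;

    // User-defined conversion, used only when a Vec is actually required.
    operator Vec() const { return Vec{*target}; }
};

// `p[i]` would not compile: ElementProxy declares no operator[], and the
// compiler does not convert p to Vec just to go looking for one.
// Naming the target type forces the conversion first.
double element_at(const ElementProxy& p, std::size_t i) {
    Vec v = p;  // invokes ElementProxy::operator Vec()
    return v[i];
}
```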

Thanks.

--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com

