[Rcpp-devel] Efficient DataFrame access by row & column

Yan Zhou zhouyan at me.com
Wed Feb 20 00:09:52 CET 2013


The most inefficient part I see is the creation of a new NumericVector inside the innermost loop. You copy each column n times, and n-1 of those copies are unnecessary.
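
To make the point concrete, here is a minimal sketch in plain C++, not Rcpp. The `Frame` struct and both function names are hypothetical stand-ins; `Frame::column()` returns by value so that, for illustration, every extraction is a real copy. In actual Rcpp code the analogous move would presumably be to build the column vectors once (for example into a std::vector of NumericVector) before entering the row loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data frame: a list of equally long numeric columns.
struct Frame {
    std::vector<std::vector<double>> cols;

    // Returns the column by value, so every call makes a fresh copy.
    std::vector<double> column(std::size_t j) const { return cols[j]; }
    std::size_t ncol() const { return cols.size(); }
    std::size_t nrow() const { return cols.empty() ? 0 : cols.front().size(); }
};

// Slow: the column is extracted inside the innermost loop,
// so each column is copied nrow() times.
double sum_by_row_slow(const Frame& df) {
    double total = 0.0;
    for (std::size_t i = 0; i < df.nrow(); ++i) {
        for (std::size_t j = 0; j < df.ncol(); ++j) {
            std::vector<double> col = df.column(j);
            total += col[i];
        }
    }
    return total;
}

// Faster: every column is extracted exactly once, before the loops.
double sum_by_row_hoisted(const Frame& df) {
    std::vector<std::vector<double>> columns;
    columns.reserve(df.ncol());
    for (std::size_t j = 0; j < df.ncol(); ++j) {
        columns.push_back(df.column(j));
        assert(columns.back().size() == df.nrow());  // columns must be equally long
    }
    double total = 0.0;
    for (std::size_t i = 0; i < df.nrow(); ++i) {
        for (std::size_t j = 0; j < columns.size(); ++j) {
            total += columns[j][i];
        }
    }
    return total;
}
```

Both functions visit the entries by row, then column, and return the same result; only the number of column extractions differs.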

Yan Zhou

On Feb 19, 2013, at 11:02 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

> 
> Ken,
> 
> On 19 February 2013 at 22:35, Ken Williams wrote:
> | I have a need to loop through all the entries of a DataFrame by row, then
> | column.  I know two different ways:
> 
> There have been prior discussions of this topic, as well as example posts --
> even leading to an Rcpp Gallery article. Did you read any of these?  It wasn't
> clear from your post.
> 
> | I'm also curious why it's a syntax error in Case A to just write `df[j][i]` or
> 
> Eeeek.  I prefer the more C++-y way of writing df(j,i).  Square brackets only
> work for vectors, and even then you may be better off with x(i) for
> consistency.
> 
> Overall, your premise may be wrong too.  "We all know" that a data.frame is
> not the fastest data structure in R, so by forcing ourselves to the same
> access, are we not handicapping ourselves?
> 
> Once you are in C++, you can use whatever C++ datatype you like.  A
> data.frame really is just a list of vectors, and each of the vectors has e.g. a
> begin() iterator from which you can (fairly costlessly) instantiate STL types.
> 
> And those give you performance guarantees.
> 
> Hope this helps,  Dirk
> 
> -- 
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  
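
Dirk's suggestion of instantiating STL types from a column's iterators could look roughly like the sketch below. It is plain C++ under stated assumptions: `to_std_vector` and `row_sums` are hypothetical helper names, and `Column` stands for any type exposing begin() and end() (as Rcpp vectors do).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a std::vector<double> from any column type that exposes begin()/end().
// The range constructor makes a single pass over the column.
template <typename Column>
std::vector<double> to_std_vector(const Column& col) {
    return std::vector<double>(col.begin(), col.end());
}

// Row-by-row sums over columns that were converted once, up front.
std::vector<double> row_sums(const std::vector<std::vector<double>>& columns) {
    const std::size_t n = columns.empty() ? 0 : columns.front().size();
    std::vector<double> sums(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < columns.size(); ++j) {
            assert(columns[j].size() == n);  // columns must be equally long
            sums[i] += columns[j][i];
        }
    }
    return sums;
}
```

Each column is converted once, and the row-then-column loop afterwards touches only std::vector storage.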
