[Rcpp-devel] Performance question about DataFrame

Dirk Eddelbuettel edd at debian.org
Tue Jan 15 16:36:51 CET 2013


On 15 January 2013 at 07:20, John Merrill wrote:
| It appears that DataFrame::create is a thin layer on top of the R data.frame
| call.  That guarantees correctness, but also means the performance of an Rcpp
| routine which returns a large data frame is limited by the performance of
| data.frame -- which is utterly horrible.

All correct. It is really mostly a convenience layer.  When we use R, we think
of data.frame objects as accessible by row -- which is not something we can
easily do at the C++ layer.  So the DataFrame class is really mostly a
wrapper around a list (which is what a data.frame is internally), with a call
back to R to turn that list into a data.frame.
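
To make the row-versus-column point concrete, here is a minimal standalone
sketch in plain C++.  It does not use Rcpp; the names Columns, Row, get_row
and sum_values are hypothetical and only illustrate the layout: each column is
one contiguous vector, so a "row" has to be gathered from every column, while
a column is a single scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a two-column data frame: each column is one
// contiguous vector, mirroring how a data.frame is a list of columns.
struct Columns {
    std::vector<int> id;
    std::vector<double> value;
};

// A row is not stored anywhere; it is assembled on demand.
struct Row {
    int id;
    double value;
};

// Gather element i from every column to build one row.
Row get_row(const Columns& df, std::size_t i) {
    assert(i < df.id.size() && i < df.value.size());
    return Row{df.id[i], df.value[i]};
}

// Column access, by contrast, is a single pass over one vector.
double sum_values(const Columns& df) {
    double total = 0.0;
    for (double v : df.value) total += v;
    return total;
}
```

With more columns (and mixed column types, as in a real data.frame) the
gather step in get_row grows with the number of columns, which is why
row-wise access is awkward at the C++ level.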
 
| In the current version of R, there's a trivial, but borderline evil,
| workaround: build a list of lists meeting the basic requirements of a data
| frame (they all need to be of the same length, and each component list needs
| to be named) and set the class of the object to "data.frame".
| 
| I have two questions:
| (1) Is it reasonable to anticipate that this hack will continue to work for the
| near future in R?

We cannot speak for R Core.  

But this is so fundamental to so many things that I (personally speaking) am
inclined to say yes.
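
For reference, the invariants the workaround relies on can be sketched in
plain C++.  This is a toy model under stated assumptions, not R or Rcpp code:
NamedList, mark_as_data_frame and nrow are hypothetical names, and attributes
are modelled as simple strings.  One assumption goes beyond what the thread
states: besides the class, R also expects a data.frame to carry a row.names
attribute, so the sketch records one as a plain row count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an R list with attributes; not an R or Rcpp type.
struct NamedList {
    std::vector<std::pair<std::string, std::vector<double>>> columns;
    std::map<std::string, std::string> attributes;
};

// Mark the list as a data frame only if the invariants from the thread hold:
// every column is named and all columns have the same length.
// Returns false and leaves the list untouched otherwise.
bool mark_as_data_frame(NamedList& x) {
    const std::size_t n =
        x.columns.empty() ? 0 : x.columns.front().second.size();
    for (const auto& col : x.columns) {
        if (col.first.empty() || col.second.size() != n) return false;
    }
    x.attributes["class"] = "data.frame";
    // Assumption beyond the thread: also record row names (here, a row count).
    x.attributes["row.names"] = std::to_string(n);
    return true;
}

// Number of rows of a list that has already been marked as a data frame.
std::size_t nrow(const NamedList& x) {
    assert(x.attributes.count("class") == 1);  // precondition: already marked
    return x.columns.empty() ? 0 : x.columns.front().second.size();
}
```

The point of the check-then-mark order is that nothing is copied or
converted: the list is validated once and then only its attributes change,
which is where the speed-up over calling data.frame would come from.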

(Or did you mean Rcpp instead of R?  If so, example code?)

| (2) If so, would a patch to that effect be of interest to the developers?  

We are always open to reasonable patches that bring improvements (and that
come with test cases demonstrating their usefulness, run through a testing
framework).

As I recall there is also an open bug in our DataFrame right now, so if you want
to work on it, great :)

Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  

