[Rcpp-devel] Performance question about DataFrame
Dirk Eddelbuettel
edd at debian.org
Tue Jan 15 16:36:51 CET 2013
On 15 January 2013 at 07:20, John Merrill wrote:
| It appears that DataFrame::create is a thin layer on top of the R data.frame
| call. The guarantee correctness, but also means the performance of an Rcpp
| routine which returns a large data frame is limited by the performance of
| data.frame -- which is utterly horrible.
All correct. It is really mostly a convenience layer. When we use R, we think
of data.frame objects as accessible by row -- which is not something we can
easily do at the C++ layer. So the DataFrame class is really mostly a
wrapper around a list (as it is internally) with a call to R to set it.
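To illustrate the column-versus-row point, here is a minimal sketch in plain C++ (no Rcpp needed). ColumnFrame and its members are hypothetical names made up for this example and are not part of the Rcpp API; it also assumes all columns are doubles, which real data frames do not require. It only shows why a list of equal-length columns is cheap to build but has no stored rows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data frame: a list of named, equal-length
// columns. Real R data frames can mix column types; this sketch assumes
// doubles only to stay short.
struct ColumnFrame {
    std::vector<std::string> names;
    std::vector<std::vector<double>> columns;

    // Adding a column is a single append, provided the length matches
    // the columns already present.
    void add_column(std::string name, std::vector<double> values) {
        assert(columns.empty() || values.size() == columns.front().size());
        names.push_back(std::move(name));
        columns.push_back(std::move(values));
    }

    std::size_t nrow() const {
        return columns.empty() ? 0 : columns.front().size();
    }
    std::size_t ncol() const { return columns.size(); }

    // A "row" is not stored anywhere: it has to be gathered by visiting
    // every column in turn.
    std::vector<double> row(std::size_t i) const {
        std::vector<double> out;
        out.reserve(columns.size());
        for (const auto& col : columns) out.push_back(col.at(i));
        return out;
    }
};
```

Column access here is just an index into the list, whereas row() has to touch every column, which is roughly why row-wise access does not come naturally at the C++ level.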
| In the current version of R, there's a trivial, but borderline evil, work
| around: build a list of lists meeting the basic requirements of a data frame
| (they all need to be of the same length, and each component list needs to be
| named) and set the type of the object to "data.frame".
|
| I have two questions:
| (1) Is it reasonable to anticipate that this hack will continue to work for the
| near future in R?
We cannot speak for R Core.
But this is so fundamental to so many things that I (personally speaking) am
inclined to say yes.
(Or did you mean Rcpp instead of R? If so, example code?)
| (2) If so, would a patch to that effect be of interest to the developers?
We are always open to reasonable patches that bring improvements (and that
come with test cases demonstrating their usefulness and a testing framework).
As I recall there is also an open bug in our DataFrame right now, so if you want
to work on it, great :)
Dirk
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com