[Rcpp-devel] Performance question about DataFrame

Romain Francois romain at r-enthusiasts.com
Thu Jan 31 12:21:42 CET 2013


On 15/01/13 16:20, John Merrill wrote:
> It appears that DataFrame::create is a thin layer on top of the R
> data.frame call.  That guarantees correctness, but also means the
> performance of an Rcpp routine which returns a large data frame is
> limited by the performance of data.frame -- which is utterly horrible.
>
> In the current version of R, there's a trivial, but borderline evil,
> workaround: build a list of lists meeting the basic requirements of a
> data frame (they all need to be of the same length, and each component
> list needs to be named) and set the type of the object to "data.frame".
>
> I have two questions:
> (1) Is it reasonable to anticipate that this hack will continue to work
> for the near future in R?
> (2) If so, would a patch to that effect be of interest to the developers?

The reason we used a callback to data.frame is mostly laziness on our
part. With the R function, for example, we know that columns of different
sizes will be handled properly, with recycling, etc.

Just making a named list of vectors is not enough. We have to make sure 
they all have the same length.

Perhaps it would be worth doing this check ourselves and writing better
DataFrame::create functions.
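
As a rough illustration of the length check mentioned above, here is a
minimal sketch in plain C++. The helper name common_length is
hypothetical and not part of Rcpp; it only assumes that the caller has
collected the length of each column. Unlike data.frame, it does no
recycling and simply rejects columns of different lengths.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Rcpp): given the length of each column,
// return the shared row count, or throw if the columns disagree.
// No recycling is attempted, unlike R's data.frame().
inline std::size_t common_length(const std::vector<std::size_t>& lengths) {
    if (lengths.empty()) return 0;  // zero columns means zero rows
    const std::size_t n = lengths.front();
    for (std::size_t len : lengths) {
        if (len != n) {
            throw std::invalid_argument("data frame columns differ in length");
        }
    }
    return n;
}
```

A faster DataFrame::create could presumably run a check like this once
and fall back to the R-level data.frame call only when it fails.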



Also, you can use a shortcut to assign row names, i.e. mimic this in C++ 
(the second line contains the magic):

 > d <- list( x = 1:10, y = 1:10 )
 > attr( d, "row.names" ) <- c( NA, -10L )
 > attr( d, "class" ) <- "data.frame"
 > d
    x  y
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
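
The c( NA, -10L ) value is R's compact form for n automatic row names:
an integer pair holding NA and minus the number of rows. A minimal
sketch of building that pair in plain C++ follows. The names kNaInteger
and compact_row_names are hypothetical. R represents an integer NA as
the smallest int, which the sketch hard-codes so that it compiles
without the R headers; with R available, NA_INTEGER would be the
natural choice.

```cpp
#include <array>
#include <climits>

// R stores NA_integer_ as the smallest int. Hard-coded here (an assumption
// for this sketch) so it builds without R headers; with R, use NA_INTEGER.
constexpr int kNaInteger = INT_MIN;

// Hypothetical helper: the compact "row.names" value for n automatic
// row names, i.e. the C++ counterpart of c(NA, -n) in R.
inline std::array<int, 2> compact_row_names(int n) {
    return {{kNaInteger, -n}};
}
```

In Rcpp, one would presumably copy these two values into an
IntegerVector, assign it to the "row.names" attribute of the named
list, and then set the "class" attribute to "data.frame", mirroring
the three R lines above.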


Romain
