[Rcpp-devel] Performance question about DataFrame
Romain Francois
romain at r-enthusiasts.com
Thu Jan 31 12:21:42 CET 2013
On 15/01/13 16:20, John Merrill wrote:
> It appears that DataFrame::create is a thin layer on top of the R
> data.frame call. That guarantees correctness, but it also means the
> performance of an Rcpp routine which returns a large data frame is
> limited by the performance of data.frame, which is utterly horrible.
>
> In the current version of R, there's a trivial, but borderline evil,
> workaround: build a list of vectors meeting the basic requirements of a
> data frame (they all need to be of the same length, and each component
> needs to be named) and set the class of the object to "data.frame".
>
> I have two questions:
> (1) Is it reasonable to anticipate that this hack will continue to work
> for the near future in R?
> (2) If so, would a patch to that effect be of interest to the developers?
The reason we used a callback to data.frame is close to laziness on our
part. With the R function we know, for example, that columns of different
sizes will be handled properly, with recycling, etc.
Just making a named list of vectors is not enough. We have to make sure
they all have the same length.
Perhaps it would be worth checking this and making better
DataFrame::create functions.
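To illustrate the length check mentioned above, here is a minimal standalone C++ sketch. It does not use Rcpp: the Column type and the checked_nrow function are hypothetical, with plain std::vector<int> columns standing in for R vectors. It returns the common column length (the row count the data frame would have) and rejects unnamed columns or columns whose lengths differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical named column: (name, values). Plain ints stand in for an
// R vector so the sketch compiles without Rcpp or R headers.
using Column = std::pair<std::string, std::vector<int>>;

// Returns the common length of all columns, i.e. the row count the data
// frame would have. Throws if a column is unnamed or if lengths differ.
// No recycling is attempted.
std::size_t checked_nrow(const std::vector<Column>& cols) {
    if (cols.empty()) {
        return 0;  // no columns: treated as zero rows in this sketch
    }
    const std::size_t n = cols.front().second.size();
    for (const Column& col : cols) {
        if (col.first.empty()) {
            throw std::invalid_argument("every column must be named");
        }
        if (col.second.size() != n) {
            throw std::length_error("column '" + col.first + "' has length " +
                                    std::to_string(col.second.size()) +
                                    ", expected " + std::to_string(n));
        }
    }
    return n;
}
```

Unlike R's data.frame(), which recycles a shorter column when its length divides the row count, this sketch simply rejects any mismatch; a faster DataFrame::create along these lines would trade that convenience for speed.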
Also, you can use a shortcut to assign row names, i.e. mimic this in C++
(the second line contains the magic: c(NA, -10L) is R's compact encoding
of the automatic row names 1:10):
> d <- list( x = 1:10, y = 1:10 )
> attr( d, "row.names" ) <- c( NA, -10L )
> attr( d, "class" ) <- "data.frame"
> d
    x  y
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
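The compact row names in the second line are just a two-element integer vector: NA followed by minus the row count. The standalone C++ sketch below models that encoding without Rcpp, assuming R's NA_integer_ is stored as INT_MIN (which is how R defines NA_INTEGER); compact_row_names and rows_from_compact are hypothetical helpers for illustration only.

```cpp
#include <array>
#include <climits>
#include <stdexcept>

// Assumption: R stores NA_integer_ as INT_MIN (R's NA_INTEGER).
const int kNaInteger = INT_MIN;

// Hypothetical helper: the compact row.names value c(NA, -n), which R reads
// as the automatic row names 1, 2, ..., n.
std::array<int, 2> compact_row_names(int n) {
    if (n < 0) {
        throw std::invalid_argument("row count must be non-negative");
    }
    return {{kNaInteger, -n}};
}

// Hypothetical helper: recover n from a pair written by compact_row_names.
// Only the c(NA, -n) form produced above is handled.
int rows_from_compact(const std::array<int, 2>& rn) {
    if (rn[0] != kNaInteger || rn[1] > 0) {
        throw std::invalid_argument("not a compact row.names pair c(NA, -n)");
    }
    return -rn[1];
}
```

In actual Rcpp code, assuming an Rcpp::List named out whose columns all have length n, the equivalent of the R snippet would be out.attr("row.names") = Rcpp::IntegerVector::create(NA_INTEGER, -n); followed by out.attr("class") = "data.frame";.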
Romain
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
R Graph Gallery: http://gallery.r-enthusiasts.com
blog: http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible