[Rcpp-devel] Performance question about DataFrame
John Merrill
john.merrill at gmail.com
Thu Jan 31 16:20:33 CET 2013
I agree that this is not a complete implementation; it isn't meant to be,
although it might still be worth incorporating into Rcpp with the
appropriate fixes in place.
For instance, the vector recycling issue is far from the greatest
limitation of this code: it handles character vectors wrong. The R routine
converts character vectors into factors unless overridden; I wrote the
precursor of this particular routine because I wanted to handle strings
faithfully, and so ended up writing a stupid R routine to coerce lists of
lists of constant length to data frames.
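A minimal sketch of the checks a faster DataFrame::create would need before tagging a named list as "data.frame": every column named, all columns the same length, character columns kept as strings (no factor conversion), and the compact row.names value c(NA, -n) used in the R snippet in this thread. This is plain standard C++, not the Rcpp API; `Column`, `ColumnData` and `compact_row_names` are hypothetical names, and the zero-row case assumes the behaviour of base R's `.set_row_names(n)`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for an R column: integer, double or character data.
// Character data stays as std::string; nothing is turned into a factor.
using ColumnData = std::variant<std::vector<int>, std::vector<double>,
                                std::vector<std::string>>;

struct Column {
    std::string name;
    ColumnData data;
};

// R stores NA_integer_ as INT_MIN.
constexpr int kNaInteger = INT_MIN;

std::size_t column_length(const Column& c) {
    return std::visit([](const auto& v) { return v.size(); }, c.data);
}

// Checks the data-frame invariants (named, equal-length columns) and returns
// the compact row.names value: {NA, -n} for n > 0 rows, empty for 0 rows,
// mirroring base R's .set_row_names(n).
std::vector<int> compact_row_names(const std::vector<Column>& cols) {
    const std::size_t n = cols.empty() ? 0 : column_length(cols.front());
    for (const Column& c : cols) {
        if (c.name.empty())
            throw std::invalid_argument("every column must be named");
        if (column_length(c) != n)
            throw std::invalid_argument("column '" + c.name +
                                        "' has a different length");
    }
    if (n == 0) return {};
    return {kNaInteger, -static_cast<int>(n)};
}
```

The validation is a single pass over the columns, so its cost is independent of the number of rows; that is the point of skipping the data.frame callback.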
On Thu, Jan 31, 2013 at 3:21 AM, Romain Francois
<romain at r-enthusiasts.com> wrote:
> On 15/01/13 16:20, John Merrill wrote:
>
>> It appears that DataFrame::create is a thin layer on top of the R
>> data.frame call. That guarantees correctness, but it also means the
>> performance of an Rcpp routine which returns a large data frame is
>> limited by the performance of data.frame, which is utterly horrible.
>>
>> In the current version of R, there's a trivial, but borderline evil,
>> workaround: build a list of lists meeting the basic requirements of a
>> data frame (they all need to be of the same length, and each component
>> needs to be named) and set the class of the object to "data.frame".
>>
>> I have two questions:
>> (1) Is it reasonable to anticipate that this hack will continue to work
>> for the near future in R?
>> (2) If so, would a patch to that effect be of interest to the developers?
>>
>
> The reason we used a callback to data.frame is close to laziness on our
> part. With the R function, for example, we know that columns of different
> sizes will be handled properly, with recycling, etc.
>
> Just making a named list of vectors is not enough. We have to make sure
> they all have the same length.
>
> Perhaps it would be worth checking this and writing better
> DataFrame::create functions.
>
>
>
> Also, you can use a shortcut to assign row names, i.e. mimic this in C++
> (the second line contains the magic):
>
> > d <- list( x = 1:10, y = 1:10 )
> > attr( d, "row.names" ) <- c( NA, -10L )
> > attr( d, "class" ) <- "data.frame"
> > d
>     x  y
> 1   1  1
> 2   2  2
> 3   3  3
> 4   4  4
> 5   5  5
> 6   6  6
> 7   7  7
> 8   8  8
> 9   9  9
> 10 10 10
>
>
> Romain
>
> --
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
>
> R Graph Gallery: http://gallery.r-enthusiasts.com
>
> blog: http://romainfrancois.blog.free.fr
> |- http://bit.ly/RE6sYH : OOP with Rcpp modules
> `- http://bit.ly/Thw7IK : Rcpp modules more flexible
>
>
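Romain's point that the data.frame callback recycles shorter columns can be illustrated with a small standalone C++ sketch (standard library only, not the Rcpp API; `row_count` and `recycle` are hypothetical names). It assumes the rule R's data.frame() applies: the number of rows is the longest column length, and a shorter column is repeated only when its length divides that number evenly; otherwise construction fails.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Number of rows data.frame() would use: the longest column length.
std::size_t row_count(const std::vector<std::size_t>& lengths) {
    return lengths.empty()
               ? 0
               : *std::max_element(lengths.begin(), lengths.end());
}

// Hypothetical helper: repeat v until it has n elements. A column is only
// recycled when its length divides n evenly; otherwise this throws, much as
// data.frame() stops with "arguments imply differing number of rows".
template <typename T>
std::vector<T> recycle(const std::vector<T>& v, std::size_t n) {
    if (v.size() == n) return v;
    if (v.empty() || n % v.size() != 0)
        throw std::invalid_argument("arguments imply differing number of rows");
    std::vector<T> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(v[i % v.size()]);
    assert(out.size() == n);  // postcondition: exactly n elements
    return out;
}
```

Wiring this into a faster DataFrame::create would mean computing `row_count` over the column lengths first, recycling each column to that length, and only then setting the class and row.names attributes.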