[Rcpp-devel] Performance question about DataFrame

Paul Johnson pauljohn32 at gmail.com
Fri Jan 18 17:16:03 CET 2013


On Thu, Jan 17, 2013 at 9:54 PM, John Merrill <john.merrill at gmail.com> wrote:
> As of 2.15.1, data.frame appears to no longer be O(n^2) in the number of
> columns in the frame.  That's certainly an improvement, yes.
>
> However, by eliminating calls to data.frame and replacing them with direct
> class modifications, I can take a routine which takes minutes and reduce it
> to a routine which takes seconds.  So, pragmatically, in Rcpp, I can get a
> rough factor of sixty, it appears.
>
>
Wow.

When you have this written out, will you post links to it?  I can
learn from your examples, I think.
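
For what it's worth, here is my guess at what "direct class modifications"
might look like. It is only a sketch in plain R, not John's actual code. The
column names id and x and the length n are made up for illustration, and I am
assuming the columns are already equal-length vectors, which is one of the
things data.frame would normally check for you. Presumably the same two
attributes could be set from C++ on an Rcpp::List before returning it.

```r
# Hypothetical example data: two columns of equal length n.
n <- 5L
cols <- list(id = seq_len(n), x = 0.5 * seq_len(n))

# Compact integer row names, c(NA, -n), which is the internal form R uses
# for automatic row names 1..n.
attr(cols, "row.names") <- c(NA_integer_, -n)

# Mark the list as a data frame directly instead of calling data.frame().
# No checking or copying of the columns happens here, so the caller is
# responsible for the columns being consistent.
class(cols) <- "data.frame"

str(cols)
```

The trade-off is the one John describes: skipping data.frame skips its
validation, so a list with mismatched column lengths would produce a broken
object rather than an error.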

pj



> On Thu, Jan 17, 2013 at 7:46 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>>
>> On Tue, Jan 15, 2013 at 9:20 AM, John Merrill <john.merrill at gmail.com>
>> wrote:
>> > It appears that DataFrame::create is a thin layer on top of the R
>> > data.frame
>> > call.  That guarantees correctness, but also means the performance of an
>> > Rcpp
>> > routine which returns a large data frame is limited by the performance
>> > of
>> > data.frame -- which is utterly horrible.
>>
>> Are you certain that this claim is still true?
>>
>> I was shocked/surprised by the package "dataframe" and the commentary
>> about it. Its description says: "This contains versions of standard
>> data frame functions in R, modified to avoid making extra copies of
>> inputs. This is faster, particularly for large data."
>>
>> The author said that data.frame was slow because it was repeatedly
>> copying some objects, and he showed a substantially faster approach.
>>
>> In the release notes for R-2.15.1, I recall seeing a note that R Core
>> had responded by integrating several of those changes. But still
>> data.frame is not fast for you?
>>
>> If they didn't make the core data.frame as fast, would you care to
>> enlighten us by installing the dataframe package and letting us know
>> if it is still faster?
>>
>> Or perhaps you are way ahead of me and you've already imitated
>> Hesterberg's algorithms in your C++ design?
>>
>> pj
>>
>> --
>> Paul E. Johnson
>> Professor, Political Science      Assoc. Director
>> 1541 Lilac Lane, Room 504      Center for Research Methods
>> University of Kansas                 University of Kansas
>> http://pj.freefaculty.org               http://quant.ku.edu
>
>



-- 
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu

