[Rcpp-devel] Speed up of the data.frame creation in DataFrame.h

Dmitry Nesterov dmitry.nesterov at gmail.com
Sun Jun 8 21:55:26 CEST 2014


Christian,
The change does break the unit tests, and after studying a little more how the classes are structured, I have realized that this change by itself is not enough. At this point an “as.data.frame” conversion still has to be called, because the internal structure of the DataFrame class seems to be closer to a regular list and cannot simply be classed as “data.frame”, as I originally thought.

As for the limited-use function, that could be a good idea: it would handle the case where all the vectors in the data frame have the same length (as is often the case with real-life data) and simply throw an exception if the lengths differ.
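
To make the idea concrete, here is a minimal sketch of such a strict constructor. It is written in plain C++ so it can be compiled without R: the types Column and SimpleFrame and the function make_frame_strict are hypothetical stand-ins invented for this illustration, not part of Rcpp, and std::vector<double> stands in for an Rcpp vector. It only shows the intended behavior (no recycling, exception on length mismatch), not how DataFrame.h would implement it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one data frame column: a name plus its values.
// In Rcpp this would be a named element of a List.
struct Column {
    std::string name;
    std::vector<double> values;
};

// Hypothetical result type: the columns and the common row count.
struct SimpleFrame {
    std::vector<Column> columns;
    std::size_t nrow;
};

// Build a frame only if every column has the same length.
// No recycling is attempted; a mismatch throws std::invalid_argument.
inline SimpleFrame make_frame_strict(std::vector<Column> columns) {
    // The first column defines the expected length (0 for an empty frame).
    const std::size_t nrow =
        columns.empty() ? 0 : columns.front().values.size();

    for (const Column& col : columns) {
        if (col.values.size() != nrow) {
            throw std::invalid_argument(
                "column '" + col.name + "' has length " +
                std::to_string(col.values.size()) + ", expected " +
                std::to_string(nrow));
        }
    }
    return SimpleFrame{std::move(columns), nrow};
}
```

Calling make_frame_strict with two columns of length 2 yields a frame with nrow equal to 2, while passing columns of lengths 2 and 1 throws. The check is a single pass over the columns, so the error handling stays cheap compared with a general conversion that has to handle recycling.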

But since I have realized that my solution is incomplete and ultimately incorrect, I’d like to investigate a little more, see whether I can do it correctly, and check whether the benchmark still improves in that case.

Best,
Dmitry

On Jun 8, 2014, at 3:33 PM, Christian Gunning <xian at unm.edu> wrote:

>> Subject: Re: [Rcpp-devel] Speed up of the data.frame creation in
>>        DataFrame.h
>> Message-ID: <757EF798-6BC6-4150-93CD-B5F23D9014C7 at gmail.com>
>> Content-Type: text/plain; charset=windows-1252
>> 
>> The fix I was proposing actually has implications much deeper than I thought. I would need to investigate further and will take no action at this time.
> 
> Hello list, long time no see!
> 
> Dmitry,
> Have you identified any other consequences than what Romain pointed
> out?  This information would be useful for the rest of us.
> 
> Some key points that I agree with:
> * as per Dirk: this is a nice little piece of sleuthing.  Your
> benchmarking shows that the effect is significant.
> * as per your comments: a key intent of Rcpp is to allow the user the
> freedom to achieve optimization and do their own error checking.
> * as per Romain: let's not break things.
> 
> It seems possible to address all of these points, perhaps with a
> dedicated function, as per your comments.  I can help with this, if
> you're interested.
> 
> Key question: what is the intended behavior of this function?  E.g.,
> throw an exception on length mismatch?  My vote is for a limited
> function that deals with a limited number of use cases and provides
> reasonable error-checking (e.g. throws exception for input outside
> scope), versus a logic-heavy function that handles recycling, for
> example.  Does this match your use-case?
> 
> -Christian
> 
> -- 
> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!


