[Rcpp-devel] Performance question about DataFrame

John Merrill john.merrill at gmail.com
Sun Feb 17 19:00:42 CET 2013


I'm sorry, I was unclear.  I wasn't referring to the use of sprintf, which
is, as you say, a bug (albeit, in my own defense, one I was aware of and
left in place for clarity; in production code, I'd have checked the
number of rows at the top of the function and failed if there were too
many).  I was referring to the input parameters.  For some reason, when
large List parameters are passed in, the conversion from S-expression
(SEXP) to List object in cppfunction fails with an out-of-buffer error.  I
assume that the buffer size is limited by design, since there's a simple
workaround.
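
For concreteness, here is a minimal sketch of the kind of up-front row check I mean.  It is plain C++ with no Rcpp so that it stands alone.  The helper name MakeRowNames, the constants kBufSize and kMaxRows, and the choice of exceptions are illustrative assumptions on my part, not code from the original function.  It also swaps sprintf for snprintf so that truncation is detected rather than overflowing the buffer.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original function): builds the row
// names "0", "1", ..., and refuses inputs whose indices would not fit in
// the fixed-size buffer.
std::vector<std::string> MakeRowNames(int nrows) {
  constexpr int kBufSize = 5;      // same buffer size as the original listing
  constexpr int kMaxRows = 10000;  // indices 0..9999 need at most 4 digits + NUL

  // Fail at the top of the function instead of overflowing later.
  if (nrows < 0 || nrows > kMaxRows) {
    throw std::length_error("too many rows for the row-name buffer");
  }

  std::vector<std::string> names;
  names.reserve(static_cast<std::size_t>(nrows));
  for (int i = 0; i < nrows; ++i) {
    char name[kBufSize];
    // snprintf never writes more than sizeof(name) bytes and returns the
    // length it would have needed, so truncation can be detected.
    const int needed = std::snprintf(name, sizeof(name), "%d", i);
    if (needed < 0 || needed >= static_cast<int>(sizeof(name))) {
      throw std::runtime_error("row name did not fit in the buffer");
    }
    names.push_back(name);
  }
  return names;
}
```

With these assumed constants, MakeRowNames(10000) succeeds and MakeRowNames(10001) throws before any formatting happens.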


On Thu, Feb 7, 2013 at 12:28 PM, Davor Cubranic <cubranic at stat.ubc.ca> wrote:

> I come late to this discussion, but it should be pointed out that using
> "sprintf" without ensuring that your buffer is long enough is not a
> "subtlety" but a bug.
>
> A more "C++ way" to do it, and most importantly safer, would be to use
> std::ostringstream:
>
> for (int i = 0; i < nrows; i++) {
>   std::ostringstream rowname;
>   rowname << i;
>   row_names(i) = rowname.str();
> }
> for (int j = 0; j < ncols; j++) {
>   std::ostringstream colname;
>   colname << "X." << j;
>   col_names(j) = colname.str();
> }
>
>
> Davor
>
>
> On 2013-01-18, at 3:25 PM, John Merrill wrote:
>
> Sure.  I'll write something up for the gallery, but here's the crude
> outline.
>
> Here's the C++ code:
>
> #include <Rcpp.h>
>
> using namespace Rcpp;
>
> // [[Rcpp::export]]
> List BuildCheapDataFrame(List a) {
>   List returned_frame = clone(a);
>   GenericVector sample_row = returned_frame(1);
>
>   StringVector row_names(sample_row.length());
>   for (int i = 0; i < sample_row.length(); ++i) {
>     char name[5];
>     sprintf(&(name[0]), "%d", i);
>     row_names(i) = name;
>   }
>   returned_frame.attr("row.names") = row_names;
>
>   StringVector col_names(returned_frame.length());
>   for (int j = 0; j < returned_frame.length(); ++j) {
>     char name[6];
>     sprintf(&(name[0]), "X.%d", j);
>     col_names(j) = name;
>   }
>   returned_frame.attr("names") = col_names;
>   returned_frame.attr("class") = "data.frame";
>
>   return returned_frame;
> }
>
> There are some subtleties in this code:
>
> * It turns out that one can't send super-large data frames to it because
> of possible buffer overflows.  I've never seen that problem when I've
> written Rcpp functions which exchanged SEXPs with R, but this one uses
> Rcpp::export in order to use sourceCpp.
> * Notice the invocation of clone() in the first line of the code.  If you
> don't do that, you wind up side-effecting the parameter, which is not what
> most people would expect.
>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
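
The std::ostringstream approach quoted above can also be sketched on its own, outside Rcpp.  The helper name MakeNames and its prefix parameter are hypothetical, introduced only for illustration; inside an Rcpp function the resulting strings would be assigned to the elements of a StringVector instead of a std::vector.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns prefix + "0", prefix + "1", ... with no
// fixed-size buffer, so the length of the index never matters.
std::vector<std::string> MakeNames(const std::string& prefix, int count) {
  std::vector<std::string> names;
  if (count <= 0) {
    return names;  // nothing to generate
  }
  names.reserve(static_cast<std::size_t>(count));
  for (int i = 0; i < count; ++i) {
    std::ostringstream name;
    name << prefix << i;          // the stream grows as needed
    names.push_back(name.str());  // str() extracts the std::string
  }
  return names;
}
```

Calling MakeNames("X.", ncols) gives the column names and MakeNames("", nrows) gives the row names, where ncols and nrows are whatever counts the caller already has.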