[Rcpp-devel] Additional constructors for DataFrame

Davor Cubranic cubranic at stat.ubc.ca
Mon Nov 26 21:39:25 CET 2012


On 2012-11-26, at 1:10 AM, romain at r-enthusiasts.com wrote:

> On 2012-11-26 06:19, Davor Cubranic wrote:
>> Although DataFrame is a subclass of List, it's missing a few handy
>> constructors that List (i.e., Vector) has, such as "(const int&
>> size)" to pre-allocate the space for it.
> 
> So you would have `size` columns, but how many rows, and of which types are the rows?

But this constructor only preallocates the space for the list, and the list holds no data until you add elements to it. I don't see why the types of the rows (or columns) matter, or even how many rows there are, as long as every column ends up with the same number of rows.

> I'd suggest you have a look at one of the DataFrame::create

`create` is only useful if I know the contents of the data frame at compile time. In this case, I don't know until runtime how many columns there are, what their names and types are, or how many rows they have, only that all columns have the same number of rows. I've seen questions about this code pattern come up regularly here, so I can't be the only one using it. The standard response is "build it as a List and then convert it to a DataFrame at the end", and that has worked for me for a long time, except that I have now discovered that the conversion does not carry over the "row.names" attribute. So I can't do something like this:

cppFunction('#include <vector>
DataFrame foo2(CharacterVector x, NumericVector y) {
  List foo(2);
  foo[0] = x;
  foo[1] = y;

  std::string names[] = {"x", "y"};
  foo.attr("names") = std::vector<std::string>(names, names+2);

  foo.attr("row.names") = x;  // not carried over by the DataFrame conversion below

  return DataFrame(foo);
}')

foo2(letters[1:2], 3:4)

But instead have to do this:

cppFunction('#include <vector>
List foo3(CharacterVector x, NumericVector y) {
  List foo(2);
  foo[0] = x;
  foo[1] = y;

  std::string names[] = {"x", "y"};
  foo.attr("names") = std::vector<std::string>(names, names+2);

  DataFrame fooDf(foo);
  fooDf.attr("row.names") = x;

  return fooDf;
}')

foo3(letters[1:2], 3:4)

Having that extra variable towards the end sticks out like a sore thumb and makes me wonder why I couldn't do this:

cppFunction('#include <vector>
DataFrame foo4(CharacterVector x, NumericVector y) {
  DataFrame foo; // or even better: foo(2);
  foo[0] = x;
  foo[1] = y;
  
  std::string names[] = {"x", "y"};
  foo.attr("names") = std::vector<std::string>(names, names+2);

  foo.attr("row.names") = x;

  return foo;
}')

foo4(letters[1:2], 3:4)

This is not that different from what we'd do in R, right?
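
For comparison, here is a rough sketch of another way to avoid the extra variable with the current API: skip the DataFrame constructor and tag the List as a data frame by setting its "class" attribute by hand. The name foo5 is hypothetical, and the sketch assumes that R treats any list carrying "names", "class" and "row.names" attributes as a data frame without further validation, so it is up to the caller to make sure every column has the same length.

```r
library(Rcpp)

cppFunction('
List foo5(CharacterVector x, NumericVector y) {
  List foo(2);
  foo[0] = x;
  foo[1] = y;

  foo.attr("names") = CharacterVector::create("x", "y");

  // Tag the list as a data frame by hand. No DataFrame conversion takes
  // place, so the row names set below stay on the returned object.
  foo.attr("class") = "data.frame";
  foo.attr("row.names") = x;

  return foo;
}')

df <- foo5(letters[1:2], 3:4)

# Sanity checks: the result is a data frame and keeps its row names.
stopifnot(is.data.frame(df), identical(rownames(df), c("a", "b")))
```

This trades the safety of the DataFrame constructor for brevity, so it is a stopgap rather than a substitute for the constructors asked about above.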

Davor
