[Rcpp-devel] Additional constructors for DataFrame
Davor Cubranic
cubranic at stat.ubc.ca
Mon Nov 26 21:39:25 CET 2012
On 2012-11-26, at 1:10 AM, romain at r-enthusiasts.com wrote:
> On 2012-11-26 06:19, Davor Cubranic wrote:
>> Although DataFrame is a subclass of List, it's missing a few handy
>> constructors that List (i.e., Vector) has, such as "(const int&
>> size)" to pre-allocate the space for it.
>
> So you would have `size` columns, but how many rows, and of which types are the rows?
But this constructor only preallocates the space for the list, and the size is still 0 until you add elements to it. I don't see why the types of the rows (or columns) matter, or even how many rows there are, as long as every column has the same number of rows.
> I'd suggest you have a look at one of the DataFrame::create
`create` is only useful if I know the contents of the data frame at compile time. In this case, I don't know until runtime how many columns there are, or their names, types, or number of rows, except that all columns have the same number of rows. I've seen questions about this code pattern come up regularly here, so I can't be the only one using it.

The standard response is "build it as a List and then convert it to a DataFrame at the end", and that has worked for me for a long time, except that I have now discovered that the conversion does not carry over the "row.names" attribute. So I can't do something like this:
cppFunction('#include <vector>
DataFrame foo2(CharacterVector x, NumericVector y) {
    List foo(2);
    foo[0] = x;
    foo[1] = y;
    std::string names[] = {"x", "y"};
    foo.attr("names") = std::vector<std::string>(names, names+2);
    foo.attr("row.names") = x;
    return DataFrame(foo);
}')
foo2(letters[1:2], 3:4)
But instead have to do this:
cppFunction('#include <vector>
List foo3(CharacterVector x, NumericVector y) {
    List foo(2);
    foo[0] = x;
    foo[1] = y;
    std::string names[] = {"x", "y"};
    foo.attr("names") = std::vector<std::string>(names, names+2);
    DataFrame fooDf(foo);
    fooDf.attr("row.names") = x;
    return fooDf;
}')
foo3(letters[1:2], 3:4)
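For the case where the number of columns is only known at runtime, the same build-then-convert workaround might be sketched roughly as below. This is only an illustration under assumptions: `make_df` and its parameters `cols`, `col_names` and `row_names` are hypothetical names, not part of Rcpp, and the file is assumed to be compiled with Rcpp::sourceCpp(). It mirrors foo3 by setting "row.names" only after the conversion.

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// Hypothetical helper (not part of Rcpp): builds a data frame whose
// column count, column names and row names are only known at runtime.
// [[Rcpp::export]]
List make_df(List cols, CharacterVector col_names,
             CharacterVector row_names) {
    int n = cols.size();               // number of columns, known at runtime
    if (col_names.size() != n) {
        stop("need exactly one name per column");
    }

    List out(n);                       // list with n slots, filled below
    for (int i = 0; i < n; ++i) {
        out[i] = cols[i];              // all columns must have equal length
    }
    out.attr("names") = col_names;

    DataFrame df(out);                 // convert first ...
    df.attr("row.names") = row_names;  // ... then set row.names, as in foo3
    return df;
}
```

From R this could be called as make_df(list(letters[1:2], 3:4), c("x", "y"), letters[1:2]). The sketch does not check that `row_names` has one entry per row, so that remains the caller's responsibility.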
Having that extra variable towards the end sticks out like a sore thumb and makes me wonder why I couldn't do this:
cppFunction('#include <vector>
DataFrame foo4(CharacterVector x, NumericVector y) {
    DataFrame foo; // or even better: foo(2);
    foo[0] = x;
    foo[1] = y;
    std::string names[] = {"x", "y"};
    foo.attr("names") = std::vector<std::string>(names, names+2);
    foo.attr("row.names") = x;
    return foo;
}')
foo4(letters[1:2], 3:4)
This is not that different from what we'd do in R, right?
Davor