[Rcpp-devel] Additional constructors for DataFrame
Dirk Eddelbuettel
edd at debian.org
Tue Nov 27 04:00:39 CET 2012
On 26 November 2012 at 12:39, Davor Cubranic wrote:
| `create` is only useful if I know the contents of the data frame at compile
| time. In this case, until runtime I don't know how many columns there are,
| their names, types, or the number of rows -- except that all columns have the
| same number of rows. I've seen questions on this code pattern come up regularly
| here, so I can't be the only one using it. The standard response is "build it
| as a List and then convert it to a DataFrame at the end", and that's what's
| worked for me for a long time, except that I have now discovered that this doesn't
| carry over the "row.names" attribute. So I can't do something like this:
|
|
| cppFunction('#include <vector>
| DataFrame foo2(CharacterVector x, NumericVector y) {
| List foo(2);
| foo[0] = x;
| foo[1] = y;
|
| std::string names[] = {"x", "y"};
| foo.attr("names") = std::vector<std::string>(names, names+2);
|
| foo.attr("row.names") = x;
|
| return DataFrame(foo);
| }')
|
| foo2(letters[1:2], 3:4)
|
|
| But instead have to do this:
|
|
| cppFunction('#include <vector>
| List foo3(CharacterVector x, NumericVector y) {
| List foo(2);
| foo[0] = x;
| foo[1] = y;
|
| std::string names[] = {"x", "y"};
| foo.attr("names") = std::vector<std::string>(names, names+2);
|
| DataFrame fooDf(foo);
| fooDf.attr("row.names") = x;
|
| return fooDf;
| }')
|
| foo3(letters[1:2], 3:4)
|
|
| Having that extra variable towards the end sticks out like a sore thumb and
| makes me wonder why I couldn't do this:
|
|
| cppFunction('#include <vector>
| DataFrame foo4(CharacterVector x, NumericVector y) {
| DataFrame foo; // or even better: foo(2);
| foo[0] = x;
| foo[1] = y;
|
| std::string names[] = {"x", "y"};
| foo.attr("names") = std::vector<std::string>(names, names+2);
|
| foo.attr("row.names") = x;
|
| return foo;
| }')
|
| foo4(letters[1:2], 3:4)
|
|
| This is not that different from what we'd do in R, right?

What I did in the past (with arguably less refined facilities than we have
now in Rcpp) was simply to carry collections of vectors in a list, including
a vector of row names.

Once returned to R, it is quick to assemble this into the final data
structure. You still get the efficient growth at the C++ level.
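The pattern described above (grow plain column vectors on the C++ side, carry a separate row-names vector next to them, and leave the final data-frame assembly to R) can be sketched in standard C++. This is an illustrative model under stated assumptions, not Rcpp code: the `ColumnBundle` type and its methods are hypothetical, and every column is `double` for brevity, whereas a real data frame may mix column types as in the quoted example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for "a list of column vectors plus a row-names
// vector". Plain standard C++, no Rcpp types; numeric columns only.
struct ColumnBundle {
    std::vector<std::string> names;            // column names, known only at runtime
    std::vector<std::vector<double>> columns;  // one growable vector per column
    std::vector<std::string> row_names;        // carried alongside, like "row.names"

    // Declare a column at runtime. Columns must be declared before any rows
    // are added so that every column stays as long as row_names.
    void add_column(const std::string& name) {
        if (!row_names.empty())
            throw std::logic_error("add columns before adding rows");
        names.push_back(name);
        columns.emplace_back();
    }

    // Append one row across all columns; std::vector gives amortized
    // constant-time growth. Width is checked before anything is modified.
    void push_row(const std::string& row_name, const std::vector<double>& row) {
        if (row.size() != columns.size())
            throw std::invalid_argument("row width does not match column count");
        for (std::size_t j = 0; j < row.size(); ++j)
            columns[j].push_back(row[j]);
        row_names.push_back(row_name);
        // Invariant: every column has exactly one entry per row name.
        assert(columns.empty() || columns.front().size() == row_names.size());
    }

    std::size_t nrow() const { return row_names.size(); }
};
```

In actual Rcpp code the columns and the row names would instead be returned to R as a plain `List`, and the data frame built there, for instance with `data.frame()` and its `row.names` argument. Keeping the invariant (every column as long as the row-names vector) on the C++ side is what keeps that final R-side step simple.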
That worked, but if you really have an itch here you know where to send
patches :)
Cheers, Dirk
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com