[Rcpp-devel] Additional constructors for DataFrame

Dirk Eddelbuettel edd at debian.org
Tue Nov 27 04:00:39 CET 2012


On 26 November 2012 at 12:39, Davor Cubranic wrote:
| `create` is only useful if I know the contents of the data frame at compile
| time. In this case, until runtime I don't know how many columns there are,
| their names, types, or the number of rows -- except that all columns have the
| same number of rows. I've seen questions on this code pattern come up regularly
| here, so I can't be the only one using it. The standard response is "build it
| as a List and then convert it to a DataFrame at the end", and that's what's
| worked for me for a long time, except that I have now discovered that it does
| not carry over the "row.names" attribute. So I can't do something like this:
| 
| 
|     cppFunction('#include <vector>
|     DataFrame foo2(CharacterVector x, NumericVector y) {
|       List foo(2);
|       foo[0] = x;
|       foo[1] = y;
| 
|       std::string names[] = {"x", "y"};
|       foo.attr("names") = std::vector<std::string>(names, names+2);
| 
|       foo.attr("row.names") = x;
| 
|       return DataFrame(foo);
|     }')
| 
|     foo2(letters[1:2], 3:4)
| 
| 
| But instead have to do this:
| 
| 
|     cppFunction('#include <vector>
|     List foo3(CharacterVector x, NumericVector y) {
|       List foo(2);
|       foo[0] = x;
|       foo[1] = y;
| 
|       std::string names[] = {"x", "y"};
|       foo.attr("names") = std::vector<std::string>(names, names+2);
| 
|       DataFrame fooDf(foo);
|       fooDf.attr("row.names") = x;
| 
|       return fooDf;
|     }')
| 
|     foo3(letters[1:2], 3:4)
| 
| 
| Having that extra variable towards the end sticks out like a sore thumb and
| makes me wonder why I couldn't do this:
| 
| 
|     cppFunction('#include <vector>
|     DataFrame foo4(CharacterVector x, NumericVector y) {
|       DataFrame foo; // or even better: foo(2);
|       foo[0] = x;
|       foo[1] = y;
|       
|       std::string names[] = {"x", "y"};
|       foo.attr("names") = std::vector<std::string>(names, names+2);
| 
|       foo.attr("row.names") = x;
| 
|       return foo;
|     }')
| 
|     foo4(letters[1:2], 3:4)
| 
| 
| This is not that different from what we'd do in R, right?

What I used in the past (with arguably less refined facilities than we have
now in Rcpp) was to simply carry collections of vectors in a list, including
a vector of rownames.

Once the list is returned to R, it is quick to assemble it into the final data
structure. You still get the efficient growth at the C++ level.
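A minimal sketch of that approach, written in the same cppFunction style as the examples above and assuming an R session where Rcpp can compile code. The function name buildColumns and the list element names columns and rownames are illustrative only, not part of Rcpp. The C++ side returns the columns and the row names as separate list elements, and R does the final assembly:

```r
library(Rcpp)

# Hypothetical helper: returns the columns (as a named List) and the
# row names as two separate elements of an outer list.
cppFunction('
List buildColumns(CharacterVector x, NumericVector y) {
    // The column count is fixed at 2 for brevity; a List whose size is
    // only known at runtime is filled the same way.
    List cols(2);
    cols[0] = x;
    cols[1] = y;
    cols.attr("names") = CharacterVector::create("x", "y");

    // Carry the row names alongside the columns instead of as an attribute.
    return List::create(Named("columns")  = cols,
                        Named("rownames") = x);
}')

res <- buildColumns(letters[1:2], 3:4)

# Assemble the data frame in R and attach the row names there.
df <- as.data.frame(res$columns, stringsAsFactors = FALSE)
rownames(df) <- res$rownames
df
```

Because the row names travel as an ordinary list element, the result does not depend on how a List-to-DataFrame conversion treats attributes.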

That worked, but if you really have an itch here you know where to send
patches :)

Cheers, Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  

