[Rcpp-devel] Accessing data frame information within Rcpp (question from Stack Overflow)
Dirk Eddelbuettel
edd at debian.org
Mon Jun 11 03:36:34 CEST 2012
On 9 June 2012 at 17:28, T P wrote:
|
| I'm reposting the following question relating to using data frames in Rcpp - I
| originally put it up on StackOverflow but Dirk directed me to post it here
| instead. I'm interested in whether there's a resolution to this issue, and if
| not, whether there are future plans to resolve it.
|
| This is my first post on here, so go easy - I'm hoping my query will get a
| better response than on SO!
|
|
| In the R / Rcpp code shown below (a toy example), I pass the data frame mydf
| to the Rcpp code (where it is picked up as DF), and then count the
| number of age values that exceed 21, and the number of name values that equal
| "Bob" or "Eve". The two answers (4 and 2) are returned as a list, as shown at
| the end of the code. All hopefully self-explanatory.
|
| Here's my question: Rcpp clearly understands DF["name"] and DF["age"] as being
| the columns name and age in DF - that's great. Given that this notation is
| meaningful, what notation can we use to refer directly to the individual
| elements in DF, so that we don't need to generate intermediate vectors (i.e.
| the std::vectors name and age in the code below)? The reason I ask is that in
| practice the input data frame(s) may well have a much, much greater number of
| columns, and it feels unwieldy to have to map each one individually to a vector
| given that the information is clearly already contained within the DF object.
| If we had to do this to use the columns of a data frame in R, there'd be a
| riot!
|
| I imagine an answer to this question will be valuable to all those who use Rcpp
| for complex tasks where data frames need passing (which is presumably
| everything beyond a certain level of complexity), so I thought I'd map things
| out in detail. Hope the question is clear, and many thanks in advance for your
| help. :)
1) Please post with full names, preferably also with some affiliation. "T P"
does not really qualify.
2) You asked several related questions on StackOverflow; I fear you may
simply not understand C++ well enough to appreciate why what you ask for is
both difficult and not necessarily useful to a C++ programmer. Rcpp
is /not/ a simple 'R to C++' translation tool. It is rather a device to enable
interoperability. But when you're in a C++ context ... you are bound by C++
rules. Hence a data.frame is treated as a collection of vectors, and so on.
Sorry, no silver bullet.
Dirk
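To make the "bound by C++ rules" point concrete, here is a minimal sketch in plain standard C++17 (no Rcpp involved). It assumes a hypothetical `DataFrame` type, modeled as a `std::map` from column name to a `std::variant` of typed vectors, plus hypothetical helpers `column<T>()` and `nrow()`. Even with the whole table held in one object, indexing an element still requires naming the column's C++ element type at compile time; what can be avoided is the copy, by handing out a const reference to the stored vector.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for an R data.frame: a named collection of
// equal-length column vectors, each holding a single element type.
using Column = std::variant<std::vector<double>, std::vector<std::string>>;
using DataFrame = std::map<std::string, Column>;

// Return a const reference (no copy) to column `name`, viewed as std::vector<T>.
// The caller must name T: C++ needs the element type at compile time.
// std::get throws std::bad_variant_access if T does not match the stored type;
// std::map::at throws std::out_of_range if the column does not exist.
template <typename T>
const std::vector<T>& column(const DataFrame& df, const std::string& name) {
    return std::get<std::vector<T>>(df.at(name));
}

// Number of rows; asserts that every column has the same length.
std::size_t nrow(const DataFrame& df) {
    std::size_t n = 0;
    bool first = true;
    for (const auto& entry : df) {
        // std::visit dispatches on whichever vector type the column holds.
        std::size_t len = std::visit([](const auto& v) { return v.size(); }, entry.second);
        if (first) {
            n = len;
            first = false;
        }
        assert(len == n && "all columns must have the same length");
    }
    return n;
}
```

For example, with `df` holding the `name` and `age` columns from the question, `column<double>(df, "age")[2]` yields `31`, while `column<std::string>(df, "age")` throws `std::bad_variant_access` instead of silently coercing.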
|
|
| library(inline)
|
| mydf = data.frame(name=c("Amy","Bob","Cal","Dan","Eve","Fay","Gus"),
| age=c(24,17,31,28,19,20,25), stringsAsFactors=FALSE)
|
|
| testfunc1 = cxxfunction(
|     signature(DFin = "data.frame"),
|     plugin = "Rcpp",
|     body = '
|         Rcpp::DataFrame DF(DFin);
|         // Rcpp::as<>() copies each column into a new std::vector
|         std::vector<std::string> name =
|             Rcpp::as< std::vector<std::string> >(DF["name"]);
|         std::vector<int> age =
|             Rcpp::as< std::vector<int> >(DF["age"]);
|         int n = name.size();
|         int counter1 = 0;  // rows with age > 21
|         int counter2 = 0;  // rows with name "Bob" or "Eve"
|         for (int i = 0; i < n; i++) {
|             if (age[i] > 21) {
|                 counter1++;
|             }
|             if ((name[i] == "Bob") || (name[i] == "Eve")) {
|                 counter2++;
|             }
|         }
|         return(Rcpp::List::create( _["counter1"] = counter1,
|                                    _["counter2"] = counter2 ));
|     ')
|
| out = testfunc1(mydf)
| print(out)
|
| The output in out is of course:
|
| $counter1
| [1] 4
|
| $counter2
| [1] 2
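The toy count itself does not need the two intermediate copies. In Rcpp, binding a column to a typed Rcpp vector, e.g. `Rcpp::NumericVector age = DF["age"];` and `Rcpp::CharacterVector name = DF["name"];`, generally wraps the existing R vector instead of copying it the way `Rcpp::as< std::vector<...> >` does (the `age` column is double in R, since `c(24, 17, ...)` is numeric, so `NumericVector` needs no coercion). The sketch below shows the same loop in plain standard C++ with the columns passed by const reference; the function `count_rows` and the struct `Counts` are hypothetical names, and the name test uses the logical `||`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type mirroring the list returned by testfunc1.
struct Counts {
    int counter1;  // rows with age > 21
    int counter2;  // rows whose name is "Bob" or "Eve"
};

// Same logic as testfunc1, but the columns are taken by const reference,
// so the function itself makes no per-column copy.
Counts count_rows(const std::vector<std::string>& name, const std::vector<double>& age) {
    assert(name.size() == age.size());  // data.frame columns have equal length
    Counts c{0, 0};
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (age[i] > 21) {
            ++c.counter1;
        }
        if (name[i] == "Bob" || name[i] == "Eve") {
            ++c.counter2;
        }
    }
    return c;
}
```

With the seven rows from the question this gives `counter1 == 4` and `counter2 == 2`, matching the output shown.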
|
|
| ----------------------------------------------------------------------
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com