[Rcpp-devel] Accessing data frame information within Rcpp (question from Stack Overflow)

T P timp2012 at hotmail.com
Sat Jun 9 18:28:20 CEST 2012

I'm reposting the following question relating to using data frames in Rcpp - I originally put it up on StackOverflow but Dirk directed me to post it here instead. I'm interested in whether there's a resolution to this issue, and if not, whether there are future plans to resolve it. 

This is my first post on here, so go easy - I'm hoping my query will get a better response than on SO!

In the R / Rcpp code shown below (a toy example), I pass the data frame mydf to the Rcpp code (where it is picked up as DF), then count the number of age values that exceed 21 and the number of name values that equal "Bob" or "Eve". The two answers (4 and 2) are returned as a list, as shown at the end of the code. All hopefully self-explanatory.

Here's my question: Rcpp clearly understands DF["name"] and DF["age"] as being the columns name and age in DF - that's great. Given that this notation is meaningful, what notation can we use to refer directly to the individual elements in DF, so that we don't need to generate intermediate vectors (i.e. the std::vectors name and age in the code below)? The reason I ask is that in practice the input data frame(s) may well have a much, much greater number of columns, and it feels unwieldy to have to map each one individually to a vector given that the information is clearly already contained within the DF object. If we had to do this to use the columns of a data frame in R, there'd be a riot!

I imagine an answer to this question will be valuable to all those who use Rcpp for complex tasks where data frames need passing (which is presumably everything beyond a certain level of complexity), so I thought I'd map things out in detail. Hope the question is clear, and many thanks in advance for your help. :)


mydf = data.frame(name=c("Amy","Bob","Cal","Dan","Eve","Fay","Gus"), 
                  age=c(24,17,31,28,19,20,25), stringsAsFactors=FALSE)

library(inline)  # provides cxxfunction()

testfunc1 = cxxfunction(
    signature(DFin = "data.frame"),
    plugin = "Rcpp",
    body = '
        Rcpp::DataFrame DF(DFin);
        // copy each column into an intermediate std::vector
        std::vector<std::string> name = 
                         Rcpp::as< std::vector<std::string> >(DF["name"]);
        std::vector<int> age = 
                         Rcpp::as< std::vector<int> >(DF["age"]);
        int n = name.size();
        int counter1 = 0;   // number of ages above 21
        int counter2 = 0;   // number of names equal to "Bob" or "Eve"
        for (int i = 0; i < n; i++) {
            if (age[i] > 21) {
                counter1++;
            }
            if ((name[i] == "Bob") || (name[i] == "Eve")) {
                counter2++;
            }
        }
        return(Rcpp::List::create( _["counter1"] = counter1, 
                                   _["counter2"] = counter2 ));
    ')

out = testfunc1(mydf)

The output in out is of course:

$counter1
[1] 4

$counter2
[1] 2
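
For comparison, here is a rough sketch of one possible direction, not a confirmed answer to the question above. It assumes that assigning DF["name"] and DF["age"] to Rcpp's own vector classes (Rcpp::CharacterVector and Rcpp::NumericVector) gives a view onto the existing R columns rather than a std::vector copy. The name testfunc2 is made up for illustration. Each column still has to be named once, so this only removes the copying, not the per-column mapping.

```r
library(inline)  # provides cxxfunction(); assumes Rcpp is installed as well

mydf = data.frame(name=c("Amy","Bob","Cal","Dan","Eve","Fay","Gus"),
                  age=c(24,17,31,28,19,20,25), stringsAsFactors=FALSE)

# testfunc2 is a hypothetical variant of testfunc1, for illustration only
testfunc2 = cxxfunction(
    signature(DFin = "data.frame"),
    plugin = "Rcpp",
    body = '
        Rcpp::DataFrame DF(DFin);
        // Assumption: these Rcpp vectors refer to the R columns directly,
        // so no intermediate std::vector is built.
        Rcpp::CharacterVector name = DF["name"];
        Rcpp::NumericVector   age  = DF["age"];
        int n = age.size();
        int counter1 = 0;   // number of ages above 21
        int counter2 = 0;   // number of names equal to "Bob" or "Eve"
        for (int i = 0; i < n; i++) {
            if (age[i] > 21) {
                counter1++;
            }
            // convert the single element to std::string for the comparison
            std::string s = Rcpp::as<std::string>(name[i]);
            if (s == "Bob" || s == "Eve") {
                counter2++;
            }
        }
        return Rcpp::List::create(Rcpp::Named("counter1") = counter1,
                                  Rcpp::Named("counter2") = counter2);
    ')

out2 = testfunc2(mydf)
```

Calling testfunc2(mydf) should return the same two counts as testfunc1 on the toy data.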
