[Rcpp-devel] Please help! A list containing dataframe

Romain Francois romain at r-enthusiasts.com
Fri Sep 27 12:46:32 CEST 2013


On 27/09/13 12:11, sky Xue wrote:
> Hello,
>
> I have the list below, whose elements are data frames (please see the
> attached file “try.dat”). I want to apply a complicated function to
> each row of each data frame; it returns a single value. For simplicity,
> you can assume this function is ma(x), where x is a row of the data frame.
>
> [[1]]
>     class_id student_id  1  2
> 1         1          1  9 14
> 2         1          2  4  1
> 3         1          3 10  8
> 4         1          4  7  7
> 5         1          5  6 11
> 6         1          6  1  3
> 7         1          7 14 10
> 8         1          8 13 12
> 9         1          9 12  2
> 10        1         10  3  9
> 11        1         11  8  4
> 12        1         12 11  6
> 13        1         13  2 13
> 14        1         14  5  5
>
> [[2]]
>     class_id student_id  1  2
> 15        2          1 11  3
> 16        2          2  7 10
> 17        2          3  2  2
> 18        2          4  6  6
> 19        2          5 13  8
> 20        2          6 12 13
> 21        2          7  8 14
> 22        2          8  1  9
> 23        2          9  3  1
> 24        2         10  4 11
> 25        2         11  5  4
> 26        2         12  9 12
> 27        2         13 10  7
> 28        2         14 14  5
>
> [[3]]
>     class_id student_id  1  2
> 29        3          1 12  6
> 30        3          2  1  3
> 31        3          3  8  2
> 32        3          4  9 10
> 33        3          5 11  7
> 34        3          6 14  4
> 35        3          7  2 14
> 36        3          8 13 13
> 37        3          9  3  8
> 38        3         10  5 11
> 39        3         11  4 12
> 40        3         12  7  1
> 41        3         13 10  5
> 42        3         14  6  9
>
> In the real situation the list will be very long and the data frames much
> wider. That’s why I want to use Rcpp to improve the speed.
>
> I got stuck at the very beginning: I failed to import this list into
> Rcpp, let alone the data frames it contains.
>
> I’ve checked the book Seamless R and C++ Integration with Rcpp but found
> NO example that deals with such a case.
>
> Thank you very much for your support!
>
> Best regards,
>
> Sky

Rcpp has the Rcpp::DataFrame class, which might help you, but it does not
do much beyond giving you access to the columns.

A data.frame is merely a list of vectors of the same length, possibly of
different types. This makes it difficult to process rows of a data frame,
because the values of one row are spread across several vectors.
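
To make the column-wise layout concrete, here is a minimal sketch in plain
C++ (no Rcpp involved). The Columns type and the get_row function are
hypothetical names used only for this illustration: Columns stands in for a
numeric data frame, with one std::vector per column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a numeric data frame: one std::vector per
// column, all of the same length (mirrors the "list of vectors" layout).
using Columns = std::vector<std::vector<double>>;

// Gather row i by visiting every column in turn. The values of a row are
// not contiguous in memory, so they have to be copied one by one.
std::vector<double> get_row(const Columns& cols, std::size_t i) {
    std::vector<double> row;
    row.reserve(cols.size());
    for (const auto& col : cols) {
        assert(i < col.size());  // every column must contain a row i
        row.push_back(col[i]);
    }
    return row;
}
```

With the two score columns from the first rows of the question stored as
{9, 4, 10} and {14, 1, 8}, get_row(cols, 1) collects the second row, 4 and 1.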

So you have to do some work to grab a row of a data frame and apply a
function to it. The code below assumes that the data frame contains only
numeric vectors.


#include <Rcpp.h>
using namespace Rcpp;

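// Function applied to each row; here simply the sum of its values.
double fun( NumericVector x){
     return sum(x) ;
}

// Copy row i of the data frame (held as n column vectors) into "row".
void fill_row( NumericVector& row,
               const std::vector<NumericVector>& vectors, int i, int n){
     for( int j=0; j<n; j++){
         row[j] = vectors[j][i] ;
     }
}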

// [[Rcpp::export]]
NumericVector apply_row_df( DataFrame df ){
     int n = df.size() ;       // number of columns
     int nrows = df.nrows() ;  // number of rows

     // Grab each column once as a NumericVector (assumes numeric columns).
     std::vector<NumericVector> vectors(n) ;
     for( int i=0; i<n; i++) vectors[i] = df[i] ;

     NumericVector row(n) ;          // buffer reused for every row
     NumericVector results(nrows) ;  // one value per row
     for( int i=0; i<nrows; i++){
         fill_row( row, vectors, i, n );
         results[i] = fun(row) ;
     }
     return results ;
}

// [[Rcpp::export]]
List apply_all( List list ){
     // Call apply_row_df on every data frame in the list.
     return lapply( list, apply_row_df) ;
}

/*** R
     df <- data.frame( x = seq(0, 10, .1), y = seq(0, 10, .1),
                       z = seq(0, 10, .1) )
     apply_row_df( df )

     list_of_df <- rep( list(df), 10 )
     apply_all( list_of_df )
*/


The function apply_row_df works on a single data frame: it calls the fun
function on each row of the data frame. Before each call, the fill_row
function copies the values of the current row into the vector "row".

The rest is just a loop over the rows that stores each value in "results".


apply_all is just a convenience function that applies apply_row_df to each
item of a list and returns the results as a list.
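
The same "one result vector per data frame" pattern can be sketched in plain
C++ without Rcpp. This is only an illustration under stated assumptions:
Columns, row_sums and apply_all_tables are hypothetical names, each table is
a set of equal-length numeric columns, and the row function is fixed to a sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a numeric data frame: one vector per column.
using Columns = std::vector<std::vector<double>>;

// Sum of each row of one table, accumulated column by column.
std::vector<double> row_sums(const Columns& cols) {
    if (cols.empty()) return {};
    std::vector<double> out(cols.front().size(), 0.0);
    for (const auto& col : cols) {
        assert(col.size() == out.size());  // columns must have equal length
        for (std::size_t i = 0; i < col.size(); ++i) out[i] += col[i];
    }
    return out;
}

// Counterpart of the lapply call: apply row_sums to every table in the
// list and collect one result vector per table.
std::vector<std::vector<double>> apply_all_tables(
        const std::vector<Columns>& tables) {
    std::vector<std::vector<double>> results;
    results.reserve(tables.size());
    std::transform(tables.begin(), tables.end(),
                   std::back_inserter(results), row_sums);
    return results;
}
```

Accumulating column by column avoids building a temporary row, but it only
works because the row function here is a sum; a general function of the
whole row still needs the row to be gathered first, as fill_row does.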

Hope this helps.

Romain


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30


