[Rcpp-devel] Please help! A list containing dataframe
Romain Francois
romain at r-enthusiasts.com
Fri Sep 27 12:46:32 CEST 2013
On 27/09/13 12:11, sky Xue wrote:
> Hello,
>
> I have a list below whose elements are data frames (Please see the
> attached file “try.dat”). Now I want to apply a complicated function to
> each row of the data frame which returns a single value. For simplicity,
> you can assume this function is ma(x) (x is the row of the data frame).
>
> [[1]]
> class_id student_id 1 2
> 1 1 1 9 14
> 2 1 2 4 1
> 3 1 3 10 8
> 4 1 4 7 7
> 5 1 5 6 11
> 6 1 6 1 3
> 7 1 7 14 10
> 8 1 8 13 12
> 9 1 9 12 2
> 10 1 10 3 9
> 11 1 11 8 4
> 12 1 12 11 6
> 13 1 13 2 13
> 14 1 14 5 5
>
> [[2]]
> class_id student_id 1 2
> 15 2 1 11 3
> 16 2 2 7 10
> 17 2 3 2 2
> 18 2 4 6 6
> 19 2 5 13 8
> 20 2 6 12 13
> 21 2 7 8 14
> 22 2 8 1 9
> 23 2 9 3 1
> 24 2 10 4 11
> 25 2 11 5 4
> 26 2 12 9 12
> 27 2 13 10 7
> 28 2 14 14 5
>
> [[3]]
> class_id student_id 1 2
> 29 3 1 12 6
> 30 3 2 1 3
> 31 3 3 8 2
> 32 3 4 9 10
> 33 3 5 11 7
> 34 3 6 14 4
> 35 3 7 2 14
> 36 3 8 13 13
> 37 3 9 3 8
> 38 3 10 5 11
> 39 3 11 4 12
> 40 3 12 7 1
> 41 3 13 10 5
> 42 3 14 6 9
>
> In the real situation the list will be very long, and the data frames
> much wider. That’s why I want to use Rcpp to improve the speed.
>
> I got stuck at the very beginning: I failed to import this list into
> Rcpp, let alone the data frames inside it.
>
> I’ve checked the book Seamless R and C++ Integration with Rcpp but found
> no example that deals with such a case.
>
> Thank you very much for your support!
>
> Best regards,
>
> Sky
Rcpp has the Rcpp::DataFrame class, which might help you, but it does not
do much.
A data.frame is merely a list of vectors of the same length, but of
arbitrary types. Because the data is stored by column, there is no
ready-made row object, which makes it difficult to process the rows of a
data frame.
So you have to do some work to grab a row of a data frame and apply
something to it. The code below assumes that you have a data frame that
contains only numeric vectors.
#include <Rcpp.h>
using namespace Rcpp;

// The function applied to each row (here simply the sum of the row).
double fun( NumericVector x){
    return sum(x) ;
}

// Copy row i of the data frame (stored as n column vectors) into "row".
void fill_row( NumericVector& row, const std::vector<NumericVector>& vectors,
               int i, int n){
    for( int j=0; j<n; j++){
        row[j] = vectors[j][i] ;
    }
}

// [[Rcpp::export]]
NumericVector apply_row_df( DataFrame df ){
    int n = df.size() ;       // number of columns
    int nrows = df.nrows() ;  // number of rows

    // Grab each column once, as a NumericVector.
    std::vector<NumericVector> vectors(n) ;
    for( int i=0; i<n; i++) vectors[i] = df[i] ;

    NumericVector row(n) ;          // buffer reused for every row
    NumericVector results(nrows) ;  // one result per row

    for( int i=0; i<nrows; i++){
        fill_row( row, vectors, i, n );
        results[i]=fun(row) ;
    }
    return results ;
}

// [[Rcpp::export]]
List apply_all( List list ){
    return lapply( list, apply_row_df) ;
}

/*** R
    df <- data.frame( x = seq(0, 10, .1), y = seq(0, 10, .1),
                      z = seq(0, 10, .1) )
    apply_row_df( df )

    list_of_df <- rep( list(df), 10 )
    apply_all( list_of_df )
*/
The function apply_row_df works on a single data frame: it calls the fun
function on each row of the data frame. Before each call, we fill the
vector "row" with that row's data using the fill_row function, so the same
vector is reused for every row.
The rest is just looping over the rows and storing each result.
The apply_all function is just a convenience that applies apply_row_df to
each item of a list.
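For anyone who wants to try the same column-to-row pattern outside of R,
here is a minimal sketch in plain C++. It assumes an all-numeric data frame
can be stood in for by one std::vector<double> per column. The names Column,
Frame, row_fun, apply_rows and apply_all_frames are made up for this
illustration and are not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins: a "frame" is a list of equally long numeric columns.
using Column = std::vector<double>;
using Frame  = std::vector<Column>;

// Placeholder per-row function (a plain sum, like fun in the Rcpp code).
double row_fun(const std::vector<double>& row) {
    return std::accumulate(row.begin(), row.end(), 0.0);
}

// Apply row_fun to every row of one frame, reusing a single row buffer.
std::vector<double> apply_rows(const Frame& frame) {
    const std::size_t ncols = frame.size();
    const std::size_t nrows = (ncols == 0) ? 0 : frame[0].size();

    // All columns of a data frame must have the same length.
    for (std::size_t j = 0; j < ncols; ++j) {
        assert(frame[j].size() == nrows);
    }

    std::vector<double> row(ncols);      // buffer, refilled for each row
    std::vector<double> results(nrows);  // one result per row
    for (std::size_t i = 0; i < nrows; ++i) {
        // Gather element i of every column into the row buffer.
        for (std::size_t j = 0; j < ncols; ++j) {
            row[j] = frame[j][i];
        }
        results[i] = row_fun(row);
    }
    return results;
}

// Apply apply_rows to each frame of a list of frames.
std::vector<std::vector<double>> apply_all_frames(const std::vector<Frame>& frames) {
    std::vector<std::vector<double>> out;
    out.reserve(frames.size());
    for (const Frame& f : frames) {
        out.push_back(apply_rows(f));
    }
    return out;
}
```

Calling apply_rows on a frame whose two columns are {9, 4, 10} and
{14, 1, 8} gives one sum per row. This is only meant to show the access
pattern: inside R the Rcpp version above is the one to use, since it works
on the R vectors directly.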
Hope this helps.
Romain
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30