[Rcpp-devel] Please help! A list containing dataframe

Romain Francois romain at r-enthusiasts.com
Fri Sep 27 16:33:32 CEST 2013


Hello,

Storing the data frame as a vector<tuple<...>> feels very inefficient: 
in essence you are copying all the data into another structure, which is 
not much easier to use anyway. The implementation of fun feels like 
boilerplate:

int fun(MyRow row) {
   return boost::get<0>(row) + 2*boost::get<1>(row)
        + 3*boost::get<2>(row) + 4*boost::get<3>(row);
}

The version I proposed is not restricted to 4 columns and will be more 
efficient since it does not need to copy all the data. It just stores 
one line at a time and processes it.
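
To make the row-at-a-time idea concrete, here is a minimal sketch in plain 
C++, without Rcpp. It is not the version I proposed earlier, only an 
illustration of the same principle: the Columns typedef is a hypothetical 
stand-in for the data frame columns, and weighted_row_sum and process_rows 
are names made up for this example. A single row buffer is reused, so the 
data is never copied into a vector of rows, and nothing depends on the 
number of columns.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data frame: one std::vector<int> per column.
typedef std::vector<std::vector<int> > Columns;

// Weighted sum of one row: 1*row[0] + 2*row[1] + ... for any row length.
int weighted_row_sum(const std::vector<int>& row) {
    int total = 0;
    for (std::size_t j = 0; j < row.size(); ++j)
        total += static_cast<int>(j + 1) * row[j];
    return total;
}

// Process the columns one row at a time, reusing a single row buffer.
std::vector<int> process_rows(const Columns& cols) {
    std::size_t n = cols.empty() ? 0 : cols[0].size();
    std::vector<int> row(cols.size());   // holds the current row only
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < cols.size(); ++j)
            row[j] = cols[j][i];          // gather row i from the columns
        out[i] = weighted_row_sum(row);
    }
    return out;
}
```

With four one-element columns holding 1, 2, 3 and 4, process_rows returns a 
single value, 1*1 + 2*2 + 3*3 + 4*4 = 30, which is what fun computes for 
the same row.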



Now on variadic templates, yes they can definitely help. In Rcpp11 I'm 
using them extensively, and they have allowed me to reduce the code size 
dramatically (Rcpp11 is about 40% the size of Rcpp).

See for example : 
https://github.com/romainfrancois/Rcpp11/blob/master/inst/include/Rcpp/sugar/functions/replicate.h

This is used to implement this feature:

double fun( double x, double y, int z ){
	return x + y + z ;
}

NumericVector x = replicate( 10, call( fun, 1.0, 2.0, 3 ) ) ;
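
For readers without Rcpp11 at hand, the mechanism can be sketched in 
standard C++11. This is not the code from replicate.h: the call and 
replicate templates below are simplified stand-ins written for this 
example, built on std::bind and returning a std::vector instead of a 
NumericVector. The point is that a single variadic template covers every 
arity, where one overload per number of arguments would otherwise be 
needed.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for call(): bind a function to its arguments, for any arity.
template <typename F, typename... Args>
auto call(F f, Args... args) -> decltype(std::bind(f, args...)) {
    return std::bind(f, args...);
}

// Stand-in for replicate(): evaluate the bound call n times, collect results.
template <typename Call>
auto replicate(std::size_t n, Call c) -> std::vector<decltype(c())> {
    std::vector<decltype(c())> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(c());
    return out;
}

double fun(double x, double y, int z) {
    return x + y + z;
}
```

With these definitions, replicate( 10, call( fun, 1.0, 2.0, 3 ) ) yields a 
std::vector<double> of ten elements, each equal to 6.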


Another example is this 75-line file:
https://github.com/romainfrancois/Rcpp11/blob/master/inst/include/Rcpp/module/FunctionInvoker.h

which replaces a file that weighs in at about 14666 lines in Rcpp.

Romain

On 27/09/13 16:12, Mark Clements wrote:
> This can be done more generally.
>
> Following an earlier suggestion from Romain, we can use boost::tuple from the BH package - for a row of fixed size with general types. Then we can use a template to read in the data-frame and work with the set of rows.
>
> Variadic templates would be nice here, rather than needing to enumerate for tuples of different lengths.
>
> Out of interest, is this poor style for Rcpp?
>
> Sincerely, Mark.
>
> require(inline)
> testReadDf <-
>    rcpp(signature(df="data.frame"),
>         includes="
> #include <boost/tuple/tuple.hpp>
> #include <vector>
> #include <algorithm>
> // general function to read a data-frame
> template <class T1, class T2, class T3, class T4>
> std::vector<boost::tuple<T1,T2,T3,T4> > read_df( DataFrame df ){
>       typedef boost::tuple<T1,T2,T3,T4> Row;
>       int n = df.nrows() ;
>       std::vector<Row> rows(n) ;
>       Vector<traits::r_sexptype_traits<T1>::rtype> df0 = df[0];
>       Vector<traits::r_sexptype_traits<T2>::rtype> df1 = df[1];
>       Vector<traits::r_sexptype_traits<T3>::rtype> df2 = df[2];
>       Vector<traits::r_sexptype_traits<T4>::rtype> df3 = df[3];
>       for( int i=0; i<n; i++)
>           rows[i] = Row(df0[i],df1[i],df2[i],df3[i]);
>       return rows ;
> }
> // example function
> typedef boost::tuple<int,int,int,int> MyRow;
> int fun(MyRow row) {
>    return boost::get<0>(row)+2*boost::get<1>(row)+3*boost::get<2>(row)+4*boost::get<3>(row);
> }
> ",
>         body="
> // read in the data-frame as a vector of rows
> std::vector<MyRow> v = read_df<int,int,int,int>(df);
> int n = v.size();
> std::vector<int> out(n);
> std::transform(v.begin(),v.end(),out.begin(),fun);
> return wrap(out);
> ")
> testReadDf(data.frame(1,2,3,4))


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30


