[Rcpp-devel] Rcpp internal versions of lapply and sapply

Wed Jun 16 16:18:13 CEST 2010

Le 16/06/10 14:58, Douglas Bates a écrit :
>
> Ever since I read Phil Spector's book on S I have been a fan of
> functional programming in S and R.  When Jose Pinheiro and I were
> working on the nlme package there was a joke between us that you could
> tell which of us wrote which parts of the code because his parts
> always had an object named "aux" and my parts always had
> "unlist(lapply(list(...)))".
>
> Even within C++ I like to use the std:: algorithms like
> std::transform, but, of course, there are differences between a
> strongly typed language like C++ and a dynamically typed language like
> S.  Templates can get around these differences to some extent but I am
> still a bit of a novice regarding templates.
>
> Currently I want to apply some functions to lists but entirely within
> the C++ code (i.e. I don't want to create Rcpp Function objects and
> call back to R).  For the sake of argument, consider a function that
> extracts the lengths of the components of the lists.
>
>
> library(Rcpp)
> library(inline)
> inc<- '
> class length {
> public:
>      R_len_t operator() (RObject const&  x) {return Rf_length(SEXP(x));}
> };
> '
> code<- '
> List lst(ll);
> IntegerVector ans(lst.size());
> std::transform(lst.begin(), lst.end(), ans.begin(), length());
> return ans;
> '
> ltst<- cxxfunction(signature(ll = "list"), code, "Rcpp", inc)
> ll<- list(a = numeric(0), b = LETTERS[6:9], c = c)
> ltst(ll)
> sapply(ll, length)
>
> I would like to create a templated sapply function like
>
> template<int RTYPE>
> Vector<RTYPE>  sapply(List ll, ??) {
>      Vector<RTYPE>  ans(ll.size());
>      CharacterVector nms = ll.names();
>      if (nms.size() == ll.size()) ans.names() = nms;
>      std::transform(ll.begin(), ll.end(), ans.begin(), ??);
>      return ans;
> }
>
> but I don't know how to specify the second argument that is a function
> that returns the atomic element type of a Vector<RTYPE>  (is this as
> simple as Vector<RTYPE>::value_type?) and has a single argument which
> probably should be an RObject (although might be an SEXP, if that was
> more convenient).  Can someone (probably Romain) provide some
> guidance?

I recently added Vector::import_transform which might be all you need.

NumericVector x = NumericVector::import_transform(
	y.begin(), y.end(), f) ;

This will work as long as f is of some type acceptable by std::transform.

But let's keep going anyway. One way (I'll write later why this is not 
fully satisfactory) is here:

require( Rcpp )
require( inline )

inc <- '

template <typename OBJECT, typename FUN>
SEXP sapply(const OBJECT& object, FUN fun) {

	const int RTYPE = Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;

	Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	return ans ;
}

template <int RT, typename FUN>
SEXP sapply( const Rcpp::Vector<RT>& object, FUN fun){

	const int RTYPE = Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;

	Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	ans.names() = object.names() ;
	return ans ;
}

template <typename T>
struct square : std::unary_function<T,T> {
	inline T operator()( T& t ){ return t*t ; }
} ;

'

code <- '
	NumericVector xx( x );

	return sapply( xx, square<double>() ) ;

'

fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )

 > fx( seq(1,10, length = 5 ) )
[1]   1.0000  10.5625  30.2500  60.0625 100.0000

I'll try to explain bit by bit. Please let me know if something is too 
cryptic. The good thing about templates is that as long as they work 
they don't complain. The bad thing is that when they start complaining, 
you need large screens to accomodate all the warnings/erros you get...

Starting from the last thing, the square struct.

template <typename T>
struct square : std::unary_function<T,T> {
	inline T operator()( const T& t ){ return t*t ; }
} ;

we need the function that goes into sapply to help, i.e we need it to be 
aware of the result type. inheriting from std::unary_function makes this 
easy. It essentially just adds some typedef. See 
http://www.cplusplus.com/reference/std/functional/unary_function/

So when we do square<double>(), the function is self aware that it 
returns a double. We need that to decide which kind of R vectors we want 
to create. Which is the job of the Rcpp::traits::r_sexptype_traits 
trait. So:

const int RTYPE = Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;

it is very important that this is const, because we are using this as a 
template argument later and this needs to be compile time constant.

Then, we just delegate to std::transform through Vector::import_transform:

Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
	object.begin(), object.end(), fun
) ;

For now the sapply just return a SEXP because that is easy, but we could 
have the result type deduced as well (fasten your seatbelt):

template <typename OBJECT, typename FUN>
Rcpp::Vector< Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype >
sapply(const OBJECT& object, FUN fun) {

Anyway, why is there two sapply ? because the first one is more generic, 
you could sapply on a std::vector<double> for example, the only thing we 
do with the object is to call begin() and end(), so as long as they 
exist and they produce something that makes sense for std::transform, we 
are fine (this is why templates are just so great).

The second version is specific to Rcpp vectors, on which we can call 
names().

Now why is not satisfactory ? This is not lazy enough for me. With a 
little bit more work, instead of returning a Rcpp::Vector which 
allocates all of its memory right now (not lazy), we could have sapply 
returning some sort of proxy object. The proxy object would expose a 
similar interface : operator[] and iterator, but the result will only be 
calculated when truly necessary.

Why should we care about that ? Imagine, instead of just calculating the 
square, we want to find out if all the squared values are below 15, ie. 
the equivalent of the R code "all( x^2 < 15 )" ? We only need to 
calculate three values to make our decision, so why bother calculating 
the 2 last ones. (lazy)

This is the kind of thing I want to bring to Rcpp with the "sugar" piece 
of the puzzle.

We would then have this expression:

code <- '
	Rcpp::NumericVector xx( x) ;

	return Rcpp::all( sapply( xx, square<double>() ) < 15 ) ;

'

It almost work right now (I need to add some code so that <15 makes 
sense, I only added Vector < Vector so far), but currently requires to 
compute all the 5 values before handling the data to all.

Please come back with questions if some gaps need to be filled.

Romain

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/98Uf7u : Rcpp 0.8.1
|- http://bit.ly/c6YnCi : graph gallery collage
`- http://bit.ly/bZ7ltC : inline 0.3.5