[Rcpp-devel] Rcpp internal versions of lapply and sapply

Romain Francois romain.francois at dbmail.com
Wed Jun 16 16:55:38 CEST 2010


Small update incorporating the use of all (and using namespace Rcpp to 
make things slightly less cluterred).

require( Rcpp )
require( inline )

inc <- '

using namespace Rcpp ;

template <typename OBJECT, typename FUN>
Vector< traits::r_sexptype_traits< typename FUN::result_type>::rtype >
sapply(const OBJECT& object, FUN fun) {

	const int RTYPE = traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;
	
	Vector<RTYPE> ans = Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	return ans ;
}

template <int RT, typename FUN>
Vector< traits::r_sexptype_traits< typename FUN::result_type>::rtype >
sapply( const Vector<RT>& object, FUN fun){

	const int RTYPE = traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;
	
	Vector<RTYPE> ans = Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	ans.names() = object.names() ;
	return ans ;
}

template <typename T>
struct square : std::unary_function<T,T> {
	inline T operator()( const T& t ){ return t*t ; }
} ;

'

code <- '
	NumericVector xx( x) ;
	return all( sapply( xx, square<double>() ) < 15 ) ;
	
'

fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
fx( seq(1,10, length = 5 ) )


This needs rev 1555 of Rcpp for the "< 15" part. I'll try to come up 
with a lazy version of this for the Rcpp sugar.

I'll probably recycle some of this thread into a blog post (with nicer 
formatting).


Also worth noting the existence of Rcpp::unary_call, Rcpp::fixed_call 
and Rcpp::binary_call that wrap up an R function in a strong type 
illusion (powered by the twins Rcpp::as and Rcpp::wrap).

So for example (this will be less efficient for obvious reasons):

code <- '
	NumericVector xx( x) ;
	return all( sapply( xx, unary_call<double,double>( Function("square") ) 
) < 15 ) ;
	
'

square <- function(x) x^2
fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
fx( seq(1,10, length = 5 ) )

Romain

PS : More examples of these in the runit.Language.R file.

Le 16/06/10 16:18, Romain Francois a écrit :
>
>
> Le 16/06/10 14:58, Douglas Bates a écrit :
>>
>> Ever since I read Phil Spector's book on S I have been a fan of
>> functional programming in S and R. When Jose Pinheiro and I were
>> working on the nlme package there was a joke between us that you could
>> tell which of us wrote which parts of the code because his parts
>> always had an object named "aux" and my parts always had
>> "unlist(lapply(list(...)))".
>>
>> Even within C++ I like to use the std:: algorithms like
>> std::transform, but, of course, there are differences between a
>> strongly typed language like C++ and a dynamically typed language like
>> S. Templates can get around these differences to some extent but I am
>> still a bit of a novice regarding templates.
>>
>> Currently I want to apply some functions to lists but entirely within
>> the C++ code (i.e. I don't want to create Rcpp Function objects and
>> call back to R). For the sake of argument, consider a function that
>> extracts the lengths of the components of the lists.
>>
>>
>> library(Rcpp)
>> library(inline)
>> inc<- '
>> class length {
>> public:
>> R_len_t operator() (RObject const& x) {return Rf_length(SEXP(x));}
>> };
>> '
>> code<- '
>> List lst(ll);
>> IntegerVector ans(lst.size());
>> std::transform(lst.begin(), lst.end(), ans.begin(), length());
>> return ans;
>> '
>> ltst<- cxxfunction(signature(ll = "list"), code, "Rcpp", inc)
>> ll<- list(a = numeric(0), b = LETTERS[6:9], c = c)
>> ltst(ll)
>> sapply(ll, length)
>>
>> I would like to create a templated sapply function like
>>
>> template<int RTYPE>
>> Vector<RTYPE> sapply(List ll, ??) {
>> Vector<RTYPE> ans(ll.size());
>> CharacterVector nms = ll.names();
>> if (nms.size() == ll.size()) ans.names() = nms;
>> std::transform(ll.begin(), ll.end(), ans.begin(), ??);
>> return ans;
>> }
>>
>> but I don't know how to specify the second argument that is a function
>> that returns the atomic element type of a Vector<RTYPE> (is this as
>> simple as Vector<RTYPE>::value_type?) and has a single argument which
>> probably should be an RObject (although might be an SEXP, if that was
>> more convenient). Can someone (probably Romain) provide some
>> guidance?
>
> I recently added Vector::import_transform which might be all you need.
>
> NumericVector x = NumericVector::import_transform(
> y.begin(), y.end(), f) ;
>
>
> This will work as long as f is of some type acceptable by std::transform.
>
>
> But let's keep going anyway. One way (I'll write later why this is not
> fully satisfactory) is here:
>
> require( Rcpp )
> require( inline )
>
> inc <- '
>
> template <typename OBJECT, typename FUN>
> SEXP sapply(const OBJECT& object, FUN fun) {
>
> const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
> FUN::result_type>::rtype ;
>
> Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
> object.begin(), object.end(), fun
> ) ;
> return ans ;
> }
>
> template <int RT, typename FUN>
> SEXP sapply( const Rcpp::Vector<RT>& object, FUN fun){
>
> const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
> FUN::result_type>::rtype ;
>
> Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
> object.begin(), object.end(), fun
> ) ;
> ans.names() = object.names() ;
> return ans ;
> }
>
> template <typename T>
> struct square : std::unary_function<T,T> {
> inline T operator()( T& t ){ return t*t ; }
> } ;
>
> '
>
> code <- '
> NumericVector xx( x );
>
> return sapply( xx, square<double>() ) ;
>
> '
>
> fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
>
>
>  > fx( seq(1,10, length = 5 ) )
> [1] 1.0000 10.5625 30.2500 60.0625 100.0000
>
> I'll try to explain bit by bit. Please let me know if something is too
> cryptic. The good thing about templates is that as long as they work
> they don't complain. The bad thing is that when they start complaining,
> you need large screens to accomodate all the warnings/erros you get...
>
>
>
> Starting from the last thing, the square struct.
>
> template <typename T>
> struct square : std::unary_function<T,T> {
> inline T operator()( const T& t ){ return t*t ; }
> } ;
>
> we need the function that goes into sapply to help, i.e we need it to be
> aware of the result type. inheriting from std::unary_function makes this
> easy. It essentially just adds some typedef. See
> http://www.cplusplus.com/reference/std/functional/unary_function/
>
> So when we do square<double>(), the function is self aware that it
> returns a double. We need that to decide which kind of R vectors we want
> to create. Which is the job of the Rcpp::traits::r_sexptype_traits
> trait. So:
>
> const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
> FUN::result_type>::rtype ;
>
> it is very important that this is const, because we are using this as a
> template argument later and this needs to be compile time constant.
>
> Then, we just delegate to std::transform through Vector::import_transform:
>
> Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
> object.begin(), object.end(), fun
> ) ;
>
>
>
> For now the sapply just return a SEXP because that is easy, but we could
> have the result type deduced as well (fasten your seatbelt):
>
> template <typename OBJECT, typename FUN>
> Rcpp::Vector< Rcpp::traits::r_sexptype_traits< typename
> FUN::result_type>::rtype >
> sapply(const OBJECT& object, FUN fun) {
>
>
> Anyway, why is there two sapply ? because the first one is more generic,
> you could sapply on a std::vector<double> for example, the only thing we
> do with the object is to call begin() and end(), so as long as they
> exist and they produce something that makes sense for std::transform, we
> are fine (this is why templates are just so great).
>
> The second version is specific to Rcpp vectors, on which we can call
> names().
>
>
>
> Now why is not satisfactory ? This is not lazy enough for me. With a
> little bit more work, instead of returning a Rcpp::Vector which
> allocates all of its memory right now (not lazy), we could have sapply
> returning some sort of proxy object. The proxy object would expose a
> similar interface : operator[] and iterator, but the result will only be
> calculated when truly necessary.
>
> Why should we care about that ? Imagine, instead of just calculating the
> square, we want to find out if all the squared values are below 15, ie.
> the equivalent of the R code "all( x^2 < 15 )" ? We only need to
> calculate three values to make our decision, so why bother calculating
> the 2 last ones. (lazy)
>
> This is the kind of thing I want to bring to Rcpp with the "sugar" piece
> of the puzzle.
>
> We would then have this expression:
>
> code <- '
> Rcpp::NumericVector xx( x) ;
>
> return Rcpp::all( sapply( xx, square<double>() ) < 15 ) ;
>
> '
>
> It almost work right now (I need to add some code so that <15 makes
> sense, I only added Vector < Vector so far), but currently requires to
> compute all the 5 values before handling the data to all.
>
>
> Please come back with questions if some gaps need to be filled.
>
> Romain
>
>


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/98Uf7u : Rcpp 0.8.1
|- http://bit.ly/c6YnCi : graph gallery collage
`- http://bit.ly/bZ7ltC : inline 0.3.5




More information about the Rcpp-devel mailing list