[Rcpp-devel] Rcpp internal versions of lapply and sapply
Romain Francois
romain.francois at dbmail.com
Wed Jun 16 16:18:13 CEST 2010
Le 16/06/10 14:58, Douglas Bates a écrit :
>
> Ever since I read Phil Spector's book on S I have been a fan of
> functional programming in S and R. When Jose Pinheiro and I were
> working on the nlme package there was a joke between us that you could
> tell which of us wrote which parts of the code because his parts
> always had an object named "aux" and my parts always had
> "unlist(lapply(list(...)))".
>
> Even within C++ I like to use the std:: algorithms like
> std::transform, but, of course, there are differences between a
> strongly typed language like C++ and a dynamically typed language like
> S. Templates can get around these differences to some extent but I am
> still a bit of a novice regarding templates.
>
> Currently I want to apply some functions to lists but entirely within
> the C++ code (i.e. I don't want to create Rcpp Function objects and
> call back to R). For the sake of argument, consider a function that
> extracts the lengths of the components of the lists.
>
>
> library(Rcpp)
> library(inline)
> inc<- '
> class length {
> public:
> R_len_t operator() (RObject const& x) {return Rf_length(SEXP(x));}
> };
> '
> code<- '
> List lst(ll);
> IntegerVector ans(lst.size());
> std::transform(lst.begin(), lst.end(), ans.begin(), length());
> return ans;
> '
> ltst<- cxxfunction(signature(ll = "list"), code, "Rcpp", inc)
> ll<- list(a = numeric(0), b = LETTERS[6:9], c = c)
> ltst(ll)
> sapply(ll, length)
>
> I would like to create a templated sapply function like
>
> template<int RTYPE>
> Vector<RTYPE> sapply(List ll, ??) {
> Vector<RTYPE> ans(ll.size());
> CharacterVector nms = ll.names();
> if (nms.size() == ll.size()) ans.names() = nms;
> std::transform(ll.begin(), ll.end(), ans.begin(), ??);
> return ans;
> }
>
> but I don't know how to specify the second argument that is a function
> that returns the atomic element type of a Vector<RTYPE> (is this as
> simple as Vector<RTYPE>::value_type?) and has a single argument which
> probably should be an RObject (although might be an SEXP, if that was
> more convenient). Can someone (probably Romain) provide some
> guidance?
I recently added Vector::import_transform which might be all you need.
NumericVector x = NumericVector::import_transform(
y.begin(), y.end(), f) ;
This will work as long as f is of some type acceptable by std::transform.
But let's keep going anyway. One way (I'll write later why this is not
fully satisfactory) is here:
require( Rcpp )
require( inline )
inc <- '
template <typename OBJECT, typename FUN>
SEXP sapply(const OBJECT& object, FUN fun) {
const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
FUN::result_type>::rtype ;
Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
object.begin(), object.end(), fun
) ;
return ans ;
}
template <int RT, typename FUN>
SEXP sapply( const Rcpp::Vector<RT>& object, FUN fun){
const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
FUN::result_type>::rtype ;
Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
object.begin(), object.end(), fun
) ;
ans.names() = object.names() ;
return ans ;
}
template <typename T>
struct square : std::unary_function<T,T> {
inline T operator()( T& t ){ return t*t ; }
} ;
'
code <- '
NumericVector xx( x );
return sapply( xx, square<double>() ) ;
'
fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
> fx( seq(1,10, length = 5 ) )
[1] 1.0000 10.5625 30.2500 60.0625 100.0000
I'll try to explain bit by bit. Please let me know if something is too
cryptic. The good thing about templates is that as long as they work
they don't complain. The bad thing is that when they start complaining,
you need large screens to accomodate all the warnings/erros you get...
Starting from the last thing, the square struct.
template <typename T>
struct square : std::unary_function<T,T> {
inline T operator()( const T& t ){ return t*t ; }
} ;
we need the function that goes into sapply to help, i.e we need it to be
aware of the result type. inheriting from std::unary_function makes this
easy. It essentially just adds some typedef. See
http://www.cplusplus.com/reference/std/functional/unary_function/
So when we do square<double>(), the function is self aware that it
returns a double. We need that to decide which kind of R vectors we want
to create. Which is the job of the Rcpp::traits::r_sexptype_traits
trait. So:
const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
FUN::result_type>::rtype ;
it is very important that this is const, because we are using this as a
template argument later and this needs to be compile time constant.
Then, we just delegate to std::transform through Vector::import_transform:
Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
object.begin(), object.end(), fun
) ;
For now the sapply just return a SEXP because that is easy, but we could
have the result type deduced as well (fasten your seatbelt):
template <typename OBJECT, typename FUN>
Rcpp::Vector< Rcpp::traits::r_sexptype_traits< typename
FUN::result_type>::rtype >
sapply(const OBJECT& object, FUN fun) {
Anyway, why is there two sapply ? because the first one is more generic,
you could sapply on a std::vector<double> for example, the only thing we
do with the object is to call begin() and end(), so as long as they
exist and they produce something that makes sense for std::transform, we
are fine (this is why templates are just so great).
The second version is specific to Rcpp vectors, on which we can call
names().
Now why is not satisfactory ? This is not lazy enough for me. With a
little bit more work, instead of returning a Rcpp::Vector which
allocates all of its memory right now (not lazy), we could have sapply
returning some sort of proxy object. The proxy object would expose a
similar interface : operator[] and iterator, but the result will only be
calculated when truly necessary.
Why should we care about that ? Imagine, instead of just calculating the
square, we want to find out if all the squared values are below 15, ie.
the equivalent of the R code "all( x^2 < 15 )" ? We only need to
calculate three values to make our decision, so why bother calculating
the 2 last ones. (lazy)
This is the kind of thing I want to bring to Rcpp with the "sugar" piece
of the puzzle.
We would then have this expression:
code <- '
Rcpp::NumericVector xx( x) ;
return Rcpp::all( sapply( xx, square<double>() ) < 15 ) ;
'
It almost work right now (I need to add some code so that <15 makes
sense, I only added Vector < Vector so far), but currently requires to
compute all the 5 values before handling the data to all.
Please come back with questions if some gaps need to be filled.
Romain
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/98Uf7u : Rcpp 0.8.1
|- http://bit.ly/c6YnCi : graph gallery collage
`- http://bit.ly/bZ7ltC : inline 0.3.5
More information about the Rcpp-devel
mailing list