[Rcpp-devel] Rcpp internal versions of lapply and sapply

Douglas Bates bates at stat.wisc.edu
Wed Jun 16 17:44:14 CEST 2010


Thank you very much, Romain.  This certainly helps my understanding of
unary_function, traits, etc.  These are the sorts of things I can read
about but until I see a non-trivial usage it is hard to keep these
ideas straight.

Two minor questions occur to me at this point.  In my code I assigned
the names to be the names of the input list but I think now that I
should have assigned a clone of the object names instead.  To be safe
I should have cloned before assigning, right?  I also checked the
length before assigning but you didn't.  Am I reflecting too much of
my C programming background here?  In other words, is there some magic
in the Rcpp names assignment that does the checking?

Thanks again.

On Wed, Jun 16, 2010 at 9:55 AM, Romain Francois
<romain.francois at dbmail.com> wrote:
>
> Small update incorporating the use of all (and using namespace Rcpp to make
> things slightly less cluterred).
>
> require( Rcpp )
> require( inline )
>
> inc <- '
>
> using namespace Rcpp ;
>
> template <typename OBJECT, typename FUN>
> Vector< traits::r_sexptype_traits< typename FUN::result_type>::rtype >
> sapply(const OBJECT& object, FUN fun) {
>
>        const int RTYPE = traits::r_sexptype_traits< typename
> FUN::result_type>::rtype ;
>
>        Vector<RTYPE> ans = Vector<RTYPE>::import_transform(
>                object.begin(), object.end(), fun
>                ) ;
>        return ans ;
> }
>
> template <int RT, typename FUN>
> Vector< traits::r_sexptype_traits< typename FUN::result_type>::rtype >
> sapply( const Vector<RT>& object, FUN fun){
>
>        const int RTYPE = traits::r_sexptype_traits< typename
> FUN::result_type>::rtype ;
>
>        Vector<RTYPE> ans = Vector<RTYPE>::import_transform(
>                object.begin(), object.end(), fun
>                ) ;
>        ans.names() = object.names() ;
>        return ans ;
> }
>
> template <typename T>
> struct square : std::unary_function<T,T> {
>        inline T operator()( const T& t ){ return t*t ; }
> } ;
>
> '
>
> code <- '
>        NumericVector xx( x) ;
>        return all( sapply( xx, square<double>() ) < 15 ) ;
>
> '
>
> fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
> fx( seq(1,10, length = 5 ) )
>
>
> This needs rev 1555 of Rcpp for the "< 15" part. I'll try to come up with a
> lazy version of this for the Rcpp sugar.
>
> I'll probably recycle some of this thread into a blog post (with nicer
> formatting).
>
>
> Also worth noting the existence of Rcpp::unary_call, Rcpp::fixed_call and
> Rcpp::binary_call that wrap up an R function in a strong type illusion
> (powered by the twins Rcpp::as and Rcpp::wrap).
>
> So for example (this will be less efficient for obvious reasons):
>
> code <- '
>        NumericVector xx( x) ;
>        return all( sapply( xx, unary_call<double,double>( Function("square")
> ) ) < 15 ) ;
>
> '
>
> square <- function(x) x^2
> fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
> fx( seq(1,10, length = 5 ) )
>
> Romain
>
> PS : More examples of these in the runit.Language.R file.
>
> Le 16/06/10 16:18, Romain Francois a écrit :
>>
>>
>> Le 16/06/10 14:58, Douglas Bates a écrit :
>>>
>>> Ever since I read Phil Spector's book on S I have been a fan of
>>> functional programming in S and R. When Jose Pinheiro and I were
>>> working on the nlme package there was a joke between us that you could
>>> tell which of us wrote which parts of the code because his parts
>>> always had an object named "aux" and my parts always had
>>> "unlist(lapply(list(...)))".
>>>
>>> Even within C++ I like to use the std:: algorithms like
>>> std::transform, but, of course, there are differences between a
>>> strongly typed language like C++ and a dynamically typed language like
>>> S. Templates can get around these differences to some extent but I am
>>> still a bit of a novice regarding templates.
>>>
>>> Currently I want to apply some functions to lists but entirely within
>>> the C++ code (i.e. I don't want to create Rcpp Function objects and
>>> call back to R). For the sake of argument, consider a function that
>>> extracts the lengths of the components of the lists.
>>>
>>>
>>> library(Rcpp)
>>> library(inline)
>>> inc<- '
>>> class length {
>>> public:
>>> R_len_t operator() (RObject const& x) {return Rf_length(SEXP(x));}
>>> };
>>> '
>>> code<- '
>>> List lst(ll);
>>> IntegerVector ans(lst.size());
>>> std::transform(lst.begin(), lst.end(), ans.begin(), length());
>>> return ans;
>>> '
>>> ltst<- cxxfunction(signature(ll = "list"), code, "Rcpp", inc)
>>> ll<- list(a = numeric(0), b = LETTERS[6:9], c = c)
>>> ltst(ll)
>>> sapply(ll, length)
>>>
>>> I would like to create a templated sapply function like
>>>
>>> template<int RTYPE>
>>> Vector<RTYPE> sapply(List ll, ??) {
>>> Vector<RTYPE> ans(ll.size());
>>> CharacterVector nms = ll.names();
>>> if (nms.size() == ll.size()) ans.names() = nms;
>>> std::transform(ll.begin(), ll.end(), ans.begin(), ??);
>>> return ans;
>>> }
>>>
>>> but I don't know how to specify the second argument that is a function
>>> that returns the atomic element type of a Vector<RTYPE> (is this as
>>> simple as Vector<RTYPE>::value_type?) and has a single argument which
>>> probably should be an RObject (although might be an SEXP, if that was
>>> more convenient). Can someone (probably Romain) provide some
>>> guidance?
>>
>> I recently added Vector::import_transform which might be all you need.
>>
>> NumericVector x = NumericVector::import_transform(
>> y.begin(), y.end(), f) ;
>>
>>
>> This will work as long as f is of some type acceptable by std::transform.
>>
>>
>> But let's keep going anyway. One way (I'll write later why this is not
>> fully satisfactory) is here:
>>
>> require( Rcpp )
>> require( inline )
>>
>> inc <- '
>>
>> template <typename OBJECT, typename FUN>
>> SEXP sapply(const OBJECT& object, FUN fun) {
>>
>> const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
>> FUN::result_type>::rtype ;
>>
>> Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
>> object.begin(), object.end(), fun
>> ) ;
>> return ans ;
>> }
>>
>> template <int RT, typename FUN>
>> SEXP sapply( const Rcpp::Vector<RT>& object, FUN fun){
>>
>> const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
>> FUN::result_type>::rtype ;
>>
>> Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
>> object.begin(), object.end(), fun
>> ) ;
>> ans.names() = object.names() ;
>> return ans ;
>> }
>>
>> template <typename T>
>> struct square : std::unary_function<T,T> {
>> inline T operator()( T& t ){ return t*t ; }
>> } ;
>>
>> '
>>
>> code <- '
>> NumericVector xx( x );
>>
>> return sapply( xx, square<double>() ) ;
>>
>> '
>>
>> fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
>>
>>
>>  > fx( seq(1,10, length = 5 ) )
>> [1] 1.0000 10.5625 30.2500 60.0625 100.0000
>>
>> I'll try to explain bit by bit. Please let me know if something is too
>> cryptic. The good thing about templates is that as long as they work
>> they don't complain. The bad thing is that when they start complaining,
>> you need large screens to accomodate all the warnings/erros you get...
>>
>>
>>
>> Starting from the last thing, the square struct.
>>
>> template <typename T>
>> struct square : std::unary_function<T,T> {
>> inline T operator()( const T& t ){ return t*t ; }
>> } ;
>>
>> we need the function that goes into sapply to help, i.e we need it to be
>> aware of the result type. inheriting from std::unary_function makes this
>> easy. It essentially just adds some typedef. See
>> http://www.cplusplus.com/reference/std/functional/unary_function/
>>
>> So when we do square<double>(), the function is self aware that it
>> returns a double. We need that to decide which kind of R vectors we want
>> to create. Which is the job of the Rcpp::traits::r_sexptype_traits
>> trait. So:
>>
>> const int RTYPE = Rcpp::traits::r_sexptype_traits< typename
>> FUN::result_type>::rtype ;
>>
>> it is very important that this is const, because we are using this as a
>> template argument later and this needs to be compile time constant.
>>
>> Then, we just delegate to std::transform through Vector::import_transform:
>>
>> Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
>> object.begin(), object.end(), fun
>> ) ;
>>
>>
>>
>> For now the sapply just return a SEXP because that is easy, but we could
>> have the result type deduced as well (fasten your seatbelt):
>>
>> template <typename OBJECT, typename FUN>
>> Rcpp::Vector< Rcpp::traits::r_sexptype_traits< typename
>> FUN::result_type>::rtype >
>> sapply(const OBJECT& object, FUN fun) {
>>
>>
>> Anyway, why is there two sapply ? because the first one is more generic,
>> you could sapply on a std::vector<double> for example, the only thing we
>> do with the object is to call begin() and end(), so as long as they
>> exist and they produce something that makes sense for std::transform, we
>> are fine (this is why templates are just so great).
>>
>> The second version is specific to Rcpp vectors, on which we can call
>> names().
>>
>>
>>
>> Now why is not satisfactory ? This is not lazy enough for me. With a
>> little bit more work, instead of returning a Rcpp::Vector which
>> allocates all of its memory right now (not lazy), we could have sapply
>> returning some sort of proxy object. The proxy object would expose a
>> similar interface : operator[] and iterator, but the result will only be
>> calculated when truly necessary.
>>
>> Why should we care about that ? Imagine, instead of just calculating the
>> square, we want to find out if all the squared values are below 15, ie.
>> the equivalent of the R code "all( x^2 < 15 )" ? We only need to
>> calculate three values to make our decision, so why bother calculating
>> the 2 last ones. (lazy)
>>
>> This is the kind of thing I want to bring to Rcpp with the "sugar" piece
>> of the puzzle.
>>
>> We would then have this expression:
>>
>> code <- '
>> Rcpp::NumericVector xx( x) ;
>>
>> return Rcpp::all( sapply( xx, square<double>() ) < 15 ) ;
>>
>> '
>>
>> It almost work right now (I need to add some code so that <15 makes
>> sense, I only added Vector < Vector so far), but currently requires to
>> compute all the 5 values before handling the data to all.
>>
>>
>> Please come back with questions if some gaps need to be filled.
>>
>> Romain
>>
>>
>
>
> --
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/98Uf7u : Rcpp 0.8.1
> |- http://bit.ly/c6YnCi : graph gallery collage
> `- http://bit.ly/bZ7ltC : inline 0.3.5
>
>
>


More information about the Rcpp-devel mailing list