[Rcpp-devel] Rcpp internal versions of lapply and sapply

Douglas Bates bates at stat.wisc.edu
Wed Jun 16 18:48:56 CEST 2010


By the way, you were correct in your original response that all I
really need is input_transform but it is certainly a worthwhile
exercise to see how traits, etc. are used and I appreciate your kind
explanations.

On Wed, Jun 16, 2010 at 10:58 AM, Romain Francois
<romain at r-enthusiasts.com> wrote:
> Le 16/06/10 17:44, Douglas Bates a écrit :
>>
>> Thank you very much, Romain.  This certainly helps my understanding of
>> unary_function, traits, etc.  These are the sorts of things I can read
>> about but until I see a non-trivial usage it is hard to keep these
>> ideas straight.
>>
>> Two minor questions occur to me at this point.  In my code I assigned
>> the names to be the names of the input list but I think now that I
>> should have assigned a clone of the object names instead.
>
> I have been lazy (the wrong kind of lazyness) when implementing names<-, and
> just used a callback to the R function names<- . The relevant chunk is in
> Vector.h in the NamesProxy class:
>
>                void set(SEXP x) const {
>                        SEXP new_vec = PROTECT( internal::try_catch(
>                        Rf_lcons( Rf_install("names<-"),
>                                        Rf_cons( parent, Rf_cons( x ,
> R_NilValue) )))) ;
>                        /* names<- makes a new vector, so we have to change
>                           the SEXP of the parent of this proxy, it might be
>                           worth to work directly with the names attribute
> instead
>                           of using the names<- R function, but then we need
> to
>                           take care of coercion, recycling, etc ... we
> cannot just
>                           brutally assign the names attribute */
>                        const_cast<Vector&>(parent).setSEXP( new_vec ) ;
>                        UNPROTECT(1) ; /* new_vec */
>                }
>
> So with this implementation, you don't need to worry about copying names.
> But it might not be the best implementation.
>
>> To be safe
>> I should have cloned before assigning, right?  I also checked the
>> length before assigning but you didn't.  Am I reflecting too much of
>> my C programming background here?  In other words, is there some magic
>> in the Rcpp names assignment that does the checking?
>
> The magic comes from R.
>
>> x <- 1:10
>> names(x) <- c("a", "b")
>> x
>   a    b <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
>   1    2    3    4    5    6    7    8    9   10
>
> In your example ll.names() is either NULL or a vector of the appropriate
> size. I think (not sure though) that names<- is restrictive about the rhs:
>
>> names(x) <- rnorm
> Erreur dans as.character(function (n, mean = 0, sd = 1)  :
>  cannot coerce type 'closure' to vector of type 'character'
>
> Romain
>
>> Thanks again.
>>
>> On Wed, Jun 16, 2010 at 9:55 AM, Romain Francois
>> <romain.francois at dbmail.com>  wrote:
>>>
>>> Small update incorporating the use of all (and using namespace Rcpp to
>>> make
>>> things slightly less cluterred).
>>>
>>> require( Rcpp )
>>> require( inline )
>>>
>>> inc<- '
>>>
>>> using namespace Rcpp ;
>>>
>>> template<typename OBJECT, typename FUN>
>>> Vector<  traits::r_sexptype_traits<  typename FUN::result_type>::rtype>
>>> sapply(const OBJECT&  object, FUN fun) {
>>>
>>>        const int RTYPE = traits::r_sexptype_traits<  typename
>>> FUN::result_type>::rtype ;
>>>
>>>        Vector<RTYPE>  ans = Vector<RTYPE>::import_transform(
>>>                object.begin(), object.end(), fun
>>>                ) ;
>>>        return ans ;
>>> }
>>>
>>> template<int RT, typename FUN>
>>> Vector<  traits::r_sexptype_traits<  typename FUN::result_type>::rtype>
>>> sapply( const Vector<RT>&  object, FUN fun){
>>>
>>>        const int RTYPE = traits::r_sexptype_traits<  typename
>>> FUN::result_type>::rtype ;
>>>
>>>        Vector<RTYPE>  ans = Vector<RTYPE>::import_transform(
>>>                object.begin(), object.end(), fun
>>>                ) ;
>>>        ans.names() = object.names() ;
>>>        return ans ;
>>> }
>>>
>>> template<typename T>
>>> struct square : std::unary_function<T,T>  {
>>>        inline T operator()( const T&  t ){ return t*t ; }
>>> } ;
>>>
>>> '
>>>
>>> code<- '
>>>        NumericVector xx( x) ;
>>>        return all( sapply( xx, square<double>() )<  15 ) ;
>>>
>>> '
>>>
>>> fx<- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
>>> fx( seq(1,10, length = 5 ) )
>>>
>>>
>>> This needs rev 1555 of Rcpp for the "<  15" part. I'll try to come up
>>> with a
>>> lazy version of this for the Rcpp sugar.
>>>
>>> I'll probably recycle some of this thread into a blog post (with nicer
>>> formatting).
>>>
>>>
>>> Also worth noting the existence of Rcpp::unary_call, Rcpp::fixed_call and
>>> Rcpp::binary_call that wrap up an R function in a strong type illusion
>>> (powered by the twins Rcpp::as and Rcpp::wrap).
>>>
>>> So for example (this will be less efficient for obvious reasons):
>>>
>>> code<- '
>>>        NumericVector xx( x) ;
>>>        return all( sapply( xx, unary_call<double,double>(
>>> Function("square")
>>> ) )<  15 ) ;
>>>
>>> '
>>>
>>> square<- function(x) x^2
>>> fx<- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
>>> fx( seq(1,10, length = 5 ) )
>>>
>>> Romain
>>>
>>> PS : More examples of these in the runit.Language.R file.
>>>
>>> Le 16/06/10 16:18, Romain Francois a écrit :
>>>>
>>>>
>>>> Le 16/06/10 14:58, Douglas Bates a écrit :
>>>>>
>>>>> Ever since I read Phil Spector's book on S I have been a fan of
>>>>> functional programming in S and R. When Jose Pinheiro and I were
>>>>> working on the nlme package there was a joke between us that you could
>>>>> tell which of us wrote which parts of the code because his parts
>>>>> always had an object named "aux" and my parts always had
>>>>> "unlist(lapply(list(...)))".
>>>>>
>>>>> Even within C++ I like to use the std:: algorithms like
>>>>> std::transform, but, of course, there are differences between a
>>>>> strongly typed language like C++ and a dynamically typed language like
>>>>> S. Templates can get around these differences to some extent but I am
>>>>> still a bit of a novice regarding templates.
>>>>>
>>>>> Currently I want to apply some functions to lists but entirely within
>>>>> the C++ code (i.e. I don't want to create Rcpp Function objects and
>>>>> call back to R). For the sake of argument, consider a function that
>>>>> extracts the lengths of the components of the lists.
>>>>>
>>>>>
>>>>> library(Rcpp)
>>>>> library(inline)
>>>>> inc<- '
>>>>> class length {
>>>>> public:
>>>>> R_len_t operator() (RObject const&  x) {return Rf_length(SEXP(x));}
>>>>> };
>>>>> '
>>>>> code<- '
>>>>> List lst(ll);
>>>>> IntegerVector ans(lst.size());
>>>>> std::transform(lst.begin(), lst.end(), ans.begin(), length());
>>>>> return ans;
>>>>> '
>>>>> ltst<- cxxfunction(signature(ll = "list"), code, "Rcpp", inc)
>>>>> ll<- list(a = numeric(0), b = LETTERS[6:9], c = c)
>>>>> ltst(ll)
>>>>> sapply(ll, length)
>>>>>
>>>>> I would like to create a templated sapply function like
>>>>>
>>>>> template<int RTYPE>
>>>>> Vector<RTYPE>  sapply(List ll, ??) {
>>>>> Vector<RTYPE>  ans(ll.size());
>>>>> CharacterVector nms = ll.names();
>>>>> if (nms.size() == ll.size()) ans.names() = nms;
>>>>> std::transform(ll.begin(), ll.end(), ans.begin(), ??);
>>>>> return ans;
>>>>> }
>>>>>
>>>>> but I don't know how to specify the second argument that is a function
>>>>> that returns the atomic element type of a Vector<RTYPE>  (is this as
>>>>> simple as Vector<RTYPE>::value_type?) and has a single argument which
>>>>> probably should be an RObject (although might be an SEXP, if that was
>>>>> more convenient). Can someone (probably Romain) provide some
>>>>> guidance?
>>>>
>>>> I recently added Vector::import_transform which might be all you need.
>>>>
>>>> NumericVector x = NumericVector::import_transform(
>>>> y.begin(), y.end(), f) ;
>>>>
>>>>
>>>> This will work as long as f is of some type acceptable by
>>>> std::transform.
>>>>
>>>>
>>>> But let's keep going anyway. One way (I'll write later why this is not
>>>> fully satisfactory) is here:
>>>>
>>>> require( Rcpp )
>>>> require( inline )
>>>>
>>>> inc<- '
>>>>
>>>> template<typename OBJECT, typename FUN>
>>>> SEXP sapply(const OBJECT&  object, FUN fun) {
>>>>
>>>> const int RTYPE = Rcpp::traits::r_sexptype_traits<  typename
>>>> FUN::result_type>::rtype ;
>>>>
>>>> Rcpp::Vector<RTYPE>  ans = Rcpp::Vector<RTYPE>::import_transform(
>>>> object.begin(), object.end(), fun
>>>> ) ;
>>>> return ans ;
>>>> }
>>>>
>>>> template<int RT, typename FUN>
>>>> SEXP sapply( const Rcpp::Vector<RT>&  object, FUN fun){
>>>>
>>>> const int RTYPE = Rcpp::traits::r_sexptype_traits<  typename
>>>> FUN::result_type>::rtype ;
>>>>
>>>> Rcpp::Vector<RTYPE>  ans = Rcpp::Vector<RTYPE>::import_transform(
>>>> object.begin(), object.end(), fun
>>>> ) ;
>>>> ans.names() = object.names() ;
>>>> return ans ;
>>>> }
>>>>
>>>> template<typename T>
>>>> struct square : std::unary_function<T,T>  {
>>>> inline T operator()( T&  t ){ return t*t ; }
>>>> } ;
>>>>
>>>> '
>>>>
>>>> code<- '
>>>> NumericVector xx( x );
>>>>
>>>> return sapply( xx, square<double>() ) ;
>>>>
>>>> '
>>>>
>>>> fx<- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
>>>>
>>>>
>>>>  >  fx( seq(1,10, length = 5 ) )
>>>> [1] 1.0000 10.5625 30.2500 60.0625 100.0000
>>>>
>>>> I'll try to explain bit by bit. Please let me know if something is too
>>>> cryptic. The good thing about templates is that as long as they work
>>>> they don't complain. The bad thing is that when they start complaining,
>>>> you need large screens to accomodate all the warnings/erros you get...
>>>>
>>>>
>>>>
>>>> Starting from the last thing, the square struct.
>>>>
>>>> template<typename T>
>>>> struct square : std::unary_function<T,T>  {
>>>> inline T operator()( const T&  t ){ return t*t ; }
>>>> } ;
>>>>
>>>> we need the function that goes into sapply to help, i.e we need it to be
>>>> aware of the result type. inheriting from std::unary_function makes this
>>>> easy. It essentially just adds some typedef. See
>>>> http://www.cplusplus.com/reference/std/functional/unary_function/
>>>>
>>>> So when we do square<double>(), the function is self aware that it
>>>> returns a double. We need that to decide which kind of R vectors we want
>>>> to create. Which is the job of the Rcpp::traits::r_sexptype_traits
>>>> trait. So:
>>>>
>>>> const int RTYPE = Rcpp::traits::r_sexptype_traits<  typename
>>>> FUN::result_type>::rtype ;
>>>>
>>>> it is very important that this is const, because we are using this as a
>>>> template argument later and this needs to be compile time constant.
>>>>
>>>> Then, we just delegate to std::transform through
>>>> Vector::import_transform:
>>>>
>>>> Rcpp::Vector<RTYPE>  ans = Rcpp::Vector<RTYPE>::import_transform(
>>>> object.begin(), object.end(), fun
>>>> ) ;
>>>>
>>>>
>>>>
>>>> For now the sapply just return a SEXP because that is easy, but we could
>>>> have the result type deduced as well (fasten your seatbelt):
>>>>
>>>> template<typename OBJECT, typename FUN>
>>>> Rcpp::Vector<  Rcpp::traits::r_sexptype_traits<  typename
>>>> FUN::result_type>::rtype>
>>>> sapply(const OBJECT&  object, FUN fun) {
>>>>
>>>>
>>>> Anyway, why is there two sapply ? because the first one is more generic,
>>>> you could sapply on a std::vector<double>  for example, the only thing
>>>> we
>>>> do with the object is to call begin() and end(), so as long as they
>>>> exist and they produce something that makes sense for std::transform, we
>>>> are fine (this is why templates are just so great).
>>>>
>>>> The second version is specific to Rcpp vectors, on which we can call
>>>> names().
>>>>
>>>>
>>>>
>>>> Now why is not satisfactory ? This is not lazy enough for me. With a
>>>> little bit more work, instead of returning a Rcpp::Vector which
>>>> allocates all of its memory right now (not lazy), we could have sapply
>>>> returning some sort of proxy object. The proxy object would expose a
>>>> similar interface : operator[] and iterator, but the result will only be
>>>> calculated when truly necessary.
>>>>
>>>> Why should we care about that ? Imagine, instead of just calculating the
>>>> square, we want to find out if all the squared values are below 15, ie.
>>>> the equivalent of the R code "all( x^2<  15 )" ? We only need to
>>>> calculate three values to make our decision, so why bother calculating
>>>> the 2 last ones. (lazy)
>>>>
>>>> This is the kind of thing I want to bring to Rcpp with the "sugar" piece
>>>> of the puzzle.
>>>>
>>>> We would then have this expression:
>>>>
>>>> code<- '
>>>> Rcpp::NumericVector xx( x) ;
>>>>
>>>> return Rcpp::all( sapply( xx, square<double>() )<  15 ) ;
>>>>
>>>> '
>>>>
>>>> It almost work right now (I need to add some code so that<15 makes
>>>> sense, I only added Vector<  Vector so far), but currently requires to
>>>> compute all the 5 values before handling the data to all.
>>>>
>>>>
>>>> Please come back with questions if some gaps need to be filled.
>>>>
>>>> Romain
>
> --
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/98Uf7u : Rcpp 0.8.1
> |- http://bit.ly/c6YnCi : graph gallery collage
> `- http://bit.ly/bZ7ltC : inline 0.3.5
>
>


More information about the Rcpp-devel mailing list