[Rcpp-devel] Returning a named (and classed?) list from an RcppExport SEXP function

Romain Francois romain.francois at dbmail.com
Fri Mar 12 22:46:37 CET 2010


Le 12/03/10 22:00, Douglas Bates a écrit :
>
> On Fri, Mar 12, 2010 at 2:23 PM, Romain Francois
> <romain at r-enthusiasts.com>  wrote:
>> Le 12/03/10 18:23, Douglas Bates a écrit :
>>>
>>> On Fri, Mar 12, 2010 at 10:47 AM, Dirk Eddelbuettel<edd at debian.org>
>>>   wrote:
>>>>
>>>> Hi Doug,
>>>>
>>>> Welcome! We're honoured to have you here. And glad you picked the list.
>>>>
>>>> On 12 March 2010 at 10:14, Douglas Bates wrote:
>>>> | I have looked at your (i.e. Dirk and Roman's) submitted Rjournal
>>>> | article on Rcpp and decided that I should take the plunge.
>>>> |
>>>> | To get my feet wet I am starting with a relatively simple example
>>>> | interfacing to (God help us) 1960's style Fortran written by Mike
>>>> | Powell and incorporated in the minqa package.
>>>> |
>>>> | The value to be returned is a named list of heterogeneous values.  I
>>>> | can see how to construct such a list in the classic Rcpp API using a
>>>> | RcppResultSet object but I don't know the best way to accomplish this
>>>> | in the more modern Rcpp namespace.  Can someone point me in the right
>>>> | direction?  Also, is there a convenient idiom for attaching an S3
>>>> | class name to a list to be returned?
>>>>
>>>> Yes. See eg the examples I put on my blog, eg inside of RcppArmadillo we
>>>> simply do (including actual fastLm source here modulu two ifdefs dealing
>>>> with
>>>> older versions -- which means you need Armadillo 0.9.0 or later)
>>>>
>>>> #include<RcppArmadillo.h>
>>>>
>>>> extern "C" SEXP fastLm(SEXP ys, SEXP Xs) {
>>>>
>>>>     Rcpp::NumericVector yr(ys);                 // creates Rcpp vector
>>>> from SEXP
>>>>     Rcpp::NumericMatrix Xr(Xs);                 // creates Rcpp matrix
>>>> from SEXP
>>>>     int n = Xr.nrow(), k = Xr.ncol();
>>>>
>>>>     arma::mat X(Xr.begin(), n, k, false);       // reuses memory and
>>>> avoids extra copy
>>>>     arma::colvec y(yr.begin(), yr.size(), false);
>>>>
>>>>     arma::colvec coef = solve(X, y);            // fit model y ~ X
>>>>
>>>>     arma::colvec resid = y - X*coef;            // residuals
>>>>
>>>>     double sig2 = arma::as_scalar( trans(resid)*resid/(n-k) );
>>>>                                                 // std.error of estimate
>>>>     arma::colvec stderrest = sqrt( sig2 * diagvec(
>>>> arma::inv(arma::trans(X)*X)) );
>>>>
>>>>     Rcpp::Pairlist res(Rcpp::Named( "coefficients", coef),
>>>>                        Rcpp::Named( "stderr", stderrest));
>>>>     return res;
>>>> }
>>>>
>>>> [ By the way, I still think I should try to implement all four of your
>>>> "how
>>>> to do lm" solutions from the older R News / R Journal piece in Armadillo
>>>> one
>>>> day :-)   And see what the timing impacts would be. ]
>>>>
>>>> That works "on the fly" with Pairlist and named.  We also have Rcpp::List
>>>> examples, but AFAICR you need to 'predeclare' how long your list is and
>>>> the
>>>> set each element.  The 200+ unit tests will have examples. Better docs
>>>> will
>>>> appear one day...
>>>
>>> Nice approach.  I presume that the Pairlist object is converted to an
>>> VECSXP in the process of being packaged for return.
>>
>> No, they are shipped to R as a LISTSXP.
>
> Hmm.  That may cause problems in the long run.  IIRC, having LISTSXPs,
> other than those related to language objects, loose in R is frowned
> upon.  I'll check before saying anything definite.

I agree. Nobody would do something like this in R:

 > pairlist( foo = 1, bar = 2 )

so I would be in favour of giving less exposure to the Pairlist class. 
IIRC the only reason why Dirk uses them is because of the convenience of 
creating the object in one call... so we just need to have similar 
semantics with the List class.

>>> I will play
>>> around a bit with expressions like
>>>
>>> Rcpp::List res(Rcpp::PairList(Rcpp::Named(...), ...))
>>
>> This should work since the List( SEXP ) constructor uses coercion. Let us
>> know otherwise.
>
> I tried it and was terribly frustrated that my example didn't work and
> I couldn't decide why.  After much exploration I found that it had
> nothing to do with that construction and was in fact due to my using 6
> SEXPs in a Pairlist constructor.  Apparently my version is not
> compiled with HAS_VARIADIC_TEMPLATES set and that means the
> constructors only go up to 5.  I see now that because I am using g++
> 4.4.1 I should probably set the compiler flag -std=c++0x

This should enable c++0x :

RCPP_CXX0X="yes" R CMD INSTALL Rcpp

(we don't use the PKG_CXXFLAGS for this because otherwise R CMD check is 
unhappy about non portable flags)


Ah yes. 5 here is arbitrary, and it would not be too hard to make it 10, 
20 or something. Now that I am on OSX, I am stuck with gcc 4.2 :-( so no 
c++0x for me anymore (well only when I use my fedora machine). I took 
the precaution of wrapping the code bloat code into pseudo xml :

/* <code-bloat> */

/* </code-bloat> */

but wrote that manually. Should not be hard to come up with some R code 
to generate the code bloat... maybe tomorrow.

> I have tried it again initializing a list from the Pairlist object and
> that works.  Do you happen to know if that conversion determines the
> length of the result before allocating it?

We outsource this to the as.list function of R. This is probably not the 
easiest piece of code, and it surely deserve some more documentation. In 
a nutshell :

The constructor :

List(SEXP) aka Vector<VECSXP>( SEXP )

calls r_cast<VECSXP> which calls r_true_cast if the object is not a 
VECSXP, which looks like this:

template<> SEXP r_true_cast<VECSXP>(SEXP x){
	return convert_using_rfunction(x, "as.list" ) ;
}

This is somewhat lazy and we could to some switch( TYPEOF( )) to keep 
some things internal for the easy cases (LISTSXP, EXPRSXP, LANGSXP) and 
use as.list for the other cases to avoid the price of the round trip to 
the R side.

> I tried to read the code
> for internal::ListInitialization but I am not sufficiently fluent in
> C++ to decide what is going on.

ListInitialization might not be what you think. It is only there to 
support this notation.

NumericVector x(4) ;
x = 1.0, 2.0, 3.0, 4.0 ;

This is a trick we borrow from blitz++

but C++0x gives this notation that I find much nicer:

NumericVector x = {1.0, 2.0, 3.0; 4.0} ;


>>> I would like to have the capability of attaching an attribute named
>>> "class" to the VECSXP before returning it.
>>
>> res.attr( "class" ) = "foo" ;
>>
>>>> | An unrelated question -- has anyone looked at accessing S4 classed
>>>> | objects in Rcpp?  I have had considerable experience with S4 classes
>>>> | and objects and could keep such a project in mind as I begin to
>>>> | explore Rcpp.
>>>>
>>>> I'd be happy to go there (time permitting) but very gently as I am still
>>>> much
>>>> more firmly in S3 land.  Romain may be more eager :)
>>>
>>> One of the nice properties of S4 classed objects is that you can
>>> define validity checks for classes and bypass some of the code that
>>> does validity checking of arguments.
>>
>> For now the only things we have in the api to deal with S4'ness is these
>> methods : "isS4", "slot", "hasSlot".
>>
>> Not sure right now how we could formalize S4 objects in the api, but this
>> would definitely be a fine addition.
>>
>>>> | Yet another question, if i may.   Does coercion take place in
>>>> | constructing an object like
>>>> |
>>>> | Rcpp::NumericVector aa(a);
>>>> |
>>>> | where a is an SEXP, say an SEXP argument to a C++ function, where I
>>>> | accidentally passed an integer vector instead of a double vector?  If
>>>>
>>>> Very much so.  Again, see my blog and the RcppVectorExample code.  Int
>>>> in,
>>>> Double out. All magic hidden.
>>>>
>>>> And of course you can just try it:
>>>>
>>>> R>    library(RcppExamples)
>>>> Loading required package: Rcpp
>>>> R>    RcppVectorExample(c(1,4,9), api="new")
>>>> $result
>>>> [1] 1 2 3
>>>>
>>>> $original
>>>> [1] 1 4 9
>>>>
>>>> R>    RcppVectorExample(c(2,5,10), api="new")
>>>> $result
>>>> [1] 1.414 2.236 3.162
>>>>
>>>> $original
>>>> [1]  2  5 10
>>>>
>>>> R>
>>>
>>> Perhaps you are assuming that c(1,4,9) generates an integer vector in
>>> this example.  In fact, c(1,4,9) is a numeric (in the sense of double
>>> precision) vector.  You need to write c(1L, 4L, 9L) to ensure you have
>>> an integer vector or coerce it with as.integer().
>>>
>>>> At this level, behaviour is the same with 'classic' API but we know of at
>>>> least one case where 'classic' doesn't cast a Matrix right and 'new'
>>>> does.
>>>>
>>>> | so, what happens if there is no coercion?  Are the contents of the
>>>> | SEXP copied to newly allocated storage or not?
>>>>
>>>> It depends. In the new API we do fewer copies.
>>>>
>>>> | The reason that I ask is because there are some places in the lme4
>>>> | code where I want to make sure that I have a pointer to the contents
>>>> | of the original vector because I am want to change the contents
>>>> | without copying.
>>>>
>>>> I think we do that. If you look at the vignette and the timing example --
>>>> but
>>>> using pointer accessors (eg double *x = foo.begin()) and pointer ops (eg
>>>> x++)
>>>> we get essentially the same speed as pure and ugly C code.  Because we do
>>>> away with the copy, and the operator[], for purest speed.
>>>
>>> Yes, I like that idea of being able to use foo.begin() to access the
>>> pointer.  Part of my question related to whether I could count on
>>> getting the pointer to the original contents from R.
>>
>> Yes. The objects of the new api have the SEXP as their data member and
>> methods applied on objects are relayed back to the SEXP in terms of calls t
>> the R API. We use R memory management all the way.
>>
>> There are some times though where memory is reallocated, because there is no
>> other way. for example if you add values to a vector (push_back, push_front,
>> etc ...)
>>
>>   In some way I
>>>
>>> would need to check whether the R object had been coerced, and hence
>>> allocated new storage for its contents, or not.  That's not an
>>> immediate problem, though.
>>
>> There is something that might be helpful (or not). In the file RcppCommon.h,
>> we have the RCPP_RETURN_VECTOR macro, this is an initial attempt to avoid
>> writing the same switch( TYPEOF(x), case REALSXP: ... code over and over
>> again. There is an example use in RcppCommon.cpp
>>
>>
>>> Thanks for the response.
>>>
>>>> Hope that answer the first batch of questions.  Keep'em coming!
>>>>
>>>> Dirk



-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7
`- http://tr.im/O1wO : highlight 0.1-5




More information about the Rcpp-devel mailing list