[Rcpp-devel] function modifying it self its argument

Nathan Kurz nate at verse.com
Thu Feb 26 23:28:54 CET 2015


On Thu, Feb 26, 2015 at 12:28 PM, Matt D. <matdzb at gmail.com> wrote:
> On 2/26/2015 18:59, Dirk Eddelbuettel wrote:
>>
>> On 26 February 2015 at 18:35, Matt D. wrote:
>> | Which incidentally brings me to the advice I usually give in these situations:
>> | unless you're absolutely dependent on the "features" of `Rcpp::NumericVector`
>> | just forget about it and replace all uses with the standard container
>> | `std::vector<double>`.
>>
>> Note that this means you will always force a copy on the way in, and on the
>> way out.  That is a guaranteed performance penalty.
>
> Perhaps a better self-documenting code could attempt to help the users by
> having, say, `Rcpp::NumericVectorView` (or `Rcpp::NumericVectorProxy`) used
> for view (proxy) purposes -- and sticking with the default (expected by R --
> as well as C++ -- programmers) for `Rcpp::NumericVector`?

From the perspective of a performance-oriented C programmer using Rcpp
to improve performance of R algorithms, Matt's approach would be
completely backward from what I expect.   Optimized computation is
cheap enough to rarely be a bottleneck, but memory access is
expensive.  Garbage collection is one of R's weakest spots.   Almost
my entire reason for dropping to Rcpp is to avoid unnecessary copies.

But I agree with Matt that things could be clearer.  The main issue I
have is the interaction with R's "multiple names" behavior.   If the
vector is already multi-named, a copy is silently made as part of
passing it.  This is "correct" if you are making changes in place, but
if you are calling a function that operates read-only on a multi-GB
vector in a loop, this can destroy performance.

The best solution I've found is to proactively make a single copy
outside the loop by doing some sort of null-op (like A = A + 0), but
this certainly feels hackish.   And unless you do this everywhere, it
doesn't solve the problem that a small change elsewhere in your
program can drastically alter performance in the critical section.

> (Alternatively, making `f(Rcpp::NumericVector & v)` signify the need for
> mutation, while keeping the expected copied-value behavior for
> `f(Rcpp::NumericVector v)`; or is implementing this inherently blocked by
> the way Rcpp has to interoperate with R through SEXPs? Similarly for
> `f(std::vector<double> & v)` vs `f(std::vector<double> v)` vs `f(const
> std::vector<double> & v)`?).

I like this approach.  Even better would be if it paid attention to
'const' such that no copy was made even if the R function call rules
would normally require it:

f(Rcpp::NumericVector &v):
  no copy unless multi-named SEXP

f(Rcpp::NumericVector v):
  copy unless no outside references possible (temp var)

f(Rcpp::NumericVector const v):
  read only, no copy ever, blows up if contract violated
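
As a rough analogy only, the three signatures above can be modeled with plain `std::vector<double>`. The helper names below are hypothetical, and the sketch shows ordinary C++ value semantics, not what `Rcpp::NumericVector` actually does with its underlying SEXP. Each helper reports whether the callee sees the caller's own buffer or a private copy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers. Each reports whether the callee is looking at the
// caller's own buffer (true) or at a private copy (false).

// f(T& v): no copy; writes are visible to the caller.
bool by_ref(std::vector<double>& v, const double* caller_data) {
    assert(!v.empty());
    v[0] += 1.0;
    return v.data() == caller_data;
}

// f(T v): copied for an lvalue argument, moved (no copy) for a temporary.
bool by_value(std::vector<double> v, const double* caller_data) {
    assert(!v.empty());
    v[0] += 1.0;  // touches only the callee's vector
    return v.data() == caller_data;
}

// f(const T& v): no copy; the compiler rejects writes through v.
bool by_const_ref(const std::vector<double>& v, const double* caller_data) {
    return v.data() == caller_data;
}
```

Passing a temporary (or `std::move(a)`) to the by-value version moves the buffer instead of copying it, which loosely corresponds to the "temp var" case in the list.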

In my world, it would be wonderful if the "&v" version had an option
that either printed a warning or returned an error if a copy was
required to pass the argument.   I don't know if "const &v" could be
shoehorned into providing this, or if it would be clear enough.
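
One way to picture the "warn or error if a copy is required" option is a toy copy-on-write model. Everything here is an assumption made for illustration: `Buffer`, `OnCopy` and `detach_for_write` are hypothetical names, and `std::shared_ptr::use_count()` merely stands in for R's notion of a value having more than one name. It is not how R or Rcpp track references.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Toy stand-in for an R vector: several "names" may share one buffer.
using Buffer = std::shared_ptr<std::vector<double>>;

// Hypothetical policy: silently copy, or refuse when a copy would be needed.
enum class OnCopy { Allow, Error };

// Make `buf` safe to modify in place. Returns true if a copy was made.
// With OnCopy::Error, a shared buffer raises instead of being copied.
bool detach_for_write(Buffer& buf, OnCopy policy) {
    assert(buf != nullptr);
    if (buf.use_count() <= 1) {
        return false;  // sole owner: in-place modification is safe
    }
    if (policy == OnCopy::Error) {
        throw std::runtime_error("argument is shared; passing it would copy");
    }
    buf = std::make_shared<std::vector<double>>(*buf);  // private copy
    return true;
}
```

A read-only function would never call `detach_for_write`, so it pays no copy regardless of how many names share the buffer. A mutating function in a hot loop could use `OnCopy::Error` to surface an unexpected copy instead of silently absorbing the cost.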

--nate

