[Rcpp-devel] function modifying it self its argument

Matt D. matdzb at gmail.com
Thu Feb 26 21:28:25 CET 2015


On 2/26/2015 18:59, Dirk Eddelbuettel wrote:
> On 26 February 2015 at 18:35, Matt D. wrote:
> | Which incidentally brings me to the advice I usually give in these situations:
> | unless you're absolutely dependent on the "features" of `Rcpp::NumericVector`
> | just forget about it and replace all uses with the standard container
> | `std::vector<double>`.
>
> Note that this means you will always force a copy on the way in, and on the
> way out.  That is a guaranteed performance penalty.
>
> So with this you guarantee that someone else will always be able to write
> faster code.  That said, I too like std::vector<>, but I also like arma::vec,
> and those are (in the recent versions) lightweight.
Sure!
For the general case, and in particular for use-cases that do not run 
into the problem discussed here and so never need `clone`: fair enough -- 
this is the usual point made when discussing the advantages of shallow 
(over deep) copy semantics (or even the copy-on-write in-between).
(In the case under consideration the copy is not avoidable, since 
`clone` already makes it.)

In general, it's certainly a reasonable point that there is a trade-off 
to be made -- user-friendliness against the potential extra copies (not 
sure whether this has ever been measured -- as in counting the cases of 
`clone`-less existing code-bases where this was the actual performance 
bottleneck).

However, it still violates the POLS even for the users coming from pure R:
 > f = function(v) { u = v; if (length(u) > 1) { u[1] = 123 }; u }
 > w = rep(1, 3)
 > f(w)
[1] 123   1   1
 > w
[1] 1 1 1
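
For contrast, here is a minimal C++ sketch of the two behaviours. The 
`Handle` struct is a hypothetical stand-in for a proxy-like type (it is 
an assumption for illustration, not how Rcpp is implemented); the 
`std::vector<double>` version behaves like the R session above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a proxy type: copying a Handle copies the
// pointer, not the data, so every copy aliases the same storage.
// Illustration only; this is not Rcpp's implementation.
struct Handle {
    std::shared_ptr<std::vector<double>> data;

    explicit Handle(std::size_t n, double value)
        : data(std::make_shared<std::vector<double>>(n, value)) {}

    double& operator[](std::size_t i) {
        assert(i < data->size());  // simple bounds check
        return (*data)[i];
    }
    std::size_t size() const { return data->size(); }
};

// Taken "by value", yet the caller's data changes: the copy is shallow.
inline Handle set_first_shallow(Handle v) {
    if (v.size() > 1) v[0] = 123;
    return v;
}

// Taken by value with deep-copy semantics: the caller's data is untouched.
inline std::vector<double> set_first_deep(std::vector<double> v) {
    if (v.size() > 1) v[0] = 123;
    return v;
}
```

Calling `set_first_shallow(h)` changes `h[0]` in the caller, while 
`set_first_deep(w)` leaves `w` as it was.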

In the "general" scenario it's not really user-friendly to abandon R 
(as well as C++) semantics by default.
Perhaps there's another solution -- continuing with the proxy aspect:
>
> | found" at the moment -- and, as mentioned in another reply, you're apparently
> | expected to Google around to find methods for solving problems you wouldn't
>
> We have called these objects "proxy models" since almost certainly 2010.  This
> is referenced in the standard introductory paper (published peer-reviewed in
> JSS in 2011) and included as a vignette in the package.
>
> If you ignore the available documentation, then you may indeed have to "google
> at random" as you claim.  I'd call that a self-inflicted wound.
Sure: At the same time, to give a somewhat related example, 
`std::vector<bool>` has been known to be a proxied container since at 
least 1998 -- when the original (pre-standard) STL implementation was 
partially adopted, with the choice to specialize it into what used to be 
a `bit_vector` instead of following the usual container requirements 
(with some earlier / pre-standard implementations available): 
https://www.sgi.com/tech/stl/bit_vector.html

Just the same, as of 2013 programmers still weren't 100% clear on the 
implications:
http://stackoverflow.com/questions/17794569/why-is-vectorbool-not-a-stl-container
I imagine blaming these programmers for "ignoring" ISO/IEC 14882 and 
advising them to use a search engine after the failure to read it in its 
entirety is certainly _an_ approach.
After all, this design was also chosen with optimization in mind 
(albeit with space-efficient allocation as the goal).
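
To make the `std::vector<bool>` surprise concrete, a small sketch using 
only the standard library (the function names are made up for the 
example): `operator[]` does not return `bool&`, and `auto` therefore 
deduces a proxy that still refers into the container.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// For ordinary element types, operator[] yields a real reference.
static_assert(
    std::is_same<decltype(std::declval<std::vector<int>&>()[0]), int&>::value,
    "vector<int>::operator[] returns int&");

// For bool, it yields a proxy object (std::vector<bool>::reference).
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
    "vector<bool>::operator[] does not return bool&");

// `auto` deduces the proxy type, so writing to `first` writes to flags[0].
inline bool auto_aliases_vector_bool() {
    std::vector<bool> flags(3, false);
    auto first = flags[0];  // a proxy, not an independent bool
    first = true;           // writes through to the container
    return flags[0];
}

// With int, `auto` deduces a plain int: an independent copy.
inline bool auto_copies_vector_int() {
    std::vector<int> values(3, 0);
    auto first = values[0];
    first = 1;
    return first == 1 && values[0] == 0;
}

// Both observations, stated as assertions.
inline void check_examples() {
    assert(auto_aliases_vector_bool());
    assert(auto_copies_vector_int());
}
```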

At the same time, nowadays the design choice made for 
`std::vector<bool>` is referred to (variably) as "totally broken", a 
"defect", a "mistake", or an "abomination":
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2160.html
https://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98
http://isocpp.org/blog/2012/11/on-vectorbool
http://www.gotw.ca/publications/mill09.htm

Today `std::vector<bool>` is a textbook example of premature 
optimization in design (http://www.gotw.ca/gotw/050.htm) -- with all the 
usual caveats:

"std::vector<bool> forces a specific optimization on all users by 
enshrining it in the standard. That's not a good idea; different users 
have different requirements, and now all users of vector<bool> must pay 
the performance penalty even if they don't want or need the space savings.

Bottom line: If you care more about speed than you do about size, you 
shouldn't use std::vector<bool>. Instead, you should hack around this 
optimization by using a std::vector<char> or the like instead, which is 
unfortunate but still the best you can do."
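
A sketch of the workaround the quote mentions, with hypothetical helper 
names: with `std::vector<char>` each flag is an addressable object, so 
genuine references and raw pointers into the storage work as they do for 
any other element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count set flags through a raw pointer; this needs elements that are
// stored as individually addressable objects.
inline std::size_t count_set(const char* flags, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) count += (flags[i] != 0);
    return count;
}

inline std::size_t demo_char_flags() {
    std::vector<char> flags(4, 0);
    char& second = flags[1];  // a genuine reference to the element
    second = 1;
    flags[3] = 1;
    // flags is now {0, 1, 0, 1}; data() exposes the contiguous storage.
    const std::size_t n = count_set(flags.data(), flags.size());
    assert(n == 2);
    return n;
}
```

The cost is one byte per flag instead of one bit, which is exactly the 
space-versus-speed choice the quote says should have been left to the user.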

Perhaps better self-documenting code could attempt to help the users 
by having, say, `Rcpp::NumericVectorView` (or 
`Rcpp::NumericVectorProxy`) used for view (proxy) purposes -- and 
sticking with the default (expected by R -- as well as C++ -- 
programmers) for `Rcpp::NumericVector`?

(Alternatively, making `f(Rcpp::NumericVector & v)` signify the need for 
mutation, while keeping the expected copied-value behavior for 
`f(Rcpp::NumericVector v)`; or is implementing this inherently blocked 
by the way Rcpp has to interoperate with R through SEXPs? Similarly for 
`f(std::vector<double> & v)` vs `f(std::vector<double> v)` vs `f(const 
std::vector<double> & v)`?).
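
For reference, this is what the three signatures mean in plain C++ with 
`std::vector<double>` (standard library only; the function names are 
made up for the example, and nothing here says how Rcpp maps them onto 
SEXPs):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// const reference: read-only access, no copy, no mutation.
inline double sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// non-const reference: the signature announces mutation of the caller's data.
inline void normalize_in_place(std::vector<double>& v) {
    const double total = sum(v);
    if (total == 0.0) return;
    for (double& x : v) x /= total;
}

// by value: the function works on its own copy; the caller's data is untouched.
inline std::vector<double> normalized(std::vector<double> v) {
    normalize_in_place(v);  // operates on the local copy only
    return v;
}

// Usage, with the expected results as assertions.
inline void example() {
    std::vector<double> w{1.0, 1.0, 2.0};
    const std::vector<double> u = normalized(w);
    assert(u[2] == 0.5 && w[2] == 2.0);  // w unchanged
    normalize_in_place(w);
    assert(w[2] == 0.5);                 // w changed, as requested
}
```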

As it stands, despite its name, `Rcpp::NumericVector` isn't really a 
numeric vector. As you rightly point out, it is a view (or a proxy). 
This is surprising for a type named `Rcpp::NumericVector`. I don't think 
it's unreasonable for the users to ask questions given the source of the 
astonishment. Just as it isn't surprising to see users confused about 
`std::vector<bool>` some decades after its behavior has been 
standardized. Rcpp is a relatively young project; perhaps this will 
change over time...

The trade-off in the general case of "what's a good default" seems to be 
pitting copy-optimization against user-friendliness; the current 
(reference semantics) approach rests on the unstated assumption that 
regular users will know about `Rcpp::NumericVector` being different (and 
the need to `clone`) and that performance experts won't be capable of 
optimizing their code if this isn't done for them.

Perhaps leaving regular users with a regularly behaving 
`Rcpp::NumericVector` by default -- while leaving performance experts 
the option to use `Rcpp::NumericVectorView` on an as-needed basis -- would 
cut the amount of help required in the first place?
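
A rough sketch of what such a split could look like, in plain C++ and 
inside a made-up `sketch` namespace. None of this is Rcpp API: 
`NumericVectorView` does not exist in Rcpp as far as I know, and the 
owning type is simply assumed to be `std::vector<double>` here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Owning type with value semantics: copies are deep (assumption: a plain
// std::vector<double> stands in for the "regular" numeric vector).
using NumericVector = std::vector<double>;

// Hypothetical non-owning view: the name itself signals aliasing.
// The view must not outlive the vector it refers to.
class NumericVectorView {
public:
    explicit NumericVectorView(NumericVector& source)
        : data_(source.data()), size_(source.size()) {}

    double& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }

private:
    double* data_;
    std::size_t size_;
};

// Mutates the caller's storage; the parameter type makes that explicit.
inline void scale_in_place(NumericVectorView v, double factor) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] *= factor;
}

// Works on a private copy; the caller's vector is untouched.
inline NumericVector scaled_copy(NumericVector v, double factor) {
    for (double& x : v) x *= factor;
    return v;
}

}  // namespace sketch
```

With this split, a reader of `scale_in_place(NumericVectorView(w), 2.0)` 
sees at the call site that `w` will change, while `scaled_copy(w, 2.0)` 
behaves the way R and C++ programmers expect by default.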

Granted: to an extent this is all academic -- chances are there is some 
code somewhere relying on this and this ship has sailed 
(http://xkcd.com/1172/) (unless there's a potential for redesign / 
backward incompatibility in the future).

That being said, as for the "what to do by default" advice, for anyone 
finding themselves needing to `clone` -- `std::vector<double>` seems 
like the safer, better documented option.

Best,

Matt
>
> Dirk
>


