[Rcpp-devel] function modifying it self its argument
Matt D.
matdzb at gmail.com
Thu Feb 26 21:28:25 CET 2015
On 2/26/2015 18:59, Dirk Eddelbuettel wrote:
> On 26 February 2015 at 18:35, Matt D. wrote:
> | Which incidentally brings me to the advice I usually give in these situations:
> | unless you're absolutely dependent on the "features" of `Rcpp::NumericVector`
> | just forget about it and replace all uses with the standard container
> | `std::vector<double>`.
>
> Note that this means you will always force a copy on the way in, and on the
> way out. That is a guaranteed performance penalty.
>
> So with this you guarantee that someone else will always be able to write
> faster code. That said, I too like std::vector<>, but I also like arma::vec,
> and those are (in the recent versions) lightweight.
Sure!
In the realm of all possible general cases, with a particular focus on
the use cases that don't run into the problem discussed here and don't
need `clone`: fair enough. This is the usual point made when discussing
the advantages of shallow (over deep) copy semantics (or even of
copy-on-write, which sits in between).
(In the case under consideration the copy is not avoidable anyway, since
`clone` already makes it.)
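To make the copy-on-write middle ground concrete, here is a minimal sketch, assuming a single-threaded setting. `CowVector` is a hypothetical illustration, not how R or Rcpp implement anything: copies share one buffer (cheap, shallow), and a write first detaches the writer with a private deep copy, so callers never observe aliasing.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write vector (illustration only; NOT thread-safe,
// because the use_count() check below is not synchronized).
class CowVector {
public:
    explicit CowVector(std::vector<double> values)
        : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

    // Reads never copy: any number of CowVector copies may share one buffer.
    double get(std::size_t i) const { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

    // Writes detach first: if the buffer is shared, take a private deep copy.
    void set(std::size_t i, double value) {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<double>>(*data_);
        }
        (*data_)[i] = value;
    }

    // True if this object and `other` currently share the same buffer.
    bool shares_buffer_with(const CowVector& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::vector<double>> data_;
};
```

Copying a `CowVector` (the implicit copy constructor) only copies the `shared_ptr`; the element copy is deferred to the first `set` on a shared buffer, and skipped entirely if nobody writes.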
In general, it's certainly a reasonable point that there is a trade-off
to be made: user-friendliness against potential extra copies (I'm not
sure whether this has ever been measured, as in counting the
`clone`-less existing code bases where this was the actual performance
bottleneck).
However, it still violates the principle of least surprise (POLS), even
for users coming from pure R:
> f = function(v) { u = v; if (length(u) > 1) { u[1] = 123 }; u }
> w = rep(1, 3)
> f(w)
[1] 123 1 1
> w
[1] 1 1 1
In the "general" scenario it's not really user-friendly to abandon R
(as well as C++) semantics by default.
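The same expectation holds in standard C++ for a by-value `std::vector<double>` parameter. A minimal sketch mirroring the R session above (the name `f` is reused from it for illustration only):

```cpp
#include <vector>

// Mirrors the R function: the parameter is the function's own copy of the
// argument, so the caller's vector is never modified (value semantics).
std::vector<double> f(std::vector<double> v) {
    if (v.size() > 1) {
        v[0] = 123;  // R's u[1] corresponds to C++'s v[0] (1-based vs 0-based)
    }
    return v;
}
```

Calling `f(w)` with `w` holding `{1, 1, 1}` returns `{123, 1, 1}` and leaves `w` unchanged, just as in R.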
Perhaps there's another solution -- continuing with the proxy aspect:
>
> | found" at the moment -- and, as mentioned in another reply, you're apparently
> | expected to Google around to find methods for solving problems you wouldn't
>
> We have called these objects "proxy models" since almost certainly 2010. This
> is referenced in the standard introductory paper (published peer-reviewed in
> JSS in 2011) and included as a vignette in the package.
>
> If you ignore the available documentation, then you may indeed have to "google
> at random" as you claim. I'd call that a self-inflicted wound.
Sure. At the same time, to give a somewhat related example,
`std::vector<bool>` has been known to be a proxied container since at
least 1998, when the original (pre-standard) STL's design was partially
adopted into the standard, together with the choice to specialize
`std::vector<bool>` into what used to be a `bit_vector` instead of
following the usual container requirements (some earlier, pre-standard
implementations are documented here):
https://www.sgi.com/tech/stl/bit_vector.html
Just the same, as of 2013 programmers still weren't 100% clear on the
implications:
http://stackoverflow.com/questions/17794569/why-is-vectorbool-not-a-stl-container
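The kind of surprise behind that question fits in a few lines of standard C++11 (no Rcpp involved; the function names are hypothetical). For most `T`, `std::vector<T>::operator[]` returns `T&`, but `std::vector<bool>` returns a proxy object, so `auto` deduces the proxy and the apparent copy still aliases the container:

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// vector<int> hands out real references; vector<bool> hands out a proxy class.
static_assert(std::is_same<decltype(std::declval<std::vector<int>&>()[0]), int&>::value,
              "vector<int>::operator[] returns int&");
static_assert(!std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
              "vector<bool>::operator[] returns a proxy, not bool&");

// `auto` deduces std::vector<bool>::reference, so writing to e writes to vb[0].
bool bool_proxy_demo(std::vector<bool> vb) {
    auto e = vb[0];
    e = !e;
    return vb[0];
}

// `auto` deduces int here: a genuine, independent copy of vi[0].
int int_copy_demo(std::vector<int> vi) {
    auto e = vi[0];
    e = e + 1;
    return vi[0];
}
```

`bool_proxy_demo` returns the flipped value, while `int_copy_demo` returns the original one: identical-looking code, different semantics depending on the element type.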
I imagine that blaming these programmers for "ignoring" ISO/IEC 14882,
and advising them to use a search engine after they failed to read it in
its entirety, is certainly _an_ approach.
After all, this design was also chosen with optimization in mind
(albeit with space-efficient storage as the goal).
At the same time, nowadays the design choice made for
`std::vector<bool>` is referred to (variably) as "totally broken", a
"defect", a "mistake", or an "abomination":
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2160.html
https://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98
http://isocpp.org/blog/2012/11/on-vectorbool
http://www.gotw.ca/publications/mill09.htm
Today `std::vector<bool>` is a textbook example of premature
optimization in design (http://www.gotw.ca/gotw/050.htm) -- with all the
usual caveats:
"std::vector<bool> forces a specific optimization on all users by
enshrining it in the standard. That's not a good idea; different users
have different requirements, and now all users of vector<bool> must pay
the performance penalty even if they don't want or need the space savings.
Bottom line: If you care more about speed than you do about size, you
shouldn't use std::vector<bool>. Instead, you should hack around this
optimization by using a std::vector<char> or the like instead, which is
unfortunate but still the best you can do."
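A minimal sketch of the workaround the quote recommends, with `std::vector<char>` standing in for a vector of flags (the helper names are hypothetical). Elements are ordinary `char` objects, so real references and a contiguous `data()` buffer are available; neither helper could be written for `std::vector<bool>`, which cannot hand out a `bool&` and has no `data()` member:

```cpp
#include <cstddef>
#include <vector>

// Returns a genuine reference into the container (impossible with
// std::vector<bool>, whose operator[] returns a proxy object).
char& flag_at(std::vector<char>& flags, std::size_t i) {
    return flags[i];
}

// Works on a raw contiguous buffer, as C-style code would.
std::size_t count_set(const char* begin, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (begin[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

The price is one byte per flag instead of one bit, which is exactly the trade the quoted advice accepts. For example, `count_set(flags.data(), flags.size())` counts the set flags.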
Perhaps more self-documenting code could help users by having, say,
`Rcpp::NumericVectorView` (or `Rcpp::NumericVectorProxy`) used for view
(proxy) purposes, while sticking with the default expected by R (as well
as C++) programmers for `Rcpp::NumericVector`?
(Alternatively, `f(Rcpp::NumericVector & v)` could signify the need for
mutation, while `f(Rcpp::NumericVector v)` keeps the expected
copied-value behavior. Or is implementing this inherently blocked by the
way Rcpp has to interoperate with R through SEXPs? Similarly for
`f(std::vector<double> & v)` vs. `f(std::vector<double> v)` vs.
`f(const std::vector<double> & v)`?)
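For plain `std::vector<double>` within C++ (leaving Rcpp's conversions from SEXPs aside), the three signatures already carry exactly these meanings. A minimal sketch with hypothetical function names:

```cpp
#include <numeric>
#include <vector>

// By value: the function works on its own copy; the caller sees no change.
void set_first_by_value(std::vector<double> v) {
    if (!v.empty()) v[0] = 123;
}

// By non-const reference: the signature announces mutation of the caller's data.
void set_first_by_ref(std::vector<double>& v) {
    if (!v.empty()) v[0] = 123;
}

// By const reference: no copy is made, and any attempt to mutate v
// (e.g. `v[0] = 123;`) would be rejected at compile time.
double sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
```

Starting from `{1, 1, 1}`, `set_first_by_value` leaves the caller's vector as it was, `set_first_by_ref` changes its first element to 123, and `sum` then reads 125 without copying.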
As it stands, despite its name, `Rcpp::NumericVector` isn't really a
numeric vector. As you rightly point out, it is a view (or a proxy).
This is surprising for a type named `Rcpp::NumericVector`. I don't think
it's unreasonable for users to ask questions given the source of the
astonishment, just as it isn't surprising to see users confused about
`std::vector<bool>` nearly two decades after its behavior was
standardized. Rcpp is a relatively young project; perhaps this will
change over time...
The trade-off in the general case of "what's a good default" seems to
pit copy optimization against user-friendliness. The current
(reference-semantics) approach rests on the unstated assumptions that
regular users will know that `Rcpp::NumericVector` is different (and
that they need to `clone`), and that performance experts wouldn't be
able to optimize their code if this weren't done for them.
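The "need to `clone`" can be illustrated without R at all. A minimal sketch of a hypothetical handle type with reference semantics, loosely modeled on the behavior discussed in this thread: copying the handle (including passing it by value) aliases the same data, and only an explicit `clone()` yields an independent object. `SharedVec`, `clone()` and `modify_first` here are illustrative assumptions, not Rcpp's API.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle with reference semantics: all copies share one buffer.
class SharedVec {
public:
    explicit SharedVec(std::vector<double> values)
        : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

    double& operator[](std::size_t i) { return (*data_)[i]; }
    double operator[](std::size_t i) const { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

    // Explicit deep copy: the only way to obtain independent data.
    SharedVec clone() const { return SharedVec(*data_); }

private:
    std::shared_ptr<std::vector<double>> data_;
};

// Looks like it takes a copy, but writes through to the caller's data.
SharedVec modify_first(SharedVec v) {
    if (v.size() > 1) v[0] = 123;
    return v;
}
```

With this design, `modify_first(w)` changes `w`, whereas `modify_first(w.clone())` leaves `w` untouched; the burden of remembering the `clone()` falls on the caller.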
Perhaps leaving regular users with a regularly behaving
`Rcpp::NumericVector` by default, while giving performance experts the
option to use `Rcpp::NumericVectorView` on an as-needed basis, would
cut the amount of help required in the first place?
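One way to sketch that split in plain C++11: an owning, value-semantic default type and a separately named, non-owning view whose writes go through to the viewed storage. The `sketch` namespace, both type names and both functions are hypothetical illustrations of the proposal, not existing Rcpp classes.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Default type with value semantics: copying it copies the elements.
using NumericVector = std::vector<double>;

// Opt-in, non-owning view: copying it copies only a pointer and a length,
// so writes through any copy reach the original storage. It must not
// outlive the vector, nor survive a reallocation of it.
class NumericVectorView {
public:
    explicit NumericVectorView(NumericVector& v) : data_(v.data()), size_(v.size()) {}
    double& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    double* data_;
    std::size_t size_;
};

// The parameter type states the semantics explicitly.
NumericVector with_first_set(NumericVector v) {  // works on a copy
    if (v.size() > 1) v[0] = 123;
    return v;
}

void set_first_in_place(NumericVectorView v) {  // mutates the viewed data
    if (v.size() > 1) v[0] = 123;
}

}  // namespace sketch
```

A reader of `set_first_in_place(sketch::NumericVectorView(w))` can see at the call site that `w` will be modified, whereas `sketch::with_first_set(w)` cannot touch it.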
Granted, to an extent this is all academic: chances are some code
somewhere relies on this, so this ship has sailed
(http://xkcd.com/1172/), unless there's room for a redesign or a
backward-incompatible change in the future.
That being said, as for the "what to do by default" advice: for anyone
who finds themselves needing to `clone`, `std::vector<double>` seems
like the safer, better-documented option.
Best,
Matt
>
> Dirk
>