[Rcpp-devel] Forcing a shallow versus deep copy

Fri Jul 12 12:41:01 CEST 2013

On Fri, Jul 12, 2013 at 1:42 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 11 July 2013 at 19:21, Gabor Grothendieck wrote:
> | 1. Just to be clear, what we have been discussing here is not just how to
> | avoid copying, but how to avoid copying while using as and wrap,
> | or approaches that automatically generate as and wrap.  I was already
> | aware of how to avoid copying using Armadillo, and of how to use Armadillo
> | types as arguments and return values to autogenerate as and wrap.  The
> | problem is not that, but that these two things cannot be done at once: it's
> | either/or.
>
> I must still be misunderstanding, as this still reads to me as if you
> suspect that we somehow keep layers that make extra copies.
>
> We're not. And I've known you long enough to know that you are not likely to
> suspect this either.  So what is it then?
>
> As Romain said, some of the choices have to do with the representation on both
> the R and C++ side -- for Rcpp itself we can be lightweight and efficient via
> proxy classes, but this does not mean we can do this for _any arbitrary C++
> class_ coming from another project, such as Armadillo.  RcppArmadillo already
> does pretty well, and code review may make it better.  We do not know of any
> fat to cut, or we'd cut it ourselves.  We care about a few things, but
> performance is clearly among them.

I think Romain's proposal will clarify this.

>
> | 2. Regarding the question of performance impact, there are two situations
> | which should be distinguished:
> |
> | i. We call C++ from R and it does some processing and then returns and
> | we don't call it again. In that case it's likely that copying or not won't
> | make a big difference, or at least it won't if the actual C++ computation
> | time is large compared to the time spent in copying.
> |
> | ii. We factor out the inner loop of the code and only recode that in C++
> | and repeatedly call it many times.  In that case the copying is multiplied
> | by the number of iterations and might very well have a significant impact.
>
> In case ii) I'd try to use a different design and make it more like i): You
> generally do not want to call down from R to object code a bazillion times as
> there is always some overhead, and multiplying even something rather
> efficient by a veryBigNumber can make small times large in the aggregate.

Sure, and sugar, RcppArmadillo and other facilities do make it easier to move
more functionality into C++.  Nevertheless, it can be the case that a
relatively small amount of R code, invoked repeatedly, is responsible for the
performance hit in a program.  From the viewpoint of reducing complexity and
increasing maintainability, it can be desirable to move just that minimum
portion to the C++ side, minimizing the dual-language aspect of the code.
Keeping the call overhead as low as one can, while retaining the automatic
Rcpp features, facilitates this.  If that is not possible in general, it would
still be nice if it were possible just for Armadillo objects and selected
other situations.
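
To make the shallow-versus-deep distinction concrete, here is a minimal
sketch in plain standard C++.  It deliberately uses neither Rcpp nor
Armadillo, so it says nothing about what as and wrap do internally.
VecView, deep_copy, shallow_view, sum_deep and sum_shallow are hypothetical
names made up for this illustration.  The view plays roughly the role that
reusing existing memory plays in Armadillo (for instance through its
advanced constructors with copy_aux_mem = false), while the deep variant
allocates and copies on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical non-owning view: it only remembers where the data lives.
struct VecView {
    const double* data;
    std::size_t   size;
};

// Deep copy: allocates a fresh buffer and copies all n elements.
std::vector<double> deep_copy(const double* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    return std::vector<double>(src, src + n);
}

// Shallow copy: no allocation and no element copy, just pointer and length.
VecView shallow_view(const double* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    return VecView{src, n};
}

// Pays for one allocation plus n element copies on every call.
double sum_deep(const double* src, std::size_t n) {
    const std::vector<double> copy = deep_copy(src, n);
    return std::accumulate(copy.begin(), copy.end(), 0.0);
}

// Reads the caller's memory directly; the per-call setup cost is constant.
double sum_shallow(const double* src, std::size_t n) {
    const VecView v = shallow_view(src, n);
    return std::accumulate(v.data, v.data + v.size, 0.0);
}
```

Called once, the two versions differ by one allocation and n element
copies.  Called k times from an R-level loop, the deep version pays that
cost k times, which is situation ii.  The price of the shallow version is
aliasing: the view is only valid while the original buffer is alive, and it
sees any later change to that buffer.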

>
> Dirk
>
> |
> | On Thu, Jul 11, 2013 at 6:55 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
> | >
> | > Everybody has this existing example in their copy of Armadillo.
> | >
> | > I am running it here from SVN rather than the installed directory, but this
> | > should not make a difference. Machine is my not-overly-powerful thinkpad used
> | > for traveling:
> | >
> | > edd@don:~/svn/rcpp/pkg/RcppArmadillo/inst/examples$ r fastLm.r
> | > Loading required package: methods
> | >
> | > Attaching package: ‘Rcpp’
> | >
> | > The following object is masked from ‘package:inline’:
> | >
> | >     registerPlugin
> | >
> | >                        test replications relative elapsed user.self sys.self
> | > 2         fLmTwoCasts(X, y)         5000    1.000   0.184     0.204    0.164
> | > 1          fLmOneCast(X, y)         5000    1.011   0.186     0.200    0.172
> | > 4   fastLmPureDotCall(X, y)         5000    1.141   0.210     0.236    0.184
> | > 3          fastLmPure(X, y)         5000    2.027   0.373     0.412    0.332
> | > 6              lm.fit(X, y)         5000    2.685   0.494     0.528    0.456
> | > 5 fastLm(frm, data = trees)         5000   36.380   6.694     7.332    6.028
> | > 7     lm(frm, data = trees)         5000   42.734   7.863     8.628    7.068
> | > edd@don:~/svn/rcpp/pkg/RcppArmadillo/inst/examples$
> | >
> | > What we are talking about here is the difference between 'fLmTwoCasts' and
> | > 'fLmOneCast'.  If you use larger objects, the difference will be larger.  But
> | > the relative differences are tiny.
> | >
> | > It would be nice to make this more elegant, and I look forward to Romain's
> | > proposals, but methinks that we may well have bigger fish to fry.
> | >
> | > Dirk, still in Sydney
> | >
> | > --
> | > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
> | > _______________________________________________
> | > Rcpp-devel mailing list
> | > Rcpp-devel at lists.r-forge.r-project.org
> | > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> |
> |
> |
> | --
> | Statistics & Software Consulting
> | GKX Group, GKX Associates Inc.
> | tel: 1-877-GKX-GROUP
> | email: ggrothendieck at gmail.com

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com