[Rcpp-devel] NumericVector Double mismatch when indexing an array

Jeffrey Pollock jeffpollock9 at gmail.com
Tue Sep 25 22:38:52 CEST 2012


Indeed, changing the sample size from 5 to 500 results in;

> benchmark(fun1(), fun2(), fun3(), fun4(), order = "relative",
replications = 1e5L)
    test replications elapsed relative user.self sys.self user.child
sys.child
4 fun4()       100000    6.16    1.000      6.06        0         NA
NA
3 fun3()       100000    6.19    1.005      5.91        0         NA
NA
2 fun2()       100000    6.92    1.123      6.75        0         NA
NA
1 fun1()       100000   13.01    2.112     12.73        0         NA
NA

On Tue, Sep 25, 2012 at 9:33 PM, Goldfeld, Keith
<Keith.Goldfeld at nyumc.org>wrote:

>  There does appear to be a difference between the native Rf_rnorm and the
> R rnorm function when the sample sizes get large.  In fact, the Rf_rnorm
> appears to be about twice as fast (using benchmark).  Not surprisingly,
> doing a loop in R is about 50 times slower.****
>
> ** **
>
> *From:* Jonathan Olmsted [mailto:jpolmsted at gmail.com]
> *Sent:* Tuesday, September 25, 2012 4:29 PM
> *To:* Jeffrey Pollock
>
> *Cc:* Goldfeld, Keith; rcpp-devel at lists.r-forge.r-project.org
> *Subject:* Re: [Rcpp-devel] NumericVector Double mismatch when indexing
> an array****
>
> ** **
>
> Jeff and List,****
>
> ** **
>
> I was interested in the same fairly recently. The full shebang is here <
> http://www.rochester.edu/college/gradstudents/jolmsted/blog/2012/08/22/rcpp-rng-performance/>.
> I, unlike you, found some difference between the various approaches.****
>
> ** **
>
> If I had to guess why we come up with different results, I'd say that the
> time for any one call for any of your inline functions is 90% overhead and
> the differences between the various approaches, while present, is on a
> vastly different (and smaller) scale than the overhead itself due to only 5
> draws being taken. Of course, how any of these timings inform one's
> approach depends on the specifics of how they'll set up their analysis.***
> *
>
> ** **
>
> For what it is worth, instead of an explicit "as<>()" conversion, I
> usually pull out the first element of the returned NumericVector. That is,
> ****
>
> rn[i] = as<double>(rnorm(1, 0.0, 1.0));****
>
> becomes****
>
> rn[i] = rnorm(1, 0.0, 1.0)[0];****
>
> I haven't ever timed this, though.****
>
> ** **
>
> JPO****
>
> ** **
>
> ** **
>
> ** **
>
> -------------------------------------------------------------------------
> J. P. Olmsted
> j.p.olmsted at gmail.com
> http://about.me/olmjo
> -------------------------------------------------------------------------
>
>
>
>
> ****
>
> On Tue, Sep 25, 2012 at 4:08 PM, Jeffrey Pollock <jeffpollock9 at gmail.com>
> wrote:****
>
> I was interested to see if there was any real speed difference between the
> different methods suggested, and it looks like... there isn't...
>
> library(inline)
> library(Rcpp)
> library(rbenchmark)
>
> fun1 <- cxxfunction(body = '
>                 RNGScope scope;
>                 NumericVector rn(5);
>                 for (int i = 0; i < 5; i++) {
>                     rn[i] = as<double>(rnorm(1, 0.0, 1.0));
>                 }
>                 return rn;
>                 ', plugin = "Rcpp")
>
> fun2 <- cxxfunction(body = '
>                 RNGScope scope;
>                 NumericVector rn(5);
>                 for (int i = 0; i < 5; i++) {
>                     rn[i] = Rf_rnorm(0.0, 1.0);
>                 }
>                 return rn;
>                 ', plugin = "Rcpp")
>
> fun3 <- cxxfunction(body = '
>                 RNGScope scope;
>                 NumericVector rn(5);
>                 for (int i = 0; i < 5; i++) {
>                     rn[i] = norm_rand();
>                 }
>                 return rn;
>                 ', plugin = "Rcpp")
>
> fun4 <- cxxfunction(body = '
>                 RNGScope scope;
>                 return rnorm(5, 0.0, 1.0);
>                 ', plugin = "Rcpp")
>
> set.seed(1)
> print(fun1())
>
> set.seed(1)
> print(fun2())
>
> set.seed(1)
> print(fun3())
>
> set.seed(1)
> print(fun4())
>
> benchmark(fun1(), fun2(), fun3(), fun4(), order = "relative", replications
> = 1e6L)
>
> output;
>
> > set.seed(1)
> > print(fun1())
> [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
>
> > set.seed(1)
> > print(fun2())
> [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
>
> > set.seed(1)
> > print(fun3())
> [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
>
> > set.seed(1)
> > print(fun4())
> [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
>
> > benchmark(fun1(), fun2(), fun3(), fun4(), order = "relative",
> replications = 1e6L)
>     test replications elapsed relative user.self sys.self user.child
> sys.child
> 3 fun3()      1000000    7.15    1.000      7.01     0.00
> NA        NA
> 4 fun4()      1000000    7.33    1.025      7.27     0.01
> NA        NA
> 2 fun2()      1000000    7.53    1.053      7.41     0.00
> NA        NA
> 1 fun1()      1000000    7.99    1.117      7.91     0.00
> NA        NA
>
> ****
>
> On Tue, Sep 25, 2012 at 8:19 PM, Goldfeld, Keith <Keith.Goldfeld at nyumc.org>
> wrote:****
>
> Yes - I agree that iterating is not the best way to go, but I am using
> this for an agent based simulation where iteration is really my only
> option.  I won't be generating more than a few thousand records, so the
> time shouldn't be much of a factor.  Of course, in R it is painfully slow -
> so that is why I am going the Rcpp route.****
>
>
> -----Original Message-----
> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
> Bates
> Sent: Tuesday, September 25, 2012 3:13 PM
> To: Rodney Sparapani
> Cc: Goldfeld, Keith; rcpp-devel at lists.r-forge.r-project.org
> Subject: Re: [Rcpp-devel] NumericVector Double mismatch when indexing an
> array****
>
> On Tue, Sep 25, 2012 at 1:53 PM, Rodney Sparapani <rsparapa at mcw.edu>
> wrote:
> > On 09/25/2012 01:37 PM, Goldfeld, Keith wrote:
> >
> >>>  code<- 'Rcpp::RNGScope scope;
> >>
> >>
> >> +                    Rcpp::NumericVector rn(5);
> >>
> >> +                    for (int i=0; i<  5; i++) {
> >>
> >> +                        rn(i) = rnorm(1,0,1);
> >>
> >> +                    }
> >>
> >> +                    return Rcpp::wrap(rn);
> >>
> >> +  '
> >>
> >>>
> >>
> >>>  fun<- cxxfunction(body=code, plugin="Rcpp")
> >
> >
> > You just need an explicit type conversion like so:
> >
> > funk2 <- cxxfunction(body='RNGScope scope;
> >           NumericVector rn(5);
> >           for (int i=0; i < 5; i++) rn[i] = as<double>(rnorm(1,0,1));
> >           return wrap(rn);',
> >           includes="using namespace Rcpp;\n", plugin="Rcpp")
> >
> > funk2()
>
> That works but you wouldn't want to use it as a general model because you
> are creating vectors of length 1 then dereferencing the first element in
> each of these vectors and copying it to another vector.
> Obviously with n=5 the difference in execution time will be minimal but
> with n=5 million it won't be.  Using the Rcpp sugar function rnorm will be
> the easiest approach unless, like me, you get a little queasy when using
> Rcpp sugar functions just because it is so hard to pick them apart and see
> what is actually happening.
>
> I would probably end up going back to the R API and calling norm_rand or
> Rf_rnorm.  That latter returns a double from two double arguments, mu and
> sigma.
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel***
> *
>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel***
> *
>
> ** **
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20120925/35dc2c59/attachment-0001.html>


More information about the Rcpp-devel mailing list