[Rcpp-devel] NumericVector Double mismatch when indexing an array

Goldfeld, Keith Keith.Goldfeld at nyumc.org
Tue Sep 25 22:33:59 CEST 2012


There does appear to be a difference between the native Rf_rnorm and the R rnorm function when the sample sizes get large.  In fact, the Rf_rnorm appears to be about twice as fast (using benchmark).  Not surprisingly, doing a loop in R is about 50 times slower.

From: Jonathan Olmsted [mailto:jpolmsted at gmail.com]
Sent: Tuesday, September 25, 2012 4:29 PM
To: Jeffrey Pollock
Cc: Goldfeld, Keith; rcpp-devel at lists.r-forge.r-project.org
Subject: Re: [Rcpp-devel] NumericVector Double mismatch when indexing an array

Jeff and List,

I was interested in the same fairly recently. The full shebang is here <http://www.rochester.edu/college/gradstudents/jolmsted/blog/2012/08/22/rcpp-rng-performance/>. I, unlike you, found some difference between the various approaches.

If I had to guess why we come up with different results, I'd say that the time for any one call for any of your inline functions is 90% overhead and the differences between the various approaches, while present, is on a vastly different (and smaller) scale than the overhead itself due to only 5 draws being taken. Of course, how any of these timings inform one's approach depends on the specifics of how they'll set up their analysis.

For what it is worth, instead of an explicit "as<>()" conversion, I usually pull out the first element of the returned NumericVector. That is,
rn[i] = as<double>(rnorm(1, 0.0, 1.0));
becomes
rn[i] = rnorm(1, 0.0, 1.0)[0];
I haven't ever timed this, though.

JPO



-------------------------------------------------------------------------
J. P. Olmsted
j.p.olmsted at gmail.com<mailto:j.p.olmsted at gmail.com>
http://about.me/olmjo
-------------------------------------------------------------------------




On Tue, Sep 25, 2012 at 4:08 PM, Jeffrey Pollock <jeffpollock9 at gmail.com<mailto:jeffpollock9 at gmail.com>> wrote:
I was interested to see if there was any real speed difference between the different methods suggested, and it looks like... there isn't...

library(inline)
library(Rcpp)
library(rbenchmark)

fun1 <- cxxfunction(body = '
                RNGScope scope;
                NumericVector rn(5);
                for (int i = 0; i < 5; i++) {
                    rn[i] = as<double>(rnorm(1, 0.0, 1.0));
                }
                return rn;
                ', plugin = "Rcpp")

fun2 <- cxxfunction(body = '
                RNGScope scope;
                NumericVector rn(5);
                for (int i = 0; i < 5; i++) {
                    rn[i] = Rf_rnorm(0.0, 1.0);
                }
                return rn;
                ', plugin = "Rcpp")

fun3 <- cxxfunction(body = '
                RNGScope scope;
                NumericVector rn(5);
                for (int i = 0; i < 5; i++) {
                    rn[i] = norm_rand();
                }
                return rn;
                ', plugin = "Rcpp")

fun4 <- cxxfunction(body = '
                RNGScope scope;
                return rnorm(5, 0.0, 1.0);
                ', plugin = "Rcpp")

set.seed(1)
print(fun1())

set.seed(1)
print(fun2())

set.seed(1)
print(fun3())

set.seed(1)
print(fun4())

benchmark(fun1(), fun2(), fun3(), fun4(), order = "relative", replications = 1e6L)

output;

> set.seed(1)
> print(fun1())
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

> set.seed(1)
> print(fun2())
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

> set.seed(1)
> print(fun3())
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

> set.seed(1)
> print(fun4())
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

> benchmark(fun1(), fun2(), fun3(), fun4(), order = "relative", replications = 1e6L)
    test replications elapsed relative user.self sys.self user.child sys.child
3 fun3()      1000000    7.15    1.000      7.01     0.00         NA        NA
4 fun4()      1000000    7.33    1.025      7.27     0.01         NA        NA
2 fun2()      1000000    7.53    1.053      7.41     0.00         NA        NA
1 fun1()      1000000    7.99    1.117      7.91     0.00         NA        NA

On Tue, Sep 25, 2012 at 8:19 PM, Goldfeld, Keith <Keith.Goldfeld at nyumc.org<mailto:Keith.Goldfeld at nyumc.org>> wrote:
Yes - I agree that iterating is not the best way to go, but I am using this for an agent based simulation where iteration is really my only option.  I won't be generating more than a few thousand records, so the time shouldn't be much of a factor.  Of course, in R it is painfully slow - so that is why I am going the Rcpp route.

-----Original Message-----
From: dmbates at gmail.com<mailto:dmbates at gmail.com> [mailto:dmbates at gmail.com<mailto:dmbates at gmail.com>] On Behalf Of Douglas Bates
Sent: Tuesday, September 25, 2012 3:13 PM
To: Rodney Sparapani
Cc: Goldfeld, Keith; rcpp-devel at lists.r-forge.r-project.org<mailto:rcpp-devel at lists.r-forge.r-project.org>
Subject: Re: [Rcpp-devel] NumericVector Double mismatch when indexing an array
On Tue, Sep 25, 2012 at 1:53 PM, Rodney Sparapani <rsparapa at mcw.edu<mailto:rsparapa at mcw.edu>> wrote:
> On 09/25/2012 01:37 PM, Goldfeld, Keith wrote:
>
>>>  code<- 'Rcpp::RNGScope scope;
>>
>>
>> +                    Rcpp::NumericVector rn(5);
>>
>> +                    for (int i=0; i<  5; i++) {
>>
>> +                        rn(i) = rnorm(1,0,1);
>>
>> +                    }
>>
>> +                    return Rcpp::wrap(rn);
>>
>> +  '
>>
>>>
>>
>>>  fun<- cxxfunction(body=code, plugin="Rcpp")
>
>
> You just need an explicit type conversion like so:
>
> funk2 <- cxxfunction(body='RNGScope scope;
>           NumericVector rn(5);
>           for (int i=0; i < 5; i++) rn[i] = as<double>(rnorm(1,0,1));
>           return wrap(rn);',
>           includes="using namespace Rcpp;\n", plugin="Rcpp")
>
> funk2()

That works but you wouldn't want to use it as a general model because you are creating vectors of length 1 then dereferencing the first element in each of these vectors and copying it to another vector.
Obviously with n=5 the difference in execution time will be minimal but with n=5 million it won't be.  Using the Rcpp sugar function rnorm will be the easiest approach unless, like me, you get a little queasy when using Rcpp sugar functions just because it is so hard to pick them apart and see what is actually happening.

I would probably end up going back to the R API and calling norm_rand or Rf_rnorm.  That latter returns a double from two double arguments, mu and sigma.
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org<mailto:Rcpp-devel at lists.r-forge.r-project.org>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel


_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org<mailto:Rcpp-devel at lists.r-forge.r-project.org>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20120925/e87dc5b4/attachment.html>


More information about the Rcpp-devel mailing list