[Rcpp-devel] Why inline function is much faster than .Call?

Tue Aug 28 18:16:30 CEST 2012

I tried to reproduce your results, but I cannot:

> xx<-matrix(1:60000, 2)
> benchmark(test(xx), test_inline(xx), .Call('test', xx))
               test replications elapsed relative user.self sys.self user.child
3 .Call("test", xx)          100   0.996 1.000000     0.991    0.006          0
1          test(xx)          100   1.045 1.049197     1.039    0.005          0
2   test_inline(xx)          100   1.090 1.094378     1.086    0.005          0

I am using the same code as you, but I put test.cpp into a package and loaded it. I also increased the matrix size, since with a 3x2 matrix the timing would measure almost nothing but R function-call overhead. And I am using 'benchmark' (from rbenchmark) rather than 'microbenchmark'.

These results are what I would have expected: all three methods are within about 10% of each other. The direct .Call is the fastest, then the package function, and finally the 'inline'd function.

You'll see your compilation details if you pass "verbose=TRUE" in the call to inline's cxxfunction(). On my machine, at least, this simply runs "R CMD SHLIB", so the compiler flags should be exactly the same as the ones used to build the package.
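
For reference, here is a minimal sketch of what I mean by turning on verbose output. The function body below is only a hypothetical stand-in (it just returns the number of matrix elements), not the actual test.cpp from this thread, and it assumes the inline and Rcpp packages plus a working compiler are installed:

```r
library(inline)
library(Rcpp)

# Hypothetical stand-in body; the real test.cpp from the thread is not reproduced here.
src <- '
  Rcpp::NumericMatrix m(x);
  return Rcpp::wrap(m.nrow() * m.ncol());
'

# verbose = TRUE prints the generated source file and the compilation output,
# which lets you compare the compiler flags with those used for the package build.
test_inline <- cxxfunction(signature(x = "numeric"), body = src,
                           plugin = "Rcpp", verbose = TRUE)

xx <- matrix(1:60000, 2)
test_inline(xx)
```

If the flags printed here differ from what "R CMD INSTALL" shows when building the package, that would be the first thing to look at.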

I'm using OS X 10.7.4 and Apple's GCC-LLVM that came with Xcode 4.4.

Davor

On 2012-08-27, at 2:08 PM, Peng Yu wrote:

> On Mon, Aug 27, 2012 at 3:44 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>> 
>> On 27 August 2012 at 15:17, Peng Yu wrote:
>> | Hi Dirk,
>> |
>> | > inline uses .Call, so there is a slight logical problem here...  And yes,
>> |
>> | What is the logical problem?
>> 
>> What inline's cxxfunction() gives you is an object which uses .Call.  So it
>> is a little hard to imagine how that could be faster than using .Call as it
> 
> I agree with you that it is hard to imagine.
> 
>> also uses .Call.  More likely, it may reflect random fluctuations in your
>> setup as also suggested by Doug.
> 
> Why don't you try it on your system and see what numbers you get? I
> have tried on two systems, Mac and Linux; if it were due to
> fluctuation, one could not be consistently faster than the other.
> 
> microbenchmark runs the function 100 times by default, so Doug's
> comment does not seem to make sense.
> 
> In the following, I try to run 1000 times with both microbenchmark as
> well as benchmark. "benchmark" shows that calling the wrapper function
> is a little slower, but "microbenchmark" still shows the opposite.
> Given our belief that the wrapper should be slower, I tend to think
> that there might be some bug in the way microbenchmark records the run
> time. But I'm not completely sure.
> 
> Both evaluation methods still show inline is the fastest (this is also
> true on Linux, except the difference is only 2 times, not almost 10 times).
> 
> 
> # on mac
>> library(microbenchmark)
>> microbenchmark(test(xx), test_inline(xx), .Call('test', xx), times=1000)
> Unit: microseconds
>               expr    min      lq  median      uq     max
> 1 .Call("test", xx) 94.515 95.6160 96.3790 97.7700 135.709
> 2   test_inline(xx)  4.451  5.1655  6.4150  6.6380  29.944
> 3          test(xx) 69.004 69.8880 70.5375 71.4975 199.150
>> library(rbenchmark)
>> benchmark(test(xx), test_inline(xx), .Call('test', xx), replications=1000)
>                test replications elapsed relative user.self sys.self user.child sys.child
> 3 .Call("test", xx)         1000   0.076 8.444444     0.044    0.032          0         0
> 2   test_inline(xx)         1000   0.009 1.000000     0.009    0.000          0         0
> 1          test(xx)         1000   0.078 8.666667     0.046    0.032          0         0
> 
> 
> -- 
> Regards,
> Peng