[Rcpp-devel] Optimising 2d convolution

Fri Jan 6 21:58:58 CET 2012

On Fri, Jan 6, 2012 at 2:02 PM, Hadley Wickham <hadley at rice.edu> wrote:
>> What are you doing the timing on?  On a modest desktop (2.6 GHz Athlon
>> X4) I get less than a second for this
> ...
>>> k <- matrix(runif(20* 20), ncol = 20)
>>> system.time(convolve_2d(x, k))
>>   user  system elapsed
>>  0.864   0.000   0.862
>
> Ooops, I must have been using a old k - I now get
>
>> system.time(convolve_2d(x, k))
>   user  system elapsed
>  2.265   0.006   2.298
>
> I'm on a 2nd gen macbook air, so the processor isn't the fastest (2.13
> GHz). Maybe my intuition is wrong but it seems like you should be able
> to do it faster than that.
>
> But let me think - I'm doing 1000 * 1000 * 20 * 20 = 4e08 multiplies +
> adds in ~2s.  So that's 5e-9 s = 5 ns per operation.  Looking at it
> that way it does seem pretty fast already.

You can make this about 40% faster by using block operations in Eigen
through RcppEigen.  In this code fragment I did remember to use a
typedef and a const modifier.  I allocated the result as an
Rcpp::NumericMatrix then mapped it to avoid an extra allocation and
copy but that is a minor speedup for matrices of this size.
-------------- next part --------------

R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(inline)
> # 2d convolution -------------------------------------------------------------
> 
> convolve_2d <- cxxfunction(signature(sampleS = "numeric", kernelS =
+ "numeric"), plugin = "Rcpp", '
+    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
+    int x_s = sample.nrow(), x_k = kernel.nrow();
+    int y_s = sample.ncol(), y_k = kernel.ncol();
+ 
+    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
+    for (int row = 0; row < x_s; row++) {
+      for (int col = 0; col < y_s; col++) {
+        for (int i = 0; i < x_k; i++) {
+          for (int j = 0; j < y_k; j++) {
+            output(row + i, col + j) += sample(row, col) * kernel(i, j);
+          }
+        }
+      }
+    }
+    return output;
+ ')
> 
> convolve_2d_Eigen <- cxxfunction(signature(sampleS = "numeric", kernelS =
+ "numeric"), plugin = "RcppEigen", '
+    typedef Eigen::Map<Eigen::MatrixXd>  MMat;
+ 
+    const MMat sample(as<MMat>(sampleS)), kernel(as<MMat>(kernelS));
+    int x_s = sample.rows(), x_k = kernel.rows();
+    int y_s = sample.cols(), y_k = kernel.cols();
+    int x_o = x_s + x_k - 1, y_o = y_s + y_k - 1;
+ 
+    Rcpp::NumericMatrix out(x_o, y_o);
+    MMat  output(out.begin(), x_o, y_o);
+    output.setZero();   // Must explicitly zero out Eigen matrices
+    for (int row = 0; row < x_s; row++) {
+      for (int col = 0; col < y_s; col++) {
+        output.block(row, col, x_k, y_k) += sample(row, col) * kernel;
+      }
+    }
+    return out;
+ ')
Loading required package: lattice

Attaching package: ?Matrix?

The following object(s) are masked from ?package:base?:

    det

> 
> x <- diag(1000)
> set.seed(1)
> k <- matrix(runif(20* 20), ncol = 20)
> system.time(loop <- convolve_2d(x, k))
   user  system elapsed 
   0.84    0.00    0.84 
> system.time(block <- convolve_2d_Eigen(x, k))
   user  system elapsed 
  0.588   0.000   0.587 
> all.equal(loop, block)
[1] TRUE
> 
> library(microbenchmark)
> microbenchmark(loop  = convolve_2d(x, k),
+                block = convolve_2d_Eigen(x, k), times=30)
Unit: milliseconds
   expr      min       lq   median       uq      max
1 block 479.5808 481.5312 482.7564 492.0526 495.4081
2  loop 839.4510 844.4272 846.7940 854.1737 859.0746
> 
> proc.time()
   user  system elapsed 
 52.242   1.180  53.027