[Rcpp-devel] Optimising 2d convolution

Fri Jan 6 20:07:19 CET 2012

On Fri, Jan 6, 2012 at 12:43 PM, Hadley Wickham <hadley at rice.edu> wrote:
> Hi all,
>
> The rcpp-devel list has been very helpful to me so far, so I hope you
> don't mind another question!  I'm trying to speed up my 2d convolution
> function:
>
>
>
> library(inline)
> # 2d convolution -------------------------------------------------------------
>
> convolve_2d <- cxxfunction(signature(sampleS = "numeric", kernelS =
> "numeric"), plugin = "Rcpp", '
>    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>    int x_s = sample.nrow(), x_k = kernel.nrow();
>    int y_s = sample.ncol(), y_k = kernel.ncol();
>
>    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>    for (int row = 0; row < x_s; row++) {
>      for (int col = 0; col < y_s; col++) {
>        for (int i = 0; i < x_k; i++) {
>          for (int j = 0; j < y_k; j++) {
>            output(row + i, col + j) += sample(row, col) * kernel(i, j);
>          }
>        }
>      }
>    }
>    return output;
> ')
>
>
> x <- diag(1000)
> k <- matrix(runif(20* 20), ncol = 20)
> system.time(convolve_2d(x, k))
> #    user  system elapsed
> # 14.759   0.028  15.524
>
> I have a vague idea that to get better performance I need to get
> closer to bare pointers, and I need to be careful about the order of
> my loops to ensure that I'm working over contiguous chunks of memory
> as often as possible, but otherwise I've basically exhausted my
> C++/Rcpp knowledge.  Where should I start looking to improve the
> performance of this function?
>
> The example data basically matches the real problem - x is not usually
> going to be much bigger than 1000 x 1000 and k typically will be much
> smaller.  (And hence, based on what I've read, this technique should
> be faster than doing it via a discrete fft)

What machine are you running the timing on?  On a modest desktop
(2.6 GHz Athlon X4) the same code runs in less than a second:

> library(inline)
> convolve_2d <- cxxfunction(signature(sampleS = "numeric", kernelS =
+ "numeric"), plugin = "Rcpp", '
+    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
+    int x_s = sample.nrow(), x_k = kernel.nrow();
+    int y_s = sample.ncol(), y_k = kernel.ncol();
+    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
+    for (int row = 0; row < x_s; row++) {
+      for (int col = 0; col < y_s; col++) {
+        for (int i = 0; i < x_k; i++) {
+          for (int j = 0; j < y_k; j++) {
+            output(row + i, col + j) += sample(row, col) * kernel(i, j);
+          }
+        }
+      }
+    }
+    return output;
+ ')
> x <- diag(1000)
> k <- matrix(runif(20* 20), ncol = 20)
> system.time(convolve_2d(x, k))
   user  system elapsed
  0.864   0.000   0.862
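
On the loop-order question raised in the quoted message: R stores
matrices in column-major order, so in the code above the innermost loop
over j steps across columns, and each step jumps a full column length in
memory.  Below is a rough sketch of one possible reordering, written as
standalone C++ rather than Rcpp so that it compiles on its own.  The
function name convolve_2d_colmajor and the flat std::vector storage are
assumptions made for illustration only; they are not part of the
original code, and no timings are claimed for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full 2d convolution on column-major matrices (the layout R uses):
// element (r, c) of a matrix with n rows lives at index r + c * n.
// Hypothetical helper for illustration, not part of Rcpp.
std::vector<double> convolve_2d_colmajor(const std::vector<double>& sample,
                                         std::size_t x_s, std::size_t y_s,
                                         const std::vector<double>& kernel,
                                         std::size_t x_k, std::size_t y_k) {
  assert(x_s > 0 && y_s > 0 && x_k > 0 && y_k > 0);
  assert(sample.size() == x_s * y_s);
  assert(kernel.size() == x_k * y_k);

  // Output dimensions match the original: (x_s + x_k - 1) by (y_s + y_k - 1).
  const std::size_t x_o = x_s + x_k - 1, y_o = y_s + y_k - 1;
  std::vector<double> output(x_o * y_o, 0.0);

  for (std::size_t col = 0; col < y_s; ++col) {
    for (std::size_t row = 0; row < x_s; ++row) {
      // Read the sample value once instead of once per kernel element.
      const double s = sample[row + col * x_s];
      for (std::size_t j = 0; j < y_k; ++j) {
        // Bare pointers to the start of kernel column j and to
        // output(row, col + j); both columns are contiguous in memory.
        const double* k_col = kernel.data() + j * x_k;
        double* o_col = output.data() + row + (col + j) * x_o;
        // The innermost loop now walks down a column with stride 1.
        for (std::size_t i = 0; i < x_k; ++i) {
          o_col[i] += s * k_col[i];
        }
      }
    }
  }
  return output;
}
```

The arithmetic is the same as in the original four loops; only the
traversal order changes, so that the innermost loop touches adjacent
memory.  Inside cxxfunction the same indexing should be applicable to
the pointers returned by begin() on each NumericMatrix, though that
variant is not shown or tested here.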