[Rcpp-devel] Optimising 2d convolution

Sat Jan 7 17:40:10 CET 2012

2012/1/7 Romain François <romain at r-enthusiasts.com>:
> Hi,
>
> Using some of what we know about the structure of an R matrix, we can use
> this version:
>
> convolve_2d_tricks <- cxxfunction(signature(sampleS = "numeric", kernelS =
>
> "numeric"), plugin = "Rcpp", '
>    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>    int x_s = sample.nrow(), x_k = kernel.nrow();
>    int y_s = sample.ncol(), y_k = kernel.ncol();
>
>    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>    double* output_row_col_j ;
>    double sample_row_col = 0.0 ;
>    double* kernel_j ;
>
>
>
>    for (int row = 0; row < x_s; row++) {
>      for (int col = 0; col < y_s; col++) {
>        sample_row_col = sample(row,col) ;
>
>
>        for (int j = 0; j < y_k; j++) {
>          output_row_col_j = & output( row, col+j ) ;
>          kernel_j = &kernel(0,j) ;
>
>          for (int i = 0; i < x_k; i++) {
>            output_row_col_j[i] += sample_row_col * kernel_j[i] ;
>          }
>        }
>      }
>    }
>    return output;
> ')
>
> I get quite a speed up on hadley's example:
>
> utilisateur     système      écoulé
>      1.555       0.003       1.558
> utilisateur     système      écoulé
>      0.547       0.004       0.550
> [1] TRUE

I don't get a 3x speed increase on my system.  Here the Eigen version is faster.

> The idea is that we compute pointers and constants just once, asap, so that
> we don't have to do it again in inner loops.
>
> Romain
>
> Le 06/01/12 19:43, Hadley Wickham a écrit :
>
>> Hi all,
>>
>> The rcpp-devel list has been very helpful to me so far, so I hope you
>> don't mind another question!  I'm trying to speed up my 2d convolution
>> function:
>>
>>
>>
>> library(inline)
>> # 2d convolution
>> -------------------------------------------------------------
>>
>> convolve_2d<- cxxfunction(signature(sampleS = "numeric", kernelS =
>> "numeric"), plugin = "Rcpp", '
>>     Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>>     int x_s = sample.nrow(), x_k = kernel.nrow();
>>     int y_s = sample.ncol(), y_k = kernel.ncol();
>>
>>     Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>>     for (int row = 0; row<  x_s; row++) {
>>       for (int col = 0; col<  y_s; col++) {
>>         for (int i = 0; i<  x_k; i++) {
>>           for (int j = 0; j<  y_k; j++) {
>>             output(row + i, col + j) += sample(row, col) * kernel(i, j);
>>           }
>>         }
>>       }
>>     }
>>     return output;
>> ')
>>
>>
>> x<- diag(1000)
>> k<- matrix(runif(20* 20), ncol = 20)
>> system.time(convolve_2d(x, k))
>> #    user  system elapsed
>> # 14.759   0.028  15.524
>>
>> I have a vague idea that to get better performance I need to get
>> closer to bare pointers, and I need to be careful about the order of
>> my loops to ensure that I'm working over contiguous chunks of memory
>> as often as possible, but otherwise I've basically exhausted my
>> C++/Rcpp knowledge.  Where should I start looking to improve the
>> performance of this function?
>>
>> The example data basically matches the real problem - x is not usually
>> going to be much bigger than 1000 x 1000 and k typically will be much
>> smaller.  (And hence, based on what I've read, this technique should
>> be faster than doing it via a discrete fft)
>>
>> Hadley
>>
>
>
> --
> Romain Francois
> Professional R Enthusiast
> http://romainfrancois.blog.free.fr
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-------------- next part --------------

R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(inline)
> # 2d convolution -------------------------------------------------------------
> 
> convolve_2d <- cxxfunction(signature(sampleS = "numeric",
+                                      kernelS = "numeric"),
+                            plugin = "Rcpp", '
+    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
+    int x_s = sample.nrow(), x_k = kernel.nrow();
+    int y_s = sample.ncol(), y_k = kernel.ncol();
+ 
+    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
+    for (int row = 0; row < x_s; row++) {
+      for (int col = 0; col < y_s; col++) {
+        for (int i = 0; i < x_k; i++) {
+          for (int j = 0; j < y_k; j++) {
+            output(row + i, col + j) += sample(row, col) * kernel(i, j);
+          }
+        }
+      }
+    }
+    return output;
+ ')
> 
> convolve_2d_Eigen <- cxxfunction(signature(sampleS = "numeric",
+                                            kernelS = "numeric"),
+                                  plugin = "RcppEigen", '
+    typedef Eigen::Map<Eigen::MatrixXd>  MMat;
+ 
+    const MMat sample(as<MMat>(sampleS)), kernel(as<MMat>(kernelS));
+    int x_s = sample.rows(), x_k = kernel.rows();
+    int y_s = sample.cols(), y_k = kernel.cols();
+    int x_o = x_s + x_k - 1, y_o = y_s + y_k - 1;
+ 
+    Rcpp::NumericMatrix out(x_o, y_o);
+    MMat  output(out.begin(), x_o, y_o);
+ 
+    for (int row = 0; row < x_s; row++) {
+      for (int col = 0; col < y_s; col++) {
+        output.block(row, col, x_k, y_k) += sample(row, col) * kernel;
+      }
+    }
+    return out;
+ ')
Loading required package: lattice

Attaching package: ?Matrix?

The following object(s) are masked from ?package:base?:

    det

> 
> convolve_2d_tricks <- cxxfunction(signature(sampleS = "numeric",
+                                             kernelS = "numeric"), plugin = "Rcpp", '
+    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
+    int x_s = sample.nrow(), x_k = kernel.nrow();
+    int y_s = sample.ncol(), y_k = kernel.ncol();
+ 
+    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
+    double* output_row_col_j ;
+    double sample_row_col = 0.0 ;
+    double* kernel_j ;
+ 
+ 
+ 
+    for (int row = 0; row < x_s; row++) {
+      for (int col = 0; col < y_s; col++) {
+        sample_row_col = sample(row,col) ;
+ 
+ 
+        for (int j = 0; j < y_k; j++) {
+          output_row_col_j = & output( row, col+j ) ;
+          kernel_j = &kernel(0,j) ;
+ 
+          for (int i = 0; i < x_k; i++) {
+            output_row_col_j[i] += sample_row_col * kernel_j[i] ;
+          }
+        }
+      }
+    }
+    return output;
+ ')
> 
> x <- diag(1000)
> set.seed(1)
> k <- matrix(runif(20* 20), ncol = 20)
> system.time(loop <- convolve_2d(x, k))
   user  system elapsed 
  0.836   0.004   0.843 
> system.time(block <- convolve_2d_Eigen(x, k))
   user  system elapsed 
  0.576   0.004   0.580 
> system.time(tricks <- convolve_2d_tricks(x, k))
   user  system elapsed 
  0.808   0.000   0.810 
> all.equal(loop, block)
[1] TRUE
> all.equal(loop, tricks)
[1] TRUE
> 
> library(microbenchmark)
> microbenchmark(loop  = convolve_2d(x, k),
+                block = convolve_2d_Eigen(x, k),
+                tricks = convolve_2d_tricks(x, k),
+                times=30)
Unit: milliseconds
    expr      min       lq   median       uq      max
1  block 474.9779 479.2366 480.6027 483.3144 499.3893
2   loop 841.9106 842.9960 846.0636 854.6946 957.7293
3 tricks 700.1348 706.0600 710.3445 712.6358 717.6215
> 
> proc.time()
   user  system elapsed 
 77.488   1.556  78.727