[Rcpp-devel] Optimising 2d convolution

Fri Jan 6 23:48:34 CET 2012

On 7 January 2012 11:40, baptiste auguie <baptiste.auguie at googlemail.com> wrote:
> Hi,
>
> (Rcpp)Armadillo has a 1D convolution function <
> http://arma.sourceforge.net/docs.html#conv >. It might be a very
> inefficient way, to use it sequentially for 2D; I haven't tried.

Also, it only works for separable kernels.

baptiste

>
> HTH,
>
> baptiste
>
>
>
> On 7 January 2012 07:43, Hadley Wickham <hadley at rice.edu> wrote:
>> Hi all,
>>
>> The rcpp-devel list has been very helpful to me so far, so I hope you
>> don't mind another question!  I'm trying to speed up my 2d convolution
>> function:
>>
>>
>>
>> library(inline)
>> # 2d convolution -------------------------------------------------------------
>>
>> convolve_2d <- cxxfunction(signature(sampleS = "numeric", kernelS =
>> "numeric"), plugin = "Rcpp", '
>>    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>>    int x_s = sample.nrow(), x_k = kernel.nrow();
>>    int y_s = sample.ncol(), y_k = kernel.ncol();
>>
>>    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>>    for (int row = 0; row < x_s; row++) {
>>      for (int col = 0; col < y_s; col++) {
>>        for (int i = 0; i < x_k; i++) {
>>          for (int j = 0; j < y_k; j++) {
>>            output(row + i, col + j) += sample(row, col) * kernel(i, j);
>>          }
>>        }
>>      }
>>    }
>>    return output;
>> ')
>>
>>
>> x <- diag(1000)
>> k <- matrix(runif(20* 20), ncol = 20)
>> system.time(convolve_2d(x, k))
>> #    user  system elapsed
>> # 14.759   0.028  15.524
>>
>> I have a vague idea that to get better performance I need to get
>> closer to bare pointers, and I need to be careful about the order of
>> my loops to ensure that I'm working over contiguous chunks of memory
>> as often as possible, but otherwise I've basically exhausted my
>> C++/Rcpp knowledge.  Where should I start looking to improve the
>> performance of this function?
>>
>> The example data basically matches the real problem - x is not usually
>> going to be much bigger than 1000 x 1000 and k typically will be much
>> smaller.  (And hence, based on what I've read, this technique should
>> be faster than doing it via a discrete fft)
>>
>> Hadley
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel