[Rcpp-devel] Optimising 2d convolution
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Jan 6 20:08:00 CET 2012
Hi,
Fancy indexing tricks aside, another avenue you might want to look
into is doing the convolution in Fourier space ...
I'll stop there so I minimize the damage I'm doing as far as spreading
misinformation in a public forum goes. I'm also interested in
investigating that avenue further, as I wrote a 1d convolution for
something else in Rcpp and want to drive the "Fourier trick" via C
code ... I just haven't had the time to do the follow up reading.
Anyway, I "know" it works in the 1d case (that's how base R's
convolve() works); I guess there are particulars to sort out in the 2d
case ...
And if you *really* want it to go fast, I guess you can go the CUDA route :-)
http://developer.download.nvidia.com/compute/cuda/2_2/sdk/website/projects/convolutionFFT2D/doc/convolutionFFT2D.pdf
HTH (but I'm sure it didn't :-)
-steve
On Fri, Jan 6, 2012 at 1:43 PM, Hadley Wickham <hadley at rice.edu> wrote:
> Hi all,
>
> The rcpp-devel list has been very helpful to me so far, so I hope you
> don't mind another question! I'm trying to speed up my 2d convolution
> function:
>
>
>
> library(inline)
> # 2d convolution -------------------------------------------------------------
>
> convolve_2d <- cxxfunction(
>   signature(sampleS = "numeric", kernelS = "numeric"),
>   plugin = "Rcpp", '
>   Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>   int x_s = sample.nrow(), x_k = kernel.nrow();
>   int y_s = sample.ncol(), y_k = kernel.ncol();
>
>   Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>   for (int row = 0; row < x_s; row++) {
>     for (int col = 0; col < y_s; col++) {
>       for (int i = 0; i < x_k; i++) {
>         for (int j = 0; j < y_k; j++) {
>           output(row + i, col + j) += sample(row, col) * kernel(i, j);
>         }
>       }
>     }
>   }
>   return output;
> ')
>
>
> x <- diag(1000)
> k <- matrix(runif(20 * 20), ncol = 20)
> system.time(convolve_2d(x, k))
> # user system elapsed
> # 14.759 0.028 15.524
>
> I have a vague idea that to get better performance I need to get
> closer to bare pointers, and I need to be careful about the order of
> my loops to ensure that I'm working over contiguous chunks of memory
> as often as possible, but otherwise I've basically exhausted my
> C++/Rcpp knowledge. Where should I start looking to improve the
> performance of this function?
>
> The example data basically matches the real problem - x is not usually
> going to be much bigger than 1000 x 1000 and k typically will be much
> smaller. (And hence, based on what I've read, this technique should
> be faster than doing it via a discrete FFT.)
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact