[Rcpp-devel] Optimising 2d convolution
Douglas Bates
bates at stat.wisc.edu
Sat Jan 7 17:40:10 CET 2012
2012/1/7 Romain François <romain at r-enthusiasts.com>:
> Hi,
>
> Using some of what we know about the structure of an R matrix, we can use
> this version:
>
> convolve_2d_tricks <- cxxfunction(signature(sampleS = "numeric", kernelS =
>
> "numeric"), plugin = "Rcpp", '
> Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
> int x_s = sample.nrow(), x_k = kernel.nrow();
> int y_s = sample.ncol(), y_k = kernel.ncol();
>
> Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
> double* output_row_col_j ;
> double sample_row_col = 0.0 ;
> double* kernel_j ;
>
>
>
> for (int row = 0; row < x_s; row++) {
> for (int col = 0; col < y_s; col++) {
> sample_row_col = sample(row,col) ;
>
>
> for (int j = 0; j < y_k; j++) {
> output_row_col_j = & output( row, col+j ) ;
> kernel_j = &kernel(0,j) ;
>
> for (int i = 0; i < x_k; i++) {
> output_row_col_j[i] += sample_row_col * kernel_j[i] ;
> }
> }
> }
> }
> return output;
> ')
>
> I get quite a speed up on hadley's example:
>
> utilisateur système écoulé
> 1.555 0.003 1.558
> utilisateur système écoulé
> 0.547 0.004 0.550
> [1] TRUE
I don't get a 3x speed increase on my system. Here the Eigen version is faster.
> The idea is that we compute pointers and constants just once, asap, so that
> we don't have to do it again in inner loops.
>
> Romain
>
> Le 06/01/12 19:43, Hadley Wickham a écrit :
>
>> Hi all,
>>
>> The rcpp-devel list has been very helpful to me so far, so I hope you
>> don't mind another question! I'm trying to speed up my 2d convolution
>> function:
>>
>>
>>
>> library(inline)
>> # 2d convolution
>> -------------------------------------------------------------
>>
>> convolve_2d<- cxxfunction(signature(sampleS = "numeric", kernelS =
>> "numeric"), plugin = "Rcpp", '
>> Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>> int x_s = sample.nrow(), x_k = kernel.nrow();
>> int y_s = sample.ncol(), y_k = kernel.ncol();
>>
>> Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>> for (int row = 0; row< x_s; row++) {
>> for (int col = 0; col< y_s; col++) {
>> for (int i = 0; i< x_k; i++) {
>> for (int j = 0; j< y_k; j++) {
>> output(row + i, col + j) += sample(row, col) * kernel(i, j);
>> }
>> }
>> }
>> }
>> return output;
>> ')
>>
>>
>> x<- diag(1000)
>> k<- matrix(runif(20* 20), ncol = 20)
>> system.time(convolve_2d(x, k))
>> # user system elapsed
>> # 14.759 0.028 15.524
>>
>> I have a vague idea that to get better performance I need to get
>> closer to bare pointers, and I need to be careful about the order of
>> my loops to ensure that I'm working over contiguous chunks of memory
>> as often as possible, but otherwise I've basically exhausted my
>> C++/Rcpp knowledge. Where should I start looking to improve the
>> performance of this function?
>>
>> The example data basically matches the real problem - x is not usually
>> going to be much bigger than 1000 x 1000 and k typically will be much
>> smaller. (And hence, based on what I've read, this technique should
>> be faster than doing it via a discrete fft)
>>
>> Hadley
>>
>
>
> --
> Romain Francois
> Professional R Enthusiast
> http://romainfrancois.blog.free.fr
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-------------- next part --------------
R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(inline)
> # 2d convolution -------------------------------------------------------------
>
> convolve_2d <- cxxfunction(signature(sampleS = "numeric",
+ kernelS = "numeric"),
+ plugin = "Rcpp", '
+ Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
+ int x_s = sample.nrow(), x_k = kernel.nrow();
+ int y_s = sample.ncol(), y_k = kernel.ncol();
+
+ Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
+ for (int row = 0; row < x_s; row++) {
+ for (int col = 0; col < y_s; col++) {
+ for (int i = 0; i < x_k; i++) {
+ for (int j = 0; j < y_k; j++) {
+ output(row + i, col + j) += sample(row, col) * kernel(i, j);
+ }
+ }
+ }
+ }
+ return output;
+ ')
>
> convolve_2d_Eigen <- cxxfunction(signature(sampleS = "numeric",
+ kernelS = "numeric"),
+ plugin = "RcppEigen", '
+ typedef Eigen::Map<Eigen::MatrixXd> MMat;
+
+ const MMat sample(as<MMat>(sampleS)), kernel(as<MMat>(kernelS));
+ int x_s = sample.rows(), x_k = kernel.rows();
+ int y_s = sample.cols(), y_k = kernel.cols();
+ int x_o = x_s + x_k - 1, y_o = y_s + y_k - 1;
+
+ Rcpp::NumericMatrix out(x_o, y_o);
+ MMat output(out.begin(), x_o, y_o);
+
+ for (int row = 0; row < x_s; row++) {
+ for (int col = 0; col < y_s; col++) {
+ output.block(row, col, x_k, y_k) += sample(row, col) * kernel;
+ }
+ }
+ return out;
+ ')
Loading required package: lattice
Attaching package: ?Matrix?
The following object(s) are masked from ?package:base?:
det
>
> convolve_2d_tricks <- cxxfunction(signature(sampleS = "numeric",
+ kernelS = "numeric"), plugin = "Rcpp", '
+ Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
+ int x_s = sample.nrow(), x_k = kernel.nrow();
+ int y_s = sample.ncol(), y_k = kernel.ncol();
+
+ Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
+ double* output_row_col_j ;
+ double sample_row_col = 0.0 ;
+ double* kernel_j ;
+
+
+
+ for (int row = 0; row < x_s; row++) {
+ for (int col = 0; col < y_s; col++) {
+ sample_row_col = sample(row,col) ;
+
+
+ for (int j = 0; j < y_k; j++) {
+ output_row_col_j = & output( row, col+j ) ;
+ kernel_j = &kernel(0,j) ;
+
+ for (int i = 0; i < x_k; i++) {
+ output_row_col_j[i] += sample_row_col * kernel_j[i] ;
+ }
+ }
+ }
+ }
+ return output;
+ ')
>
> x <- diag(1000)
> set.seed(1)
> k <- matrix(runif(20* 20), ncol = 20)
> system.time(loop <- convolve_2d(x, k))
user system elapsed
0.836 0.004 0.843
> system.time(block <- convolve_2d_Eigen(x, k))
user system elapsed
0.576 0.004 0.580
> system.time(tricks <- convolve_2d_tricks(x, k))
user system elapsed
0.808 0.000 0.810
> all.equal(loop, block)
[1] TRUE
> all.equal(loop, tricks)
[1] TRUE
>
> library(microbenchmark)
> microbenchmark(loop = convolve_2d(x, k),
+ block = convolve_2d_Eigen(x, k),
+ tricks = convolve_2d_tricks(x, k),
+ times=30)
Unit: milliseconds
expr min lq median uq max
1 block 474.9779 479.2366 480.6027 483.3144 499.3893
2 loop 841.9106 842.9960 846.0636 854.6946 957.7293
3 tricks 700.1348 706.0600 710.3445 712.6358 717.6215
>
> proc.time()
user system elapsed
77.488 1.556 78.727
More information about the Rcpp-devel
mailing list