[Rcpp-devel] Apply Defined Rcpp Functions In Parallel
Dillon Hammill
Dillon.Hammill at anu.edu.au
Tue Aug 17 10:58:48 CEST 2021
Hi everyone,
Apologies if this is the wrong place to ask for help with Rcpp/RcppParallel.
I have recently began using Rcpp and RcppParallel to speed up an algorithm that I have written in R ( I still have a lot to learn!). I have had no issues converting my R code into Rcpp code, but I am struggling to get things working in parallel using RcppParallel. I was hoping that someone may be able help improve my understanding of how to translate Rcpp code into code that works with RcppParallel.
To make things easier to understand, I have created a simplified example that will hopefully illustrate what I hope to achieve.
Firstly, we compute a distance matrix based on xy co-ordinates using the dist() function in R:
x <- matrix(1:100 + 0.1, ncol = 2, dimnames = list(NULL, c("x","y")))
x_dist <- dist(x, method = "euclidean")
Secondly, we apply a custom algorithm that uses the raw data and distance matrix to compute labels for each point:
algo <- function(x,
dist = NULL) {
# algorithm code to compute a label for each row in x
# code is complex and requires both x and distance matrix
# but simply assigns an integer to each row
labels <- 1:nrow(x)
return(labels)
}
x_labels <- algo(x, dist = x_dist)
Now let's say we wanted run the algorithm for all distance methods and store the labels (single thread R code) in a matrix. This is the code I would like to run in parallel as it can take some time to compute the labels using the alogrithm.
dist_methods <- c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")
x_labels <- lapply(dist_methods, function(dist_method) {
x_dist <- dist(x, method = dist_method)
return(algo(x, x_dist))
})
x_labels <- do.call("cbind", x_labels)
To speed up the algorithm we can convert these R functions into Rcpp equivalents. I have removed code to make it clearer and simply returned an example matrix/vector. I have also converted the dist_methods to an integer for simplicity (i.e. 1 - euclidean, 2 - maximum etc.).
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
NumericMatrix rcpp_dist(NumericMatrix x, int dist_method = 1) {
// code to compute distance matrix depends on dist_method
NumericMatrix x_dist(x.nrow(), x.nrow());
return x_dist;
}
//[[Rcpp::export]]
IntegerVector rcpp_algo(NumericMatrix x, NumericMatrix dist) {
// code to compute a label for each row in x using distance matrix
IntegerVector labels(x.nrow());
return labels;
}
Finally, we attempt to run the code on multiple threads for each distance method.
#include <RcppParallel.h>
using namespace RcppParallel;
struct RcppParAlgo : public Worker {
// source thread safe
RMatrix<double> x;
RVector<int> dist_method;
// output thread safe
RMatrix <int> labels;
// initialise
RcppParAlgo(NumericMatrix x, int dist_method)
: x(x), dist_method(dist_method), labels(labels) {}
// compute distance matrix based on method and run algo to get labels
void operator() (std::size_t begin, std::size_t end) {
// call rcpp_dist on x with dist_method
// I know this code is wrong but hopefully shows what I am trying to do (this is where I need some help)
for(std::size_t i = begin; i < end; i++) {
RVector<int> dist_type = dist_method[i];
RMatrix<double> x_dist = rcpp_dist(x, dist_type);
RVector<int> x_labels = rcpp_algo(x, x_dist);
// push lables into output matrix
for(int j = 0; j <= x_labels.size(); j++) {
labels(j, i) = x_labels[j];
}
}
}
};
//[[Rcpp::export]]
IntegerMatrix rcppParAlgo(NumericMatrix x) {
// distance methods
IntegerVector dist_methods = {1,2,3,4,5};
// allocate output matrix to store labels
IntegerMatrix labels(x.nrow(), dist_methods.size());
// functor
RcppParAlgo RcppParAlgo(x, dist_methods);
// run in parallel
parallelFor(0, dist_methods.length(), RcppParAlgo);
return labels;
}
The main questions I would like to address are:
1. As the code is currently written, rcpp_dist and rcpp_algo do not accept RMatrix/RVector classes which causes problems when I try call these functions in parallel. Is there a way to write generic functions that will work in both cases? I still want to export the definitions of rcpp_dist() and rcpp_algo() so they can be called from R.
2. Is it safe to create new objects on each thread? For example, in my case I need to create the temporary distance matrix which is passed to rcpp_algo() along with x to compute the labels. I haven't seen any examples of this, are there constructors for RMatrix/RVector similar to NumericMatrix/NumericVector?
Any help or suggestions would be greatly appreciated. I have placed the Rcpp/RcppParallel code in package hosted on GitHub if you would like to fiddle with the code:
https://github.com/DillonHammill/RcppHelp
[https://opengraph.githubassets.com/c783348bc98fe97d064ef1da1b860542a1422388bfb389ba366bc8827df916cf/DillonHammill/RcppHelp]<https://github.com/DillonHammill/RcppHelp>
GitHub - DillonHammill/RcppHelp: Example RcppParallel Code<https://github.com/DillonHammill/RcppHelp>
Example RcppParallel Code. Contribute to DillonHammill/RcppHelp development by creating an account on GitHub.
github.com
Cheers,
Dillon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20210817/918f893b/attachment-0001.html>
More information about the Rcpp-devel
mailing list