[Rcpp-devel] parallel distance matrix calculation

JJ Allaire jj.allaire at gmail.com
Sat Jul 12 03:19:08 CEST 2014


Two things to consider:

(1) The parallelFor and parallelReduce functions don't require iterators --
they just take indexes which you can use for iterating over any range. In
the gallery examples they are used to offset NumericVector.begin() to get
the address of the slide of the vector or matrix to read but you could us
the index for anything (i.e. the rows to process).

(2) The premise of RcppParallel is that you are reading and writing
directly into C arrays in background threads (it's not safe to call into R
and therefore not really safe to call into Rcpp). So to interact with a
Matrix/Vector you need to calculate the appropriate offsets from
matrix.begin() to get the slices (rows/columns) of the matrix you want.

#2 is based on my conservative assumption about what's thread-safe in Rcpp.
Romain may tell us that it's perfectly safe to call vector iterators and
matrix::operator(,) from a background thread but I don't want to assume
that's okay without confirmation. If those things _aren't_ okay (or might
not be okay in the future) then we either need to provide good examples for
offsetting into vectors and matrixes or perhaps provide some lightweight
helper classes or functions for doing the same.

Romain, what do you think?

In terms of parallelizing the outer vs. inner computations, I think you'd
have to just benchmark. Fewer thread creations and fewer context switches
is better though so my gut would be that the outer loop is the one to
parallelize.

J.J.



On Fri, Jul 11, 2014 at 7:20 PM, James Bullard <scientificb at gmail.com>
wrote:

> Hi All,
>
> I'm attempting to paralellize some Rcpp code I have now, but I'm having a
> hard time understanding the best route for this. I'm looking for some
> guidance about how things need to be restructured to take advantage of the
> parallelFor (if that would be the speediest option).
>
> I have the following two functions:
>
> double kl_divergence(NumericVector vec1, NumericVector vec2)
> NumericMatrix js_distance(NumericMatrix mat)
>
> Right now, the code for js_distance is:
>
> NumericMatrix rmat(mat.nrow(), mat.nrow());
>   for (int i = 0; i < rmat.nrow(); i++) {
>     for (int j = 0; j < i; j++) {
>       NumericVector avg = (mat(i,_) + mat(j,_))/2;
>       double d1 = kl_divergence(mat(i,_), avg);
>       double d2 = kl_divergence(mat(j,_), avg);
>       rmat(i,j) = sqrt(.5 * (d1 + d2));
>     }
>   }
>
> Which, by the way, is amazingly short and nice. I wanted to parallelize
> the outer loop, but I'm not finding a mechanism to use iterators to go over
> the rows of mat. I've looked at the examples on RcppParallel, but
>
>
>
>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20140711/127f5025/attachment.html>


More information about the Rcpp-devel mailing list