[Rcpp-devel] Conversion of Rcpp::Vector Rcpp::List and Rcpp::Matrix to std objects - OpenMP

Simon Zehnder szehnder at uni-bonn.de
Mon Jun 3 15:22:30 CEST 2013


Hi Asis,

parallel computing is a delicate business: performance depends 
on your hardware architecture on the one hand and on the 
commands in your software on the other.

1. If the sequential code is faster than the parallel code, 
check whether anything is programmed differently, or whether 
the two versions are identical apart from the added 
'#pragma omp' directive.
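
A fair comparison keeps the two versions identical except for the directive. A minimal sketch (the function names are mine, for illustration):

```cpp
#include <cstddef>
#include <vector>

// Identical code paths: the only difference is the '#pragma omp' line.
// If the compiler is invoked without OpenMP support, the pragma is
// simply ignored and both functions behave the same.
double sum_serial(const std::vector<double>& x) {
    double s = 0.0;
    for (long i = 0; i < (long)x.size(); ++i) s += x[i];
    return s;
}

double sum_parallel(const std::vector<double>& x) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < (long)x.size(); ++i) s += x[i];
    return s;
}
```

Any timing difference between these two is then attributable to the directive alone, not to incidental code changes.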

2. How many iterations are you running in parallel? Building 
a thread pool and initializing parallel processing costs the 
computer time. If that overhead exceeds the time you gain 
from executing the loop iterations in parallel, you lose in 
aggregate. Test your program by doubling its input: at what 
size does the parallel code become faster than the 
sequential one? If never, there is a serious red flag 
somewhere in your code.
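
The doubling test can be scripted directly. The sketch below assumes a stand-in work() kernel (replace it with your own loop body) and times it at a given input size, so you can compare serial and OpenMP builds of the same code:

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the real loop body; substitute your own kernel.
double work(const std::vector<double>& x) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < (long)x.size(); ++i) s += std::sqrt(x[i]);
    return s;
}

// Wall-clock time for one run at input size n.
double time_once(std::size_t n) {
    std::vector<double> x(n, 2.0);
    auto t0 = std::chrono::steady_clock::now();
    volatile double keep = work(x);   // 'volatile' keeps the call from being optimized away
    (void)keep;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Compile once with and once without -fopenmp, call time_once for n = 2^10, 2^11, 2^12, ... and look for the size at which the OpenMP build starts to win.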

3. If your machine has a NUMA architecture (which is the 
case for almost all modern home computers) and your 
parallelized tasks involve many memory accesses, some of 
those accesses may hit non-local memory, which is costly. 
In this case the workaround is careful page placement, i.e. 
arranging that the data processed by a given CPU sits in 
memory local to that CPU.
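
My reading of the page-placement advice is the usual "first touch" trick (an assumption on my part): on Linux, a memory page lands on the NUMA node of the thread that first writes it, so initializing the data with the same static schedule as the later computation keeps each thread's chunk node-local. A sketch:

```cpp
#include <cstddef>

// First-touch initialization: the buffer must arrive uninitialized
// (e.g. fresh from new double[n]), so the parallel loop below performs
// the first write to each page.
void first_touch_init(double* x, std::size_t n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        x[i] = 1.0;        // each thread touches the pages of its own chunk
}

// Later passes use the same static schedule, so each thread mostly
// reads the memory it initialized, i.e. node-local memory.
double static_sum(const double* x, std::size_t n) {
    double s = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (long i = 0; i < (long)n; ++i)
        s += x[i];
    return s;
}
```

Note that a std::vector constructor already writes every element serially, which defeats first touch; that is why the sketch takes a raw, uninitialized buffer.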

4. With too many cores for too few tasks you pay high setup 
costs (see point 2). Try fewer threads. Furthermore, use at 
most as many threads as you have physical cores: 
hyperthreading is nice, but a real performance jump is only 
possible with real cores.
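
In code this could look as follows. Note that omp_get_num_procs() typically counts logical CPUs (hyperthreads included), so the physical core count is something you have to supply yourself; choose_threads is a hypothetical helper of mine:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Cap the OpenMP team size at the number of physical cores.
int choose_threads(int physical_cores) {
#ifdef _OPENMP
    int logical = omp_get_num_procs();   // logical CPUs, hyperthreads included
    int n = physical_cores < logical ? physical_cores : logical;
    if (n < 1) n = 1;
    omp_set_num_threads(n);
    return n;
#else
    (void)physical_cores;
    return 1;   // OpenMP disabled at compile time: serial execution
#endif
}
```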

5. Turn off OpenMP's dynamic adjustment of the number of 
threads! Set the environment variable OMP_DYNAMIC=false.
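
The same setting can be made from inside the program via omp_set_dynamic(0), which has the same effect as exporting OMP_DYNAMIC=false before starting R; a sketch:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Disable the runtime's dynamic adjustment of the team size so the
// thread count you request is the thread count you get.
bool disable_dynamic_teams() {
#ifdef _OPENMP
    omp_set_dynamic(0);              // same effect as OMP_DYNAMIC=false
    return omp_get_dynamic() == 0;   // confirm the setting took hold
#else
    return true;                     // no OpenMP: nothing to disable
#endif
}
```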

6. Look at the workload! If you parallelize the 'wrong' 
loop, one without much calculation in it, the overhead often 
costs more than the parallelism wins. Parallelize the loops 
with the heavy calculations instead. Use a tool to monitor 
performance and find the big workloads in your program; this 
cannot be done by simply looking at the code, since you also 
have to take the specific hardware of your machine into 
account, and test with very simple objects where possible.
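
As an illustration of picking the 'right' loop (the names are mine): with nested loops, put the directive on the loop whose iterations carry substantial work, here the outer loop over whole matrices, rather than forking threads around a cheap inner loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One fork/join for the whole job; each thread handles entire matrices,
// so the per-element cost (std::exp here) dwarfs the threading overhead.
void heavy_outer(std::vector<std::vector<double>>& mats) {
    #pragma omp parallel for
    for (long m = 0; m < (long)mats.size(); ++m)
        for (std::size_t i = 0; i < mats[m].size(); ++i)
            mats[m][i] = std::exp(-mats[m][i]);
}
```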

For performance tools check out Vampir 
(http://www.vampir.eu/) or Scalasca 
(http://www.scalasca.org/). For debugging check Valgrind's DRD 
(http://valgrind.org/docs/manual/drd-manual.html#drd-manual.openmp).

The commercial tools are much better, though. If you have 
access to them, I would suggest Intel VTune Analyzer and 
Intel Inspector, and, especially for debugging, TotalView.

Hope this helps


Best

Simon

On Mon, 3 Jun 2013 12:44:20 +0200
  Asis Hallab <asis.hallab at gmail.com> wrote:
> Dear Dirk, Simon and Rcpp Experts.
> 
> This is a message following up the thread about using OpenMP
> directives with Rcpp to construct probability matrices in parallel.
> 
> I followed Dirk's hint and implemented the parallel matrix
> generation using just C++'s STL and '#pragma omp parallel for'
> for the loop with the heaviest workload in each iteration, that
> is, the generation of a matrix.
> 
> Good news: The code compiles and runs without errors.
> 
> Bad news: Even though the conversion of a large Rcpp List and
> its contained NumericMatrix objects takes less than half a
> second, the parallel code on 10 cores runs approximately 10
> times slower than the serial pure Rcpp implementation.
> 
> Serial implementation
> user  system elapsed
>  9.657   0.100   9.785
> 
> Parallel implementation on 10 cores
>   user  system elapsed
> 443.095  26.437 100.132
> 
> Parallel implementation on 20 cores
>   user  system elapsed
> 719.173  35.418  85.663
> 
> Again: I measured the time required to convert the Rcpp
> objects, and it is only half a second.
> The back conversion I have not even implemented yet; I just
> wrap the resulting
> std::map< std::string, std::vector< std::vector<double> > >.
> 
> Does anyone have an idea what is going on?
> 
> The code can be reviewed on GitHub:
> https://github.com/asishallab/PhyloFun_Rccp/blob/OpenMP
> 
> You'll find very short installation and test-run instructions
> in the README.textile.
> 
> Kind regards and all the best!
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel


