[Rcpp-devel] Conversion of Rcpp::Vector Rcpp::List and Rcpp::Matrix to std objects - OpenMP

Asis Hallab asis.hallab at gmail.com
Mon Jun 3 17:02:51 CEST 2013


Dear Dirk, Simon and Rcpp Experts.

This message follows up on the thread about using OpenMP
directives with Rcpp to construct probability matrices in parallel.
I am sending it again to properly close this thread (for now…)

I followed Dirk's hint and implemented the parallel matrix generation
using just the C++ STL and a "#pragma omp parallel for" directive on the
loop carrying the heaviest workload per iteration, that is, the
generation of a matrix.

Good news: The code compiles and runs without errors.

Bad news: Even though converting a large Rcpp List and the
NumericMatrix objects it contains takes less than half a
second, the parallel code on 10 cores runs approximately 10 times
slower than the serial, pure Rcpp implementation.

Serial implementation
 user  system elapsed
  9.657   0.100   9.785

Parallel implementation on 10 cores
   user  system elapsed
443.095  26.437 100.132

Parallel implementation on 20 cores
   user  system elapsed
719.173  35.418  85.663

Again: I measured the time required to convert the Rcpp objects, and
it is only about half a second.
I have not even implemented the back conversion yet; I just wrap() the
resulting std::map< std::string, std::vector< std::vector<double> > >.
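
For illustration, here is a minimal sketch of that wrap() call (function
name and contents are made up; it only shows the container type in question):

    #include <Rcpp.h>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical example: build a tiny map of the type described above
    // and hand it back to R via Rcpp::wrap().
    SEXP wrapCptsSketch() {
      std::map< std::string, std::vector< std::vector<double> > > cpts;
      cpts[ "0.1" ] = std::vector< std::vector<double> >(
          2, std::vector<double>( 3, 0.5 ) );
      // wrap() turns the map into a named R list; each inner
      // std::vector<double> becomes a numeric vector.
      return Rcpp::wrap( cpts );
    }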

Does anyone have an idea what is going on?

The code can be reviewed on github:
https://github.com/asishallab/PhyloFun_Rccp/blob/OpenMP

You'll find very short installation and test run instructions in the
README.textile.

Kind regards and all the best!

2013/6/2 Simon Zehnder <szehnder at uni-bonn.de>:
> Hi Asis,
>
> I have cloned the git project and successfully installed it to R. The
> 'testCptsRealLive.R' does not work though. The 'load' command has problems
> reading the binary file.
>
> Nevertheless, I took a look at your OpenMP code:
>
> SEXP conditionalProbabilityTables( SEXP uniqueEdgeLengths, SEXP annos, SEXP
>     stringifiedAnnotations, SEXP annotsMutationProbTableList, SEXP
>     mutTblLengthColIndx, SEXP nThreads ) {
>   BEGIN_RCPP
>
>     //std::cout << "1" << "\n";
>     NumericVector numberThreads = NumericVector( nThreads );
>     //std::cout << "2" << "\n";
>     omp_set_num_threads( numberThreads(0) );
>     //std::cout << "3" << "\n";
>
>     NumericVector edgeLengths = NumericVector( uniqueEdgeLengths );
>     //std::cout << "4" << "\n";
>     CharacterVector edgeLengthsAsStrs = as<CharacterVector>( edgeLengths );
>     //std::cout << "5" << "\n";
>     List cpts = List( 0 );
>     /* std::map< std::string, std::vector<double> > cpts; */
>     //std::cout << "6" << "\n";
>
>     #pragma omp parallel for \
>       private( uniqueEdgeLengths, annos, stringifiedAnnotations, \
>                annotsMutationProbTableList, mutTblLengthColIndx, \
>                nThreads, edgeLengths ) \
>       shared( cpts )
>     for ( int i = 0; i < edgeLengths.size(); i++ )
>     {
>       NumericVector currBranchLength( 1, edgeLengths( i ) );
>       NumericMatrix cpt = conditionalProbabilityTable( currBranchLength,
>           annos, stringifiedAnnotations, annotsMutationProbTableList,
>           mutTblLengthColIndx );
>       cpts.push_back( cpt, std::string( edgeLengthsAsStrs( i ) ) );
>       /* cpts[ std::string( edgeLengthsAsStrs( i ) ) ] =
>          as<std::vector<double> >( cpt ); */
>     }
>     return( wrap( cpts ) );
>
>   END_RCPP
> }
>
> My comments:
> 1) Always use '++i', not 'i++'. The post-increment form involves a copy of
> the old value, whereas the pre-increment form does not, which can make
> '++i' faster.
>
> 2) You call the function 'conditionalProbabilityTable' inside your
> OpenMP loop. Have you tested this function?
>
> 3) The 'cpts' List object is shared in the loop, i.e. all threads can access
> it. You then write to the list not by index or name, but via push_back().
> I cannot tell you whether this is thread-safe, but I would not use it here.
> It can cause undefined behavior, i.e. your program sometimes runs without
> errors and at other times you get a segfault. Try to access the list at
> least by index or name. Another possibility: put the push_back call inside a
> '#pragma omp critical' directive, which allows only one thread at a time to
> access the List (this can increase the runtime).
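>
> A minimal sketch of the critical-section variant (same names as in your
> code; this only serializes the push_back itself, nothing else):
>
>     #pragma omp parallel for shared( cpts )
>     for ( int i = 0; i < edgeLengths.size(); ++i )
>     {
>       NumericVector currBranchLength( 1, edgeLengths( i ) );
>       NumericMatrix cpt = conditionalProbabilityTable( currBranchLength,
>           annos, stringifiedAnnotations, annotsMutationProbTableList,
>           mutTblLengthColIndx );
>       /* only one thread at a time may append to the shared list */
>       #pragma omp critical
>       {
>         cpts.push_back( cpt, std::string( edgeLengthsAsStrs( i ) ) );
>       }
>     }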
>
> 4) In the called function 'conditionalProbabilityTable' you use another
> for-loop:
>
> SEXP conditionalProbabilityTable( SEXP branchLength, SEXP annos,
>     SEXP stringifiedAnnotations, SEXP annotsMutationProbTables, SEXP
>     mutTblLengthColIndx ) {
>   BEGIN_RCPP
>
>     List annosLst( annos );
>     CharacterVector annosAsStrs( stringifiedAnnotations );
>     NumericMatrix cpt = NumericMatrix( annosLst.size(), annosLst.size() );
>     cpt.attr( "dimnames" ) = List::create( annosAsStrs, annosAsStrs );
>
>     for ( int i = 0; i < annosLst.size(); i++ ) {
>       CharacterVector compositeAnnotation = annosLst( i );
>       double compAnnoMutProb = 1.0;
>       std::string ua = "unknown";
>       std::string caFirst = as<std::string>( compositeAnnotation( 0 ) );
>       if ( ua != caFirst ) {
>         compAnnoMutProb = mutationProbability( compositeAnnotation,
>             branchLength, annotsMutationProbTables, mutTblLengthColIndx )( 0 );
>       }
>       double mutToOtherAnnosProb = compAnnoMutProb / ( annosLst.size() - 1 );
>       NumericVector colmn( annosLst.size(), mutToOtherAnnosProb );
>       colmn( i ) = 1.0 - compAnnoMutProb;
>       cpt( _, i ) = colmn;
>     }
>
>     return( wrap( cpt ) );
>
>   END_RCPP
> }
> In such cases you should always think about which loop to parallelize.
> Usually you parallelize the loop with the bigger workload; only if that
> loop has just a few iterations does it make sense to parallelize the other
> one. Another possibility: create nested parallelized loops (advanced
> level). For this you simply add another '#pragma omp for' directive before
> the second for-loop, set the environment variable OMP_NESTED to true and
> set OMP_MAX_ACTIVE_LEVELS=2. Further, make sure that the threads have
> enough stack space: OMP_STACKSIZE=1000000.
> Do not even dream of an improvement when working with two cores! You need
> at least 8 to see the power of nesting!
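>
> A rough sketch of the nesting idea (illustrative only; 'nEdges', 'nAnnos'
> and the loop bodies are placeholders):
>
>     omp_set_nested( 1 );              /* or export OMP_NESTED=true         */
>     omp_set_max_active_levels( 2 );   /* or export OMP_MAX_ACTIVE_LEVELS=2 */
>
>     #pragma omp parallel for          /* outer loop: one table per edge length */
>     for ( int i = 0; i < nEdges; ++i ) {
>       #pragma omp parallel for        /* inner loop: columns of one table */
>       for ( int j = 0; j < nAnnos; ++j ) {
>         /* ... fill column j of table i ... */
>       }
>     }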
>
> 5) Set the environment variable OMP_DYNAMIC=false. Dynamic chunking does
> not work in most cases, and static chunking gives better performance.
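>
> In code this corresponds roughly to ('n' and the loop body are placeholders):
>
>     omp_set_dynamic( 0 );                        /* same as OMP_DYNAMIC=false  */
>     #pragma omp parallel for schedule( static )  /* fixed, evenly sized chunks */
>     for ( int i = 0; i < n; ++i ) {
>       /* ... */
>     }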
>
> 6) The OpenMP library was intended to be used on very simple objects, i.e.
> arrays and vectors. One reason for this is the so-called page placement,
> where the memory chunks used by a certain core are placed in that core's
> local memory (NUMA technology). With very complicated objects it is not
> always clear how the memory can be placed for optimal access.
> So, to cut a long story short, I would always use very simple objects in an
> OpenMP directive. As NumericVectors and Armadillo vectors rely on an inner
> array for memory allocation (e.g. you can initialize both with an already
> existing std::vector to reuse its memory), these objects can still be
> considered very simple -> fast. Second, page placement is more easily
> achieved if wanted/needed. Third, write access by index is easy and is
> thread-safe as long as indices do not intersect or we only read.
> Run over vectors and matrices and stack them into a list later on.
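>
> A minimal sketch of that pattern ('computeTable' is a placeholder for the
> per-edge computation done on plain STL types):
>
>     int n = edgeLengths.size();
>     std::vector< std::vector<double> > results( n );
>
>     #pragma omp parallel for schedule( static )
>     for ( int i = 0; i < n; ++i ) {
>       /* index write: indices do not intersect, so no locking needed */
>       results[ i ] = computeTable( edgeLengths[ i ] );
>     }
>
>     /* back in serial code: stack the plain vectors into an Rcpp::List */
>     Rcpp::List cpts( n );
>     for ( int i = 0; i < n; ++i ) {
>       cpts[ i ] = Rcpp::wrap( results[ i ] );
>     }
>     cpts.attr( "names" ) = edgeLengthsAsStrs;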
>
> 7) It sometimes seems to me that you create a lot of objects. Check whether
> these objects are all needed. For example, consider in
> 'conditionalProbabilityTable':
> colmn( i ) = 1.0 - compAnnoMutProb;
> cpt( _, i ) = colmn;
> I think you could write this directly into 'cpt'?
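>
> For example, something like (same indices as in your loop):
>
>     /* fill column i directly, then overwrite the diagonal element */
>     for ( int j = 0; j < annosLst.size(); ++j )
>       cpt( j, i ) = mutToOtherAnnosProb;
>     cpt( i, i ) = 1.0 - compAnnoMutProb;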
>
> 8) Use profiling tools for testing: a very good introduction and overview
> can be found on the website of the HPC cluster in Aachen (to which, by the
> way, you could get access from the University of Bonn):
> http://www.rz.rwth-aachen.de/global/show_document.asp?id=aaaaaaaaaaclexs.
> Use such a tool to identify the big workload in your sequential program and
> try to parallelize that section. Then check where the next big workload is,
> and so on. I would further suggest simplifying your program when you
> encounter undefined behavior. This helps a lot in most cases.
>
> I hope I could help here. I already saw your new thread on the list and it
> seems everything is running. So congrats and keep it up!
>
>
> Best
>
> Simon
>
>
> On Jun 1, 2013, at 8:53 PM, Asis Hallab <asis.hallab at gmail.com> wrote:
>
> Dear Simon,
>
> thank you very much for your insights and offer to help.
>
> I prepared a very small project on github:
> https://github.com/asishallab/PhyloFun_Rccp
>
> You'll find installation and test instructions as well as references
> to the file containing the OpenMP using C++ code.
>
> Your help is much appreciated.
>
> Kind regards!
>
>

