[Rcpp-devel] Conversion of Rcpp::Vector Rcpp::List and Rcpp::Matrix to std objects - OpenMP

Simon Zehnder szehnder at uni-bonn.de
Sun Jun 2 17:09:08 CEST 2013

Hi Asis,

I have cloned the git project and successfully installed it to R. The 'testCptsRealLive.R' does not work though. The 'load' command has problems reading the binary file. 

Nevertheless, I took a look at your OpenMP code: 

SEXP conditionalProbabilityTables( SEXP uniqueEdgeLengths, SEXP annos, SEXP
    stringifiedAnnotations, SEXP annotsMutationProbTableList, SEXP
    mutTblLengthColIndx, SEXP nThreads ) {

    //std::cout << "1" << "\n";
    NumericVector numberThreads = NumericVector( nThreads );
    //std::cout << "2" << "\n";
    omp_set_num_threads( numberThreads(0) );
    //std::cout << "3" << "\n";

    NumericVector edgeLengths = NumericVector( uniqueEdgeLengths );
    //std::cout << "4" << "\n";
    CharacterVector edgeLengthsAsStrs = as<CharacterVector>( edgeLengths );
    //std::cout << "5" << "\n";
    List cpts = List( 0 );
    /* std::map< std::string, std::vector<double> > cpts; */
    //std::cout << "6" << "\n";

    #pragma omp parallel for private( uniqueEdgeLengths, annos, stringifiedAnnotations, annotsMutationProbTableList, mutTblLengthColIndx, nThreads, edgeLengths ) shared( cpts )
    for ( int i = 0; i < edgeLengths.size(); i++ )   
      NumericVector currBranchLength( 1, edgeLengths( i ) );
      NumericMatrix cpt = conditionalProbabilityTable( currBranchLength, annos,
          stringifiedAnnotations, annotsMutationProbTableList,
          mutTblLengthColIndx );
      cpts.push_back( cpt, std::string( edgeLengthsAsStrs( i ) ) );
      /* cpts[ std::string( edgeLengthsAsStrs( i ) ) ] = as<std::vector<double> >( cpt ); */
    return( wrap( cpts ) );


My comments: 
1) Set always '++i' not 'i++'. The second directive involves a copy, whereas the first one doesn't. This makes the first one faster. 

2) You are using a function 'conditionalProbabilityTable' in your OpenMP-Loop. Have you tested this function?

3) The 'cpts' List-Object is shared in the loop, i.e. all threads can access it. Then you write-access the list not by index or name, but by push_back(). If this is thread-safe I cannot tell you, but I wouldn't use it here. 
This can cause undefined behavior, i.e. your program runs sometimes without errors and another time you get a segfault. Try to access the list at least with an index or name. Another possibility: Use the push_back
command in a '#pragma omp critical' directive, which allows only one thread at a time access to the List (this can cause a longer runtime). 

4) You are using in the called function 'conditionalProbabilityTable' another for-loop: 

SEXP conditionalProbabilityTable( SEXP branchLength, SEXP annos,
    SEXP stringifiedAnnotations, SEXP annotsMutationProbTables, SEXP
    mutTblLengthColIndx ) {

    List annosLst( annos );
    CharacterVector annosAsStrs( stringifiedAnnotations );
    NumericMatrix cpt = NumericMatrix( annosLst.size(), annosLst.size() );
    cpt.attr( "dimnames" ) = List::create( annosAsStrs, annosAsStrs );

    for ( int i = 0; i < annosLst.size(); i++ ) {
      CharacterVector compositeAnnotation = annosLst( i );
      double compAnnoMutProb = 1.0;
      std::string ua = "unknown";
      std::string caFirst = as<std::string>( compositeAnnotation( 0 ) );
      if ( ua != caFirst ) {
        compAnnoMutProb = mutationProbability( compositeAnnotation,
            branchLength, annotsMutationProbTables, mutTblLengthColIndx )( 0 );
      double mutToOtherAnnosProb = compAnnoMutProb / ( annosLst.size() - 1 );
      NumericVector colmn( annosLst.size(), mutToOtherAnnosProb );
      colmn( i ) = 1.0 - compAnnoMutProb;
      cpt( _, i ) = colmn;

    return( wrap( cpt ) );

In such cases, you should always think about which loop to parallelize. Usually you parallelize the loop with the bigger workload. Only in case this loop has only
a few iterations it would make sense to parallelize the other loop. Another possibility: Create nested parallelized loops (advanced-level). For this you just use another '#pragma omp for' directive before the second for-loop. Set the variables OMP_NESTED to true. 
Set OMP_MAX_ACTIVE_LEVELS=2. Further make sure, that you have enough stack size for the threads: OMP_STACKSIZE=1000000. 
Do not even dream about an improvement, when working with two cores! You need at least 8 to see the work power of nesting!

5) Set the environment variable OMP_DYNAMIC=false. The dynamic chunking does in most cases not work and static chunking has a better performance. 

6) The OpenMP library was intended to be used on very simple objects, i.e. arrays and vectors. One reason for this is also the socalled page placing, where you place the memory chunks, which are used by a certain core
on the local memory of this core (NUMA-technology). If we have very complicated objects, it is not always clear, how the memory can be placed for an optimal access. 
So, to cut a long story short, I would always use very simple objects in an OpenMP directive. As NumericVectors and Armadillo-Vectors rely on an inner array for memory allocations 
(e.g. you can initialize both objects with an already existing stl::vector for memory reutilization), this objects can still be considered as very simple -> fast. Second, page placement
is more easily achieved, if wanted/needed. Third, write-access can be done easily by index and is thread-safe if indices do not intersect or we have read-access only. 
Run over vectors and matrices and stack them later on into a list. 

7) It seems sometimes to me, that you create a lot of objects. Check if these objects are all needed. For example, consider in ''conditionalProbabilityTable':
colmn( i ) = 1.0 - compAnnoMutProb;
cpt( _, i ) = colmn;
I think, you could write this immediately into 'cpt'?

8) Use instruments for testing: A very good introduction and an overview can be found on the website of the HPC-Cluster in Aachen (to which by the way you could get access from the University Bonn), http://www.rz.rwth-aachen.de/global/show_document.asp?id=aaaaaaaaaaclexs. Identify with an instrument the 
big workload in your sequential program and try to parallelize this section. Then check another time where another workload is, and so on. I would further suggest to simplify your program, when you 
encounter undefined behavior. This helps in most cases a lot. 

I hope I could help here. I already saw your new thread on the list and it seems, everything is running. So congrats and keep on!



P.S. Of course we could meet in Bonn or Cologne (where I am living and working now). I am also very interested, what is needed in Bioinformatics, maybe we could interchange ideas. Let us keep in contact!

On Jun 1, 2013, at 8:53 PM, Asis Hallab <asis.hallab at gmail.com> wrote:

> Dear Simon,
> thank you very much for your insights and offer to help.
> I prepared a very small project on github:
> https://github.com/asishallab/PhyloFun_Rccp
> You'll find installation and test instructions as well as references
> to the file containing the OpenMP using C++ code.
> Your help is much appreciated.
> Kind regards!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20130602/4563fe8c/attachment-0001.html>

More information about the Rcpp-devel mailing list