[Rcpp-devel] RcppArmadillo with -fopenmp: Not using all available cores

Robin Liu robin28liu at gmail.com
Sat Feb 24 17:44:59 CET 2024


Thank you Dirk for the response.

I called RcppArmadillo::armadillo_get_number_of_omp_threads() on both
machines and correctly see that machine A and B have 20 and 40 cores,
respectively. I also see that calling the setter changes this value.

However, calling the setter does not seem to change the number of cores
used on either machine A or B. I have updated my code example as below: the
execution uses 20 cores on machine A and 1 core on machine B as before,
despite my setting the number of omp threads to 5. Do you have any further
hints?

library(RcppArmadillo)
library(Rcpp)

RcppArmadillo::armadillo_set_number_of_omp_threads(5)
print(sprintf("There are %d threads",
      RcppArmadillo::armadillo_get_number_of_omp_threads()))

src <-
r"(#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::vec getEigenValues(arma::mat M) {
  return arma::eig_sym(M);
})"

size <- 10000
m <- matrix(rnorm(size^2), size, size)
m <- m * t(m)

# This line compiles the above code with the -fopenmp flag.
sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
result <- getEigenValues(m)
print(result[1:10])
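
One possible reading of the result above, offered as a hypothesis and not a confirmed diagnosis: -fopenmp only parallelizes code that carries OpenMP directives, and the OpenMP thread setting applies to those regions. arma::eig_sym is one of the decompositions Armadillo delegates to the linked LAPACK library, so the number of cores it uses may be decided by that BLAS/LAPACK build and not by the OpenMP setter. The C++ sketch below shows the kind of loop that the OpenMP setting does govern. The names omp_thread_limit and parallel_sum are hypothetical, made up for this illustration, and are not part of RcppArmadillo.

```cpp
#include <cstddef>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the threads the OpenMP runtime would
// use for a parallel region. Returns 1 when compiled without -fopenmp.
int omp_thread_limit() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: sum a vector with an explicit OpenMP directive.
// Only loops annotated like this one are affected by the OpenMP thread
// count; a call into an external LAPACK routine is not.
double parallel_sum(const std::vector<double>& x) {
    double total = 0.0;
    const long n = static_cast<long>(x.size());
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (long i = 0; i < n; ++i) {
        total += x[static_cast<std::size_t>(i)];
    }
    return total;
}
```

If this reading holds, comparing which BLAS and LAPACK libraries R is linked against on each machine (sessionInfo() reports both paths) would be a natural next check.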

On Fri, Feb 23, 2024 at 12:53 PM Dirk Eddelbuettel <edd at debian.org> wrote:

>
> On 23 February 2024 at 09:35, Robin Liu wrote:
> | Hi all,
> |
> | Here is an R script that uses Armadillo to decompose a large matrix and print
> | the first 10 eigenvalues.
> |
> | library(RcppArmadillo)
> | library(Rcpp)
> |
> | src <-
> | r"(#include <RcppArmadillo.h>
> |
> | // [[Rcpp::depends(RcppArmadillo)]]
> |
> | // [[Rcpp::export]]
> | arma::vec getEigenValues(arma::mat M) {
> |   return arma::eig_sym(M);
> | })"
> |
> | size <- 10000
> | m <- matrix(rnorm(size^2), size, size)
> | m <- m * t(m)
> |
> | # This line compiles the above code with the -fopenmp flag.
> | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
> | result <- getEigenValues(m)
> | print(result[1:10])
> |
> | When I run this code on server A, I see that arma can implicitly leverage all
> | available cores by running top -H. However, on server B it can only use one
> | core despite multiple being available: there is just one process entry in top
> | -H. Both processes successfully exit and return an answer. The process on
> | server B is of course much slower.
>
> It is documented in the package how this is applied and the policy is to NOT
> blindly enforce one use case (say all cores, or half, or a magically chosen
> value of N for whatever value of N) but to follow the local admin setting and
> respect standard environment variables.
>
> So I suspect that your machine 'B' differs from machine 'A' in this regard.
>
> Note that this is a _run-time_ and not _compile-time_ behavior, as it is for
> multicore-enabled LAPACK and BLAS libraries, the OpenMP library and basically
> most software of this type.
>
> You can override it, see
>   RcppArmadillo::armadillo_set_number_of_omp_threads
>   RcppArmadillo::armadillo_get_number_of_omp_threads
>
> Can you try and see if these help you?
>
> Dirk
>
> | Here is the compilation on server A:
> | /usr/local/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'file197c21cbec564.cpp'
> | g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I../inst/include -fopenmp  -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I"/tmp/RtmpwhGRi3/sourceCpp-x86_64-pc-linux-gnu-1.0.9" -I/usr/local/include   -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c file197c21cbec564.cpp -o file197c21cbec564.o
> | g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o sourceCpp_2.so file197c21cbec564.o -fopenmp -llapack -lblas -lgfortran -lm -lquadmath -L/usr/local/lib/R/lib -lR
> |
> | and here it is for server B:
> | /sw/R/R-4.2.3/lib64/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'file158165b9c4ae1.cpp'
> | g++ -std=gnu++11 -I"/sw/R/R-4.2.3/lib64/R/include" -DNDEBUG -I../inst/include -fopenmp  -I"/home/my_username/.R/library/Rcpp/include" -I"/home/my_username/.R/library/RcppArmadillo/include" -I"/tmp/RtmpvfPt4l/sourceCpp-x86_64-pc-linux-gnu-1.0.10" -I/usr/local/include   -fpic  -g -O2  -c file158165b9c4ae1.cpp -o file158165b9c4ae1.o
> | g++ -std=gnu++11 -shared -L/sw/R/R-4.2.3/lib64/R/lib -L/usr/local/lib64 -o sourceCpp_2.so file158165b9c4ae1.o -fopenmp -llapack -lblas -lgfortran -lm -lquadmath -L/sw/R/R-4.2.3/lib64/R/lib -lR
> |
> | I thought that the -fopenmp flag should let arma implicitly parallelize matrix
> | computations. Any hints as to why this may not work on server B?
> |
> | The actual code I'm running is an R package that includes RcppArmadillo and
> | RcppEnsmallen. Server B is the login node to an hpc cluster, but the code does
> | not use all cores on the compute nodes either.
> |
> | Best,
> | Robin
> | _______________________________________________
> | Rcpp-devel mailing list
> | Rcpp-devel at lists.r-forge.r-project.org
> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>
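
The point made in the quoted reply, that thread counts are a run-time setting taken from standard environment variables, can be checked from inside compiled code. The following is a minimal C++ sketch under stated assumptions: parse_thread_count and thread_count_from_env are hypothetical helper names invented here, and which variable actually matters on a given machine depends on the OpenMP runtime and BLAS library in use.

```cpp
#include <cstdlib>

// Hypothetical helper: parse a thread-count setting such as "5" or "4,2".
// OpenMP allows a comma-separated list for nested levels; only the first
// entry is read here. Returns -1 when the value is missing or is not a
// positive integer.
int parse_thread_count(const char* value) {
    if (value == nullptr) return -1;
    char* end = nullptr;
    const long n = std::strtol(value, &end, 10);
    if (end == value) return -1;                 // no digits were read
    if (*end != '\0' && *end != ',') return -1;  // unexpected trailing text
    if (n <= 0 || n > 1000000) return -1;        // reject nonsense values
    return static_cast<int>(n);
}

// Hypothetical helper: look up one environment variable at run time and
// parse it. Returns -1 when the variable is unset or malformed.
int thread_count_from_env(const char* name) {
    return parse_thread_count(std::getenv(name));
}
```

Calling thread_count_from_env with "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS" and "MKL_NUM_THREADS" on both machines would show whether the local administrator has capped any of them. A result of -1 only means the variable is unset or unparsable, not that threading is disabled.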