[Rcpp-devel] RcppArmadillo with -fopenmp: Not using all available cores

Robin Liu robin28liu at gmail.com
Fri Feb 23 18:35:56 CET 2024


Hi all,

Here is an R script that uses Armadillo to decompose a large matrix and
print the first 10 eigenvalues.

library(RcppArmadillo)
library(Rcpp)

src <-
r"(#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::vec getEigenValues(arma::mat M) {
  return arma::eig_sym(M);
})"

size <- 10000
m <- matrix(rnorm(size^2), size, size)
m <- m * t(m)

# This line compiles the above code with the -fopenmp flag.
sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
result <- getEigenValues(m)
print(result[1:10])

When I run this code on server A, top -H shows that arma implicitly uses
all available cores. On server B, however, it uses only one core even though
several are available: top -H shows a single entry for the process. On both
servers the process exits successfully and returns an answer, but it is of
course much slower on server B.

Here is the compilation on server A:
/usr/local/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so'
'file197c21cbec564.cpp'
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I../inst/include
-fopenmp  -I"/usr/local/lib/R/site-library/Rcpp/include"
-I"/usr/local/lib/R/site-library/RcppArmadillo/include"
-I"/tmp/RtmpwhGRi3/sourceCpp-x86_64-pc-linux-gnu-1.0.9"
-I/usr/local/include   -fpic  -g -O2 -fstack-protector-strong -Wformat
-Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c
file197c21cbec564.cpp -o file197c21cbec564.o
g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o
sourceCpp_2.so file197c21cbec564.o -fopenmp -llapack -lblas -lgfortran -lm
-lquadmath -L/usr/local/lib/R/lib -lR

and here it is for server B:
/sw/R/R-4.2.3/lib64/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so'
'file158165b9c4ae1.cpp'
g++ -std=gnu++11 -I"/sw/R/R-4.2.3/lib64/R/include" -DNDEBUG
-I../inst/include -fopenmp  -I"/home/my_username/.R/library/Rcpp/include"
-I"/home/my_username/.R/library/RcppArmadillo/include"
-I"/tmp/RtmpvfPt4l/sourceCpp-x86_64-pc-linux-gnu-1.0.10"
-I/usr/local/include   -fpic  -g -O2  -c file158165b9c4ae1.cpp -o
file158165b9c4ae1.o
g++ -std=gnu++11 -shared -L/sw/R/R-4.2.3/lib64/R/lib -L/usr/local/lib64 -o
sourceCpp_2.so file158165b9c4ae1.o -fopenmp -llapack -lblas -lgfortran -lm
-lquadmath -L/sw/R/R-4.2.3/lib64/R/lib -lR

I thought that the -fopenmp flag should let arma implicitly parallelize
matrix computations. Any hints as to why this may not work on server B?
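
One check that might help narrow this down is to ask OpenMP itself what it
sees at run time on each server. The following is only a minimal sketch and
not part of the package: the helper names (built_with_openmp,
openmp_max_threads, openmp_visible_procs) are made up for illustration, and
it assumes the file is compiled with the same -fopenmp flag shown in the logs
above. Note that, as far as I understand, eig_sym is handed to the
LAPACK/BLAS library that R links against (-llapack -lblas in both logs), so
the threading of that library may matter as well and is not covered by this
check.

```cpp
// Minimal sketch: report what OpenMP sees at run time.
// Without -fopenmp the fallback branches are compiled instead and every
// helper reports a single thread.
#ifdef _OPENMP
#include <omp.h>
#endif

// True only if the compiler actually enabled OpenMP for this file.
bool built_with_openmp() {
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

// Upper bound on the number of threads a parallel region would get.
// An environment setting such as OMP_NUM_THREADS=1 typically lowers this.
int openmp_max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

// Number of processors OpenMP believes are available to this process.
int openmp_visible_procs() {
#ifdef _OPENMP
  return omp_get_num_procs();
#else
  return 1;
#endif
}
```

If openmp_max_threads() returns 1 on server B but a larger number on server
A, the difference would be in the runtime environment (for example
OMP_NUM_THREADS or a scheduler limit) rather than in the compiler flags.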

The actual code I'm running is an R package that uses RcppArmadillo and
RcppEnsmallen. Server B is the login node of an HPC cluster, but the code
does not use all cores on the compute nodes either.

Best,
Robin