<div dir="ltr">Hi all,<div><br></div><div>Here is an R script that uses Armadillo to decompose a large matrix and print the first 10 eigenvalues.<br><br></div><div><font face="monospace">library(RcppArmadillo)<br>library(Rcpp)<br><br>src <-<br>r"(#include <RcppArmadillo.h><br><br>// [[Rcpp::depends(RcppArmadillo)]]<br><br>// [[Rcpp::export]]<br>arma::vec getEigenValues(arma::mat M) {<br>  return arma::eig_sym(M);<br>})"<br><br>size <- 10000<br>m <- matrix(rnorm(size^2), size, size)<br>m <- m * t(m)<br><br># This line compiles the above code with the -fopenmp flag.<br>sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)<br>result <- getEigenValues(m)<br>print(result[1:10])</font><br></div><div><br></div><div>When I run this code on server A, top -H shows that arma implicitly uses all available cores. On server B, however, it uses only one core even though several are available: there is just one process entry in top -H. Both runs exit successfully and return an answer. 
The process on server B is of course much slower.</div><div><br></div><div>Here is the compilation on server A:<br><font face="monospace">/usr/local/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'file197c21cbec564.cpp'<br>g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I../inst/include -fopenmp  -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I"/tmp/RtmpwhGRi3/sourceCpp-x86_64-pc-linux-gnu-1.0.9" -I/usr/local/include   -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c file197c21cbec564.cpp -o file197c21cbec564.o<br>g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o sourceCpp_2.so file197c21cbec564.o -fopenmp -llapack -lblas -lgfortran -lm -lquadmath -L/usr/local/lib/R/lib -lR<br></font></div><div><br></div><div>and here it is for server B:</div><div><font face="monospace">/sw/R/R-4.2.3/lib64/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'file158165b9c4ae1.cpp'<br>g++ -std=gnu++11 -I"/sw/R/R-4.2.3/lib64/R/include" -DNDEBUG -I../inst/include -fopenmp  -I"/home/my_username/.R/library/Rcpp/include" -I"/home/</font> <span style="font-family:monospace">my_username</span><font face="monospace">/.R/library/RcppArmadillo/include" -I"/tmp/RtmpvfPt4l/sourceCpp-x86_64-pc-linux-gnu-1.0.10" -I/usr/local/include   -fpic  -g -O2  -c file158165b9c4ae1.cpp -o file158165b9c4ae1.o<br>g++ -std=gnu++11 -shared -L/sw/R/R-4.2.3/lib64/R/lib -L/usr/local/lib64 -o sourceCpp_2.so file158165b9c4ae1.o -fopenmp -llapack -lblas -lgfortran -lm -lquadmath -L/sw/R/R-4.2.3/lib64/R/lib -lR<br></font></div><div><br></div><div>I thought that the -fopenmp flag should let arma implicitly parallelize matrix computations. Any hints as to why this may not work on server B?</div><div><br></div><div>The actual code I'm running is an R package that includes RcppArmadillo and RcppEnsmallen. 
Server B is the login node of an HPC cluster, but the code does not use all cores on the compute nodes either.</div><div><br></div><div>Best,</div><div>Robin</div></div>
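<div><br></div><div>For comparing the two servers, here is a minimal diagnostic sketch. It rests on an assumption that is not confirmed above: that the difference lies in which BLAS/LAPACK library each R installation loads at run time, rather than in the compile flags, which are essentially the same on both machines. arma::eig_sym hands the decomposition to LAPACK, so a single-threaded reference BLAS/LAPACK on server B would be one possible explanation. The snippet uses only base R and the parallel package; the variable name blas_info is purely illustrative.</div>

```r
# Which BLAS/LAPACK shared libraries is this R session actually using?
# (Paths may be empty strings on platforms where R cannot determine them.)
blas_info <- c(
  blas           = unname(extSoftVersion()["BLAS"]),
  lapack         = La_library(),
  lapack_version = La_version()
)
print(blas_info)

# Number of cores R can see on this machine (may be NA on some platforms).
print(parallel::detectCores())

# Environment variables that commonly cap the thread count of
# multithreaded BLAS implementations; an empty string means "not set".
thread_env <- Sys.getenv(c("OMP_NUM_THREADS",
                           "OPENBLAS_NUM_THREADS",
                           "MKL_NUM_THREADS"))
print(thread_env)
```

<div>Running this on both servers and comparing the output side by side may show whether server A loads a multithreaded library (for example OpenBLAS or MKL) while server B loads a reference implementation, or whether a thread-count variable is set to 1 on server B.</div>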