<div dir="ltr"><div>Hi Dirk,</div><div><br></div>sessionInfo() was the right clue. Indeed the version of R on machine B was not linked to OpenBLAS. Switching to a version with OpenBLAS allows the test code to use all cores.<div><br></div><div>A clear way to check which library is linked is to run the following:</div><div><br></div><div>> extSoftVersion()["BLAS"]<br></div><div><br></div><div>Thanks for your help!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Feb 24, 2024 at 9:17 AM Dirk Eddelbuettel <<a href="mailto:edd@debian.org">edd@debian.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
On 24 February 2024 at 11:44, Robin Liu wrote:<br>
| Thank you Dirk for the response.<br>
| <br>
| I called RcppArmadillo::armadillo_get_number_of_omp_threads() on both machines<br>
| and correctly see that machines A and B have 20 and 40 cores, respectively. I<br>
| also see that calling the setter changes this value.<br>
| <br>
| However, calling the setter does not seem to change the number of cores used on<br>
| either machine A or B. I have updated my code example as below: the execution<br>
| uses 20 cores on machine A and 1 core on machine B as before, despite my<br>
| setting the number of omp threads to 5. Do you have any further hints?<br>
<br>
I fear you need to debug that on the machine 'B' in question. It's all open<br>
source. I do not think either Conrad or I put in code that would constrain<br>
you to one core on 'B' while not doing so on 'A', as you see.<br>
<br>
You can grep around both the RcppArmadillo wrapper code and the included<br>
Armadillo code; I suggest making a local copy and peppering in some print<br>
statements.<br>
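As an illustration of such a print statement, here is a minimal standalone C++ sketch that reports what the OpenMP runtime sees. It is not part of RcppArmadillo, and all function names in it are hypothetical. It assumes a compiler that accepts -fopenmp. Without that flag, the _OPENMP guard makes it report a single thread.

```cpp
// Diagnostic sketch (hypothetical helper names): what can the OpenMP runtime use?
// Assumes compilation with -fopenmp; without it, _OPENMP is undefined and
// every helper falls back to reporting a single thread.
#include <cassert>
#include <sstream>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Threads the next parallel region would use. This reflects OMP_NUM_THREADS
// (read when the runtime starts) unless it was overridden later.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Processors visible to the OpenMP runtime on this machine.
int processor_count() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

// Override the thread count for subsequent parallel regions.
void set_thread_count(int n) {
    assert(n >= 1);
#ifdef _OPENMP
    omp_set_num_threads(n);
#else
    (void)n;  // nothing to configure without OpenMP
#endif
}

// One-line summary suitable for a debugging print statement.
std::string openmp_report() {
    std::ostringstream out;
#ifdef _OPENMP
    out << "OpenMP enabled: ";
#else
    out << "OpenMP disabled: ";
#endif
    out << max_threads() << " max threads, " << processor_count() << " procs";
    return out.str();
}
```

For example, calling std::cout << openmp_report() << '\n'; from a small main() (after adding #include <iostream>) prints a one-line summary. This only shows the OpenMP side, though. The heavy lifting in eig_sym() happens in whatever BLAS/LAPACK R is linked against, so a single-threaded reference BLAS would likely still keep it on one core even when this report looks healthy.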
<br>
Also keep in mind that (Rcpp)Armadillo hands off the computation to the actual<br>
LAPACK / BLAS implementation on that machine. Lots of things can go wrong<br>
there: maybe R was compiled with its own embedded BLAS/LAPACK sources<br>
(preventing a call out to OpenBLAS even when the machine has it). Or maybe R<br>
was compiled correctly but a single-threaded set of libraries is on the<br>
machine.<br>
<br>
You have not supplied any of that information. Many bug report suggestions<br>
hint that showing `sessionInfo()` helps -- and it does show the BLAS/LAPACK<br>
libraries. You are not forced to show us this, but by not showing us you<br>
prevent us from making more focused suggestions. So maybe start at your<br>
end by glancing at sessionInfo() on A and B?<br>
<br>
Dirk<br>
<br>
<br>
| library(RcppArmadillo)<br>
| library(Rcpp)<br>
| <br>
| RcppArmadillo::armadillo_set_number_of_omp_threads(5)<br>
| print(sprintf("There are %d threads",<br>
| RcppArmadillo::armadillo_get_number_of_omp_threads()))<br>
| <br>
| src <-<br>
| r"(#include <RcppArmadillo.h><br>
| <br>
| // [[Rcpp::depends(RcppArmadillo)]]<br>
| <br>
| // [[Rcpp::export]]<br>
| arma::vec getEigenValues(arma::mat M) {<br>
| return arma::eig_sym(M);<br>
| })"<br>
| <br>
| size <- 10000<br>
| m <- matrix(rnorm(size^2), size, size)<br>
| m <- m * t(m)<br>
| <br>
| # This line compiles the above code with the -fopenmp flag.<br>
| sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)<br>
| result <- getEigenValues(m)<br>
| print(result[1:10])<br>
| <br>
| On Fri, Feb 23, 2024 at 12:53 PM Dirk Eddelbuettel <<a href="mailto:edd@debian.org" target="_blank">edd@debian.org</a>> wrote:<br>
| <br>
| <br>
| On 23 February 2024 at 09:35, Robin Liu wrote:<br>
| | Hi all,<br>
| |<br>
| | Here is an R script that uses Armadillo to decompose a large matrix and print<br>
| | the first 10 eigenvalues.<br>
| |<br>
| | library(RcppArmadillo)<br>
| | library(Rcpp)<br>
| |<br>
| | src <-<br>
| | r"(#include <RcppArmadillo.h><br>
| |<br>
| | // [[Rcpp::depends(RcppArmadillo)]]<br>
| |<br>
| | // [[Rcpp::export]]<br>
| | arma::vec getEigenValues(arma::mat M) {<br>
| | return arma::eig_sym(M);<br>
| | })"<br>
| |<br>
| | size <- 10000<br>
| | m <- matrix(rnorm(size^2), size, size)<br>
| | m <- m * t(m)<br>
| |<br>
| | # This line compiles the above code with the -fopenmp flag.<br>
| | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)<br>
| | result <- getEigenValues(m)<br>
| | print(result[1:10])<br>
| |<br>
| | When I run this code on server A, running top -H shows that arma can implicitly<br>
| | leverage all available cores. However, on server B it can only use one core<br>
| | despite multiple being available: there is just one entry for the process in<br>
| | top -H. Both processes exit successfully and return an answer. The process on<br>
| | server B is, of course, much slower.<br>
| <br>
| It is documented in the package how this is applied, and the policy is NOT to<br>
| blindly enforce one use case (say all cores, or half, or a magically chosen<br>
| value of N for whatever value of N) but to follow the local admin setting and<br>
| respect standard environment variables.<br>
| <br>
| So I suspect that your machine 'B' differs from machine 'A' in this regard.<br>
| <br>
| Note that this is a _run-time_ and not a _compile-time_ behavior, as it is for<br>
| multicore-enabled LAPACK and BLAS libraries, the OpenMP library, and basically<br>
| most software of this type.<br>
| <br>
| You can override it, see<br>
| RcppArmadillo::armadillo_set_number_of_omp_threads<br>
| RcppArmadillo::armadillo_get_number_of_omp_threads<br>
| <br>
| Can you try and see if these help you?<br>
| <br>
| Dirk<br>
| <br>
| | Here is the compilation on server A:<br>
| | /usr/local/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'file197c21cbec564.cpp'<br>
| | g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I../inst/include -fopenmp -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I"/tmp/RtmpwhGRi3/sourceCpp-x86_64-pc-linux-gnu-1.0.9" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c file197c21cbec564.cpp -o file197c21cbec564.o<br>
| | g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o sourceCpp_2.so file197c21cbec564.o -fopenmp -llapack -lblas -lgfortran -lm -lquadmath -L/usr/local/lib/R/lib -lR<br>
| |<br>
| | and here it is for server B:<br>
| | /sw/R/R-4.2.3/lib64/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so' 'file158165b9c4ae1.cpp'<br>
| | g++ -std=gnu++11 -I"/sw/R/R-4.2.3/lib64/R/include" -DNDEBUG -I../inst/include -fopenmp -I"/home/my_username/.R/library/Rcpp/include" -I"/home/my_username/.R/library/RcppArmadillo/include" -I"/tmp/RtmpvfPt4l/sourceCpp-x86_64-pc-linux-gnu-1.0.10" -I/usr/local/include -fpic -g -O2 -c file158165b9c4ae1.cpp -o file158165b9c4ae1.o<br>
| | g++ -std=gnu++11 -shared -L/sw/R/R-4.2.3/lib64/R/lib -L/usr/local/lib64 -o sourceCpp_2.so file158165b9c4ae1.o -fopenmp -llapack -lblas -lgfortran -lm -lquadmath -L/sw/R/R-4.2.3/lib64/R/lib -lR<br>
| |<br>
| | I thought that the -fopenmp flag should let arma implicitly parallelize matrix<br>
| | computations. Any hints as to why this may not work on server B?<br>
| |<br>
| | The actual code I'm running is an R package that includes RcppArmadillo and<br>
| | RcppEnsmallen. Server B is the login node of an HPC cluster, but the code does<br>
| | not use all cores on the compute nodes either.<br>
| |<br>
| | Best,<br>
| | Robin<br>
| | _______________________________________________<br>
| | Rcpp-devel mailing list<br>
| | <a href="mailto:Rcpp-devel@lists.r-forge.r-project.org" target="_blank">Rcpp-devel@lists.r-forge.r-project.org</a><br>
| | <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" rel="noreferrer" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br>
| <br>
| --<br>
| <a href="http://dirk.eddelbuettel.com" rel="noreferrer" target="_blank">dirk.eddelbuettel.com</a> | @eddelbuettel | <a href="mailto:edd@debian.org" target="_blank">edd@debian.org</a><br>
| <br>
<br>
-- <br>
<a href="http://dirk.eddelbuettel.com" rel="noreferrer" target="_blank">dirk.eddelbuettel.com</a> | @eddelbuettel | <a href="mailto:edd@debian.org" target="_blank">edd@debian.org</a><br>
</blockquote></div>