[Rcpp-devel] R limiting the number of child threads
Yan Zhou
zhouyan at me.com
Mon Feb 4 20:33:05 CET 2013
Hi Dirk,
Many thanks for the reply.
Actually, it turned out that the problem is that OpenMP assumes full control of the master thread (it appears I am not the only one bothered by this). I haven't figured out the details yet (and probably never will, as this seems to depend on the compiler's OpenMP runtime). In summary, I tried the following experiment (each run was started from a fresh R session):
1. Run C++11 <thread> version // It appears 8 threads were used
2. Run OpenMP version // It appears 8 threads were used
But if I reverse the order:
1. Run OpenMP // 8 threads
2. Run C++11 // 2 threads!
If I replace the OpenMP version of the algorithm above with a simple matrix multiplication that calls dgemm, which in turn uses OpenMP, the same behavior is observed.
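The experiment above could be approximated outside R with a small harness like the one below. It is only a sketch, not the code that was actually run: the names burn, warm_up_openmp and time_std_threads are hypothetical, the kernel is an arbitrary CPU-bound loop, and wall-clock time is used as a rough stand-in for "how many threads appear to be used".

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CPU-bound kernel; its only job is to keep one core busy.
double burn(std::size_t iters) {
    double x = 0.0;
    for (std::size_t i = 0; i < iters; ++i)
        x += 1.0 / (1.0 + static_cast<double>(i));
    return x;
}

// Touch the OpenMP runtime once so that it initializes itself.
// When compiled without OpenMP support this simply runs serially.
double warm_up_openmp(std::size_t iters) {
    double total = 0.0;
#ifdef _OPENMP
    #pragma omp parallel reduction(+ : total)
#endif
    total += burn(iters);
    return total;
}

// Run n std::thread workers on the same kernel and return the
// elapsed wall-clock time in seconds.
double time_std_threads(unsigned n, std::size_t iters) {
    std::vector<double> out(n, 0.0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < n; ++t)
        pool.emplace_back([&out, t, iters] { out[t] = burn(iters); });
    for (auto& th : pool) th.join();
    const std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - start;
    return dt.count();
}
```

To compare the two orders, one would call time_std_threads(8, N) in a fresh process, and in another fresh process call warm_up_openmp(N) first and then time_std_threads(8, N). A clearly larger time in the second case would be consistent with the C++11 threads being confined to fewer cores, though it would not prove the cause.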
The only explanation I can think of is that once some code has initialized OpenMP parallelization, OpenMP remains in control of the hardware in some way even when it is no longer needed, and those resources cannot be used by C++11 threads.
The problem of TBB using only 2 threads, mentioned in my previous email, was an unrelated, stupid bug. After the fix, it coexists happily with OpenMP.
After some googling and painful reading of the OpenMP standard, it seems to me that OpenMP is best used when no other parallel programming model exists in the same shared object. Others seem to have had far worse situations than mine when OpenMP and C++11 <thread> coexist. My guess is that Cilk and TBB are both Intel products, the compiler was Intel's, and the OpenMP runtime was Intel's, so they can coexist somehow in this particular case. But the C++11 runtime was GNU's libstdc++, and somehow the two just don't like each other.
This is a really frustrating situation. We cannot avoid OpenMP; to do that we would need to avoid most optimized BLAS libraries. Yet OpenMP is not always what we want to use in our own code. Many things can be done easily with other programming models but not with the primitive OpenMP. For one, std::async is quite handy from time to time.
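As an illustration of why std::async is handy, here is a minimal sketch of a chunked reduction written with it. The function name parallel_sum and the chunking scheme are made up for this example; nothing here comes from the library discussed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum a vector by splitting it into at most `chunks` pieces and
// summing each piece in its own asynchronous task.
double parallel_sum(const std::vector<double>& data, std::size_t chunks) {
    if (chunks == 0) chunks = 1;
    // Ceiling division so that every element falls into some chunk.
    const std::size_t step = (data.size() + chunks - 1) / chunks;
    std::vector<std::future<double>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0.0);
        }));
    }
    // get() blocks until the corresponding task has finished.
    double total = 0.0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

Each task returns its partial result through a std::future, so no shared accumulator or explicit reduction clause is needed. Whether these tasks actually run on all cores after OpenMP has been initialized is exactly the open question in this thread.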
Anyway, my original problem turns out to be unrelated to R or C++ at all.
Best,
Yan Zhou
On Feb 04, 2013, at 06:37 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
On 4 February 2013 at 17:47, Yan Zhou wrote:
| I have a C++ library for parallel implementation of Sequential Monte Carlo
| algorithms. It can use some different programming models, OpenMP, Intel TBB,
| Intel Cilk, C++11 <thread>, and others. Everything worked fine so far.
Wow. You're way ahead of me. With OpenMP, I noticed that it uses the same
environment variable that Intel MKL uses. I suspect you may find it (and the
related ones) useful for Cilk and TBB too. Copied straight from the
libgomp-4.7 docs (in package gcc-4.7-doc on my Ubuntu box):
3.4 `OMP_NUM_THREADS' - Specifies the number of threads to use
==============================================================
_Description_:
Specifies the default number of threads to use in parallel
regions. The value of this variable shall be a comma-separated
list of positive integers; the value specified the number of
threads to use for the corresponding nested level. If undefined
one thread per CPU is used.
_See also_:
*note omp_set_num_threads::
_Reference_:
OpenMP specifications v3.1 (http://www.openmp.org/), section 4.2
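To check what a process actually sees in OMP_NUM_THREADS, for example when comparing standalone C++ code with code called from R, a small helper like the following could be used. It is a sketch only: parse_thread_list and omp_num_threads_from_env are hypothetical names, and the parsing follows the comma-separated list of positive integers described in the excerpt above rather than any particular runtime's implementation.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Parse a comma-separated list of positive integers, one entry per
// nesting level. Returns an empty vector if any entry is malformed.
std::vector<int> parse_thread_list(const std::string& value) {
    std::vector<int> levels;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) {
        char* end = nullptr;
        const long n = std::strtol(item.c_str(), &end, 10);
        // Reject entries with no digits, trailing characters, or n <= 0.
        if (end == item.c_str() || *end != '\0' || n <= 0)
            return std::vector<int>();
        levels.push_back(static_cast<int>(n));
    }
    return levels;
}

// Read OMP_NUM_THREADS from the environment; an empty result means
// the variable is unset or could not be parsed.
std::vector<int> omp_num_threads_from_env() {
    const char* raw = std::getenv("OMP_NUM_THREADS");
    return raw ? parse_thread_list(raw) : std::vector<int>();
}
```

Printing the result of omp_num_threads_from_env() both from a standalone program and from code loaded into R would show whether the two environments differ.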
I seem to recall that the R package parallel uses the same variables, so if
you see behaviour that differs between standalone C++ code and code called from
R, check there too.
Hope this helps, Dirk
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com