[Rcpp-devel] Assertion error in ud_ep.c, when running with MPI
Dirk Eddelbuettel
edd at debian.org
Tue Jun 14 13:43:25 CEST 2022
On 14 June 2022 at 09:38, Serguei Sokol wrote:
| Hi,
|
| Probably, this issue would be better posted here
| https://github.com/openucx/ucx/issues
Seconded. I don't have much to add: these MPI examples are ten years old,
have always worked as far as we know, and haven't changed. Maybe some base
assumption in the libraries is different.
And if I may, I am not sure how much you are gaining here. You are fitting
each time series individually with R. That won't be fast. You will gain
something, though maybe not _that_ much, by spreading MPI around it. If speed
is of the essence, maybe you need an equivalent of auto.arima in compiled code.
As a debugging measure, I'd also try to see if the issue goes away when you
call less demanding R code: maybe auto.arima leads to multithreaded code
inside each call, which may upset the R stack called from MPI?
Also, rinside_mpi_sample4.cpp is (per the single commit in its history) a
contributed example. Maybe its authors can help.
Sorry, no smoking gun or immediate help.
Dirk
| Best,
| Serguei.
|
| Le 14/06/2022 à 07:24, Maddegedara Lalith a écrit :
| > Hello,
| >
| > I want to use RInside in my C++-based MPI application to do time series
| > forecasting using the auto.arima function from R's forecast package. The
| > RInside instance in each MPI rank is expected to do an independent
| > calculation (e.g. a time series forecast).
| >
| > With one MPI rank, it always completes without producing any error.
| > However, with more than one MPI rank, it produces the following error.
| > Depending on the run, a different number of MPI ranks fail with the same
| > error. On rare occasions, all the ranks complete successfully. Further,
| > I found that even your example "rinside_mpi_sample4.cpp" produces the
| > same error.
| >
| > I am using the Intel MPI library (version 2021.1). I tried compiling
| > with both icpc and g++, and both produced the same error.
| > Could you please help me solve this problem?
| >
| > With best regards
| > Lal
| >
| > [ibis:14878:0:14992] ud_ep.c:565 Assertion `ep->dest_ep_id ==
| > UCT_UD_EP_NULL_ID || ep->dest_ep_id == ctl->conn_rep.src_ep_id' failed
| >
| > ==== backtrace (tid: 14994) ====
| > 0 0x000000000004d455 ucs_debug_print_backtrace() ???:0
| > 1 0x0000000000042b5f uct_ud_ep_process_rx() ???:0
| > 2 0x00000000000471cd uct_ud_mlx5_ep_t_delete() ???:0
| > 3 0x000000000003ebdf uct_ud_iface_release_desc() ???:0
| > 4 0x0000000000040436 ucs_cpu_get_memcpy_bw() ???:0
| > 5 0x000000000004050b ucs_cpu_get_memcpy_bw() ???:0
| > 6 0x0000000000041343 ucs_async_dispatch_handlers() ???:0
| > 7 0x0000000000041488 ucs_async_dispatch_timerq() ???:0
| > 8 0x0000000000043c34 ucs_async_pipe_drain() ???:0
| > 9 0x0000000000007ea5 start_thread() pthread_create.c:0
| > 10 0x00000000000fe96d __clone() ???:0
| > =================================
| >
| > _______________________________________________
| > Rcpp-devel mailing list
| > Rcpp-devel at lists.r-forge.r-project.org
| > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org