[Rcpp-devel] Assertion error in ud_ep.c, when running with MPI

Dirk Eddelbuettel edd at debian.org
Tue Jun 14 13:43:25 CEST 2022


On 14 June 2022 at 09:38, Serguei Sokol wrote:
| Hi,
| 
| Probably, this issue would be better posted here 
| https://github.com/openucx/ucx/issues

Seconded. I don't have much to add: these MPI examples are ten years old,
have worked all along as far as we know, and haven't changed. Maybe some base
assumption in the libraries has changed.

And if I may, I am not sure how much you are gaining here. You are fitting
each time series individually with R. That won't be fast.  You will gain
something -- but maybe not _that_ much -- by spreading MPI around it. If speed
is of the essence, maybe you need an equivalent of auto.arima in compiled code.

As a debugging measure, I'd also try to see if the issue goes away when you
call less demanding R code: maybe auto.arima leads to multithreaded code
inside each call which may upset the R stack called from MPI?

Also, rinside_mpi_sample4.cpp is (per the single commit in its history) a
contributed example. Maybe its authors can help.

Sorry, no smoking gun or immediate help.

Dirk

| Best,
| Serguei.
| 
| Le 14/06/2022 à 07:24, Maddegedara Lalith a écrit :
| > Hello,
| > 
| > I want to use RInside in my C++ based MPI application to do time series 
| > forecasting using the auto.arima function of R. The RInside instance in 
| > each MPI rank is expected to do an independent calculation (e.g. time 
| > series forecast).
| > 
| > With one MPI rank, it always completes without producing any error.  
| > However, with more than one MPI rank, it produces the following error. 
| > Depending on the run, different numbers of MPI ranks produce the same 
| > error. On rare occasions, all the ranks successfully complete the 
| > execution. Further, I found that even your example 
| > "rinside_mpi_sample4.cpp" produces the same error.
| > 
| > I am using the Intel MPI library (version 2021.1). I tried 
| > compiling with icpc and g++. Both produced the same error.
| > Could you please help me to solve this problem?
| > 
| > With best regards
| > Lal
| > 
| > [ibis:14878:0:14992]       ud_ep.c:565  Assertion `ep->dest_ep_id == 
| > UCT_UD_EP_NULL_ID || ep->dest_ep_id == ctl->conn_rep.src_ep_id' failed
| > 
| > ==== backtrace (tid:  14994) ====
| >   0 0x000000000004d455 ucs_debug_print_backtrace()  ???:0
| >   1 0x0000000000042b5f uct_ud_ep_process_rx()  ???:0
| >   2 0x00000000000471cd uct_ud_mlx5_ep_t_delete()  ???:0
| >   3 0x000000000003ebdf uct_ud_iface_release_desc()  ???:0
| >   4 0x0000000000040436 ucs_cpu_get_memcpy_bw()  ???:0
| >   5 0x000000000004050b ucs_cpu_get_memcpy_bw()  ???:0
| >   6 0x0000000000041343 ucs_async_dispatch_handlers()  ???:0
| >   7 0x0000000000041488 ucs_async_dispatch_timerq()  ???:0
| >   8 0x0000000000043c34 ucs_async_pipe_drain()  ???:0
| >   9 0x0000000000007ea5 start_thread()  pthread_create.c:0
| >  10 0x00000000000fe96d __clone()  ???:0
| > =================================
| > 
| > _______________________________________________
| > Rcpp-devel mailing list
| > Rcpp-devel at lists.r-forge.r-project.org
| > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
| 
-- 
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

