11.7.3. Reducing wireup time
Open MPI’s run-time uses an out-of-band (OOB) communication
subsystem to pass messages during the launch, initialization, and
termination stages for the job. These messages allow mpirun to tell
its daemons what processes to launch, and allow the daemons in turn to
forward stdio to mpirun, update mpirun on process status, etc.
The OOB uses TCP sockets for its communication, with each daemon
opening a socket back to mpirun upon startup. In a large cluster,
this can mean thousands of connections being formed on the node where
mpirun resides, and mpirun must process every one of these connection
requests. By default, mpirun processes connection requests
sequentially, so on large clusters a backlog can build up that causes
remote daemons to time out while waiting for a response.
Fortunately, Open MPI provides an alternative mechanism for processing
connection requests that helps alleviate this problem. Setting the MCA
parameter oob_tcp_listen_mode to listen_thread causes mpirun to start
up a separate thread dedicated to responding to connection requests.
Remote daemons thus receive a quick response to their connection
requests, and mpirun can deal with each message as soon as possible.
Error
TODO This seems very out of date. We should have content about PMIx instant on.
This parameter can be included in the default MCA parameter file,
placed in the user’s environment, or added to the mpirun command
line. See this FAQ entry for more details on how to set MCA parameters.
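
As a minimal sketch of those three options, assuming an Open MPI release in which the oob_tcp_listen_mode parameter is still available (the note above suggests this may not hold for current releases): the application name ./my_app and the process count are placeholders, and the per-user file path shown is the conventional location, which may differ on a given installation.

```shell
# 1. Environment: Open MPI reads MCA parameters from variables
#    prefixed with OMPI_MCA_.
export OMPI_MCA_oob_tcp_listen_mode=listen_thread

# 2. Command line (./my_app and -np 4 are placeholders):
#    mpirun --mca oob_tcp_listen_mode listen_thread -np 4 ./my_app

# 3. Per-user MCA parameter file, conventionally
#    $HOME/.openmpi/mca-params.conf, one "name = value" pair per line:
#    oob_tcp_listen_mode = listen_thread
```

The environment variable and the parameter file apply to every subsequent mpirun invocation, while the command-line form affects only that one run.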