I already had the Intel oneAPI MPI tools installed, so this post assumes that setup.
This one uses the exact same topology as before, i.e. 3 machines on the same subnet.
The installation is really easy:
1. Clone the repository
2. Configure
3. Compile
4. Install
5. Copy installed files to all nodes
6. Prepare the hostfile and machinefile
7. Run
# Set these!
host1=192.168.124.95
host2=192.168.124.166
source /opt/intel/oneapi/setvars.sh
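After sourcing setvars.sh, it is worth a quick check that the Intel wrappers are the ones on PATH:
which mpicc mpiexec   # both should now resolve somewhere under /opt/intel/oneapi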
wget https://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-7.4.tar.gz
tar -xvf osu-micro-benchmarks-7.4.tar.gz
cd osu-micro-benchmarks-7.4/
rm -rf build   # clean any previous build (fine if it does not exist yet)
mkdir build
./configure CC=$(which mpicc) CXX=$(which mpicxx) --prefix=$(pwd)/build
make -j16
make install   # installs the binaries into ./build (the --prefix given above)
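Before copying anything to the other nodes, a quick single-node sanity run helps (this uses the install prefix from the configure step above):
mpiexec -n 2 $(pwd)/build/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce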
scp -r ~/osu-micro-benchmarks-7.4 $host1: # you could also copy only the built binaries
scp -r ~/osu-micro-benchmarks-7.4 $host2:
echo "
localhost
$host1
$host2
" > hostfile
echo "
localhost:2
$host1:2
$host2:2
" > machinefile
# This will generate some results using the other nodes on the network
I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp mpiexec -hostfile hostfile -machinefile machinefile -n 20 $(pwd)/build/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce
The result would be something like this:
# OSU MPI Allreduce Latency Test v7.4
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                      92.90
2                      91.50
4                     110.77
8                      87.68
16                     61.82
32                     62.62
64                     64.87
128                    65.12
256                   116.08
512                   105.75
1024                  108.51
2048                  124.48
4096                  139.79
8192                  141.99
16384                 236.34
32768                 182.24
65536                 288.34
131072                420.92
262144               1130.04
524288               2264.17
1048576              4758.53
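For a simpler point-to-point number between just two nodes, osu_latency from the same install tree can be run the same way (a sketch assuming Intel MPI's -hosts option and the install prefix above):
I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp mpiexec -hosts localhost,$host1 -n 2 $(pwd)/build/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency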
Troubleshooting
Error 1
[mpiexec@mpi-test-1] Error: Unable to run bstrap_proxy on 192.168.124.95 (pid 329418, exit code 768)
[mpiexec@mpi-test-1] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:157): check exit codes error
[mpiexec@mpi-test-1] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:206): poll for event error
[mpiexec@mpi-test-1] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1069): error waiting for event
[mpiexec@mpi-test-1] Error setting up the bootstrap proxies
[mpiexec@mpi-test-1] Possible reasons:
[mpiexec@mpi-test-1] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@mpi-test-1] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts.
[mpiexec@mpi-test-1]    Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@mpi-test-1] 3. Firewall refused connection.
[mpiexec@mpi-test-1]    Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@mpi-test-1] 4. Ssh bootstrap cannot launch processes on remote host.
[mpiexec@mpi-test-1]    Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@mpi-test-1] You may try using -bootstrap option to select alternative launcher.
[bstrap:0:0@mpi-test-3] HYD_sock_connect (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:209): getaddrinfo returned error -3 (Temporary failure in name resolution)
[bstrap:0:0@mpi-test-3] main (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:538): unable to connect to server mpi1 at port 46083 (check for firewalls!)
[bstrap:0:1@mpi-test-2] HYD_sock_connect (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:209): getaddrinfo returned error -3 (Temporary failure in name resolution)
[bstrap:0:1@mpi-test-2] main (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:538): unable to connect to server mpi1 at port 46083 (check for firewalls!)
This error means you did not tell mpiexec which network interface to use. Prefix the command with the interface and transport variables:
I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp
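If you are not sure what the interface is called on your machines (it is enp1s0 here), you can list the interfaces and their addresses first:
ip -br addr   # brief one-line-per-interface view of names and IPs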
Installation of MVAPICH (still not working)
wget https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich-3.0.tar.gz
tar -xzf mvapich-3.0.tar.gz mvapich-3.0/
cd mvapich-3.0/
source /opt/intel/oneapi/setvars.sh
./configure CC=mpicc CXX=mpicxx
However, I got this error:
configure: error: no ch4 netmod selected
The default ch4 device could not detect a preferred network library.
Supported options are ofi (libfabric) and ucx:
  --with-device=ch4:ofi or --with-device=ch4:ucx
So, I installed libfabric to enable ch4:ofi, re-ran configure, and built again:
sudo apt install libfabric-dev
./configure CC=mpicc CXX=mpicxx
make
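Alternatively, the error message itself suggests selecting the netmod explicitly, which avoids relying on auto-detection (untested on my side):
./configure CC=mpicc CXX=mpicxx --with-device=ch4:ofi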
Then another error:
./confdb/ylwrap: line 176: yacc: command not found
sudo apt install yacc -y # if apt reports yacc as a virtual package, install a provider such as byacc or bison instead
make -j2
Then I got linker errors against the MPI library:
CCLD     src/env/mpichversion
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__gtq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__geq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__fmaxq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__eqq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__addq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__ltq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__fminq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__neq'
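Those __*q symbols look like quad-precision (_Quad/__float128) runtime routines that some compiler support library should supply at link time. One hedged way to hunt for the missing definitions (the library paths below are assumptions for my setup, adjust to yours):
# Hypothetical diagnostic: search candidate runtime libraries for one missing symbol
for lib in /opt/intel/oneapi/compiler/latest/lib/*.so /usr/lib/x86_64-linux-gnu/libquadmath.so*; do
  nm -D "$lib" 2>/dev/null | grep -qw __gtq && echo "defines __gtq: $lib"
done
# If a library turns up, try re-running configure with it added, e.g. LIBS="-lquadmath"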
I was using Intel MPI, so I tried Open MPI to see if that works. However, it also did not work. I installed the Open MPI library and started configuring/compiling again, which gave this error:
configure: error: The Fortran compiler gfortran does not accept programs that call the same routine with arguments of different types without the option -fallow-argument-mismatch. Rerun configure with FFLAGS=-fallow-argument-mismatch
I did what it said, and configure worked. This is the series of commands I tried (which still did not work in the end):
sudo apt install libopenmpi-dev
./configure CC=mpicc CXX=mpicxx FFLAGS=-fallow-argument-mismatch
make -j16
TODO -> the Docker image seems to work, but I am not sure: https://hub.docker.com/r/nbclosu/mvapich
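If you want to try that image, pulling it is simple enough (untested here; the default tag and an interactive bash entrypoint are assumptions):
docker pull nbclosu/mvapich              # image name taken from the Docker Hub URL above
docker run --rm -it nbclosu/mvapich bash # assumes the image contains bash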