Setting Up OSU Benchmarks and MVAPICH on Ubuntu 22.04

I already had the Intel oneAPI MPI tools installed, so this post assumes they are available.
It uses the exact same topology as before, i.e. three machines on the same subnet.
The installation is really easy:
1. Clone the repository
2. Configure
3. Compile
4. Install
5. Copy installed files to all nodes
6. Prepare hostfile and machine file
7. Run


# Set these!
host1=192.168.124.95
host2=192.168.124.166

source /opt/intel/oneapi/setvars.sh
wget https://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-7.4.tar.gz
tar -xvf osu-micro-benchmarks-7.4.tar.gz
cd osu-micro-benchmarks-7.4/
rm -rf build   # remove any previous build output; harmless if it does not exist
mkdir build
./configure CC=$(which mpicc) CXX=$(which mpicxx) --prefix=$(pwd)/build
make -j16
make install   # installs the benchmark binaries into the build/ prefix set above
scp -r ~/osu-micro-benchmarks-7.4 $host1: # you can also copy only the binaries i guess
scp -r ~/osu-micro-benchmarks-7.4 $host2:
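# mpiexec (and the scp above) needs passwordless SSH from this node to the others;
# if it is not set up yet, something like this should do it (a sketch, assuming the
# default key location and the same username on all hosts):
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id $host1
ssh-copy-id $host2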

echo "
localhost
$host1
$host2
" > hostfile
echo "
localhost:2
$host1:2
$host2:2
" > machinefile
# This will generate some results using the other nodes on the network
I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp mpiexec -hostfile hostfile -machinefile machinefile -n 20 $(pwd)/build/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce

The result would be something like this:

#I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp mpiexec -hostfile hostfile -machinefile machinefile -n 20 $(pwd)/build/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce

# OSU MPI Allreduce Latency Test v7.4
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                      92.90
2                      91.50
4                     110.77
8                      87.68
16                     61.82
32                     62.62
64                     64.87
128                    65.12
256                   116.08
512                   105.75
1024                  108.51
2048                  124.48
4096                  139.79
8192                  141.99
16384                 236.34
32768                 182.24
65536                 288.34
131072                420.92
262144               1130.04
524288               2264.17
1048576              4758.53
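To look at raw throughput between just two of the nodes, the point-to-point benchmarks from the same build can be used as well. For example (assuming the same build/ prefix and interface name, and that the pt2pt binaries land next to the collective ones in the install tree):

I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp mpiexec -hosts localhost,$host1 -n 2 $(pwd)/build/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw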

Troubleshooting

Error 1

[mpiexec@mpi-test-1] Error: Unable to run bstrap_proxy on 192.168.124.95 (pid 329418, exit code 768)
[mpiexec@mpi-test-1] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:157): check exit codes error
[mpiexec@mpi-test-1] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:206): poll for event error
[mpiexec@mpi-test-1] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1069): error waiting for event
[mpiexec@mpi-test-1] Error setting up the bootstrap proxies
[mpiexec@mpi-test-1] Possible reasons:
[mpiexec@mpi-test-1] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@mpi-test-1] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts.
[mpiexec@mpi-test-1]    Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@mpi-test-1] 3. Firewall refused connection.
[mpiexec@mpi-test-1]    Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@mpi-test-1] 4. Ssh bootstrap cannot launch processes on remote host.
[mpiexec@mpi-test-1]    Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@mpi-test-1]    You may try using -bootstrap option to select alternative launcher.
[bstrap:0:0@mpi-test-3] HYD_sock_connect (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:209): getaddrinfo returned error -3 (Temporary failure in name resolution)
[bstrap:0:0@mpi-test-3] main (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:538): unable to connect to server mpi1 at port 46083 (check for firewalls!)
[bstrap:0:1@mpi-test-2] HYD_sock_connect (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:209): getaddrinfo returned error -3 (Temporary failure in name resolution)
[bstrap:0:1@mpi-test-2] main (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:538): unable to connect to server mpi1 at port 46083 (check for firewalls!)

This error means you did not pass the communication interface to the mpiexec command. Without it, the remote proxies try to resolve the head node by hostname (mpi1 above), which fails with the name-resolution error shown. Prefix the command with:


I_MPI_HYDRA_IFACE=enp1s0 ICX_TLS=tcp
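The interface name (enp1s0 here) differs per machine; you can check which interface carries the cluster subnet with something like:

ip -o -4 addr show   # pick the interface that has the 192.168.124.x address

If a firewall is in the way, the error message also points at I_MPI_PORT_RANGE for restricting the ports that need to be opened.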

Installation of MVAPICH (still not working)


wget https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich-3.0.tar.gz
tar -xzf mvapich-3.0.tar.gz mvapich-3.0/
cd mvapich-3.0/
source /opt/intel/oneapi/setvars.sh
./configure CC=mpicc CXX=mpicxx

However, I got this error:

configure: error: no ch4 netmod selected

  The default ch4 device could not detect a preferred network
  library. Supported options are ofi (libfabric) and ucx:

    --with-device=ch4:ofi or --with-device=ch4:ucx

So, I installed the libfabric stuff to enable ch4:ofi:


sudo apt install libfabric-dev
./configure CC=mpicc CXX=mpicxx   # re-run configure so it picks up libfabric
make
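Alternatively, the configure error above spells out the option for selecting the netmod explicitly, so this should also work once libfabric-dev is installed:

./configure CC=mpicc CXX=mpicxx --with-device=ch4:ofi
make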

Then another error: ./confdb/ylwrap: line 176: yacc: command not found


sudo apt install -y bison   # "yacc" is a virtual package on Ubuntu; bison provides it
make -j2

Then I got linker errors from the MPI library:

  CCLD     src/env/mpichversion
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__gtq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__geq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__fmaxq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__eqq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__addq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__ltq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__fminq'
/usr/bin/ld: lib/.libs/libmpi.so: undefined reference to `__neq'
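The __gtq/__addq/__fmaxq symbols are quad-precision (_Quad/__float128) helper routines, which suggests the runtime libraries of whatever compiler sits behind mpicc are not on the link line. I did not get further here, but one untested guess would be to leave the MPI compiler wrappers out of the build entirely and use the plain GNU compilers:

./configure CC=gcc CXX=g++ FC=gfortran --with-device=ch4:ofi
make -j16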

I was using Intel MPI, so I tried OpenMPI to see whether that would work, but it did not either. I installed the OpenMPI library and started configuring/compiling again, which gave this error: configure: error: The Fortran compiler gfortran does not accept programs that call the same routine with arguments of different types without the option -fallow-argument-mismatch. Rerun configure with FFLAGS=-fallow-argument-mismatch. I did what it said and configure succeeded. This is the series of commands I tried (the build still failed at the end):


sudo apt install libopenmpi-dev
./configure CC=mpicc CXX=mpicxx FFLAGS=-fallow-argument-mismatch
make -j16

TODO: the Docker image seems to work, but I am not sure yet: https://hub.docker.com/r/nbclosu/mvapich
