[libfabric-users] libfabric/intel mpi with mlx5 and > 300 cores/ranks

Walter, Eric J ejwalt at wm.edu
Fri Sep 13 10:27:46 PDT 2019


Hi Arun,

Thanks for asking.

I am using Intel Parallel Studio 19.0.4.243.  Using the "bt-mz.E.x" NAS Parallel Benchmark (https://www.nas.nasa.gov/publications/npb.html), the job fails on 700 cores but runs fine on 200 cores.  The output is quite large, so I have gzipped it before attaching.  I have seen repeatedly that once I go beyond a certain number of cores, the job fails; this happens with many other codes as well.  It also does not happen with the Intel 2018 compilers.

BTW, I didn't let the 200-core run go to completion, just to keep the attached output small; it does run to completion with no problem.

Let me know if you have any other questions.

Regards,

Eric





--
Eric J. Walter

College of William and Mary
IT/High Performance Computing Group
ISC 1271
P.O. Box 8795
Williamsburg, VA  23187-8795
email:    ejwalt at wm.edu
phone:  (757) 221-1886
fax:        (757) 221-1321


________________________________
From: Ilango, Arun <arun.ilango at intel.com>
Sent: Tuesday, September 10, 2019 2:43 PM
To: Walter, Eric J <ejwalt at wm.edu>; libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: RE: libfabric/intel mpi with mlx5 and > 300 cores/ranks


Hi Eric,



Which version of Intel MPI 2019 are you using? Can you try running with FI_LOG_LEVEL=warn and share the logs?
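For example (the rank count and output file name below are just placeholders, not taken from your setup):

    export FI_LOG_LEVEL=warn
    mpirun -n 700 ./bt-mz.E.x 2>&1 | tee bt-mz.E.x_700.log

or, equivalently, pass it through mpirun with -genv FI_LOG_LEVEL warn.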



Thanks,

Arun.



From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> On Behalf Of Walter, Eric J
Sent: Tuesday, August 06, 2019 10:48 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] libfabric/intel mpi with mlx5 and > 300 cores/ranks





Hi, we are currently standing up a new cluster with Mellanox ConnectX-5 adapters. Using Open MPI, MVAPICH2, and Intel MPI 2018, we can run MPI jobs on all 960 cores in the cluster; however, with Intel MPI 2019 we can't get beyond ~300 MPI ranks. If we do, we get the following error for every rank:



Abort(273768207) on node 650 (rank 650 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)...................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=650, new_comm=0x7911e8) failed
PMPI_Comm_split(489)...................:
MPIR_Comm_split_impl(167)..............:
MPIR_Allgather_intra_auto(145).........: Failure during collective
MPIR_Allgather_intra_auto(141).........:
MPIR_Allgather_intra_brucks(115).......:
MPIC_Sendrecv(344).....................:
MPID_Isend(662)........................:
MPID_isend_unsafe(282).................:
MPIDI_OFI_send_lightweight_request(106):
(unknown)(): Other MPI error
----------------------------------------------------------------------------------------------------------
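For what it's worth, the call that fails is a plain MPI_Comm_split over MPI_COMM_WORLD with color=0 and key=rank (per the stack above), so a minimal standalone reproducer, written from nothing more than those arguments, would look something like this:

    /* Minimal sketch of the call shown in the error stack (color=0, key=rank).
     * Compile with mpiicc/mpicc and run at >300 ranks to see whether the
     * failure reproduces outside the full benchmark.  This is an illustrative
     * reproducer, not the actual NPB code. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Comm split_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Same arguments as in the failing stack: color=0, key=rank */
        MPI_Comm_split(MPI_COMM_WORLD, 0, rank, &split_comm);

        if (rank == 0)
            printf("MPI_Comm_split succeeded across %d ranks\n", size);

        MPI_Comm_free(&split_comm);
        MPI_Finalize();
        return 0;
    }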

The error above occurs with the default FI_PROVIDER, ofi_rxm. If we switch to the "verbs" provider, we can run on all 960 cores, but tests show an order-of-magnitude increase in latency and much longer run times.
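For reference, the provider is selected through the libfabric FI_PROVIDER environment variable; a sketch of the two configurations described above (these are the standard libfabric settings, not copied from our job scripts):

    # leaving FI_PROVIDER unset gives the default (ofi_rxm over verbs), which fails beyond ~300 ranks
    unset FI_PROVIDER
    # forcing the plain verbs provider runs on all 960 cores, but with much higher latency
    export FI_PROVIDER=verbs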



We have also tried installing our own libfabric (built from the git repo; we verified with verbose debugging that this build is the one being used), and this behavior does not change.
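For reference, a sketch of how we point Intel MPI at the external build and confirm it is being picked up (the install path is a placeholder):

    # tell Intel MPI 2019 to use an external libfabric instead of its bundled one
    export I_MPI_OFI_LIBRARY_INTERNAL=0
    export LD_LIBRARY_PATH=/path/to/our/libfabric/lib:$LD_LIBRARY_PATH
    # verbose libfabric logging shows which library and provider are actually loaded
    export FI_LOG_LEVEL=debug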



Is there anything I can change to allow all 960 cores with the default ofi_rxm provider? Or is there a way to improve performance with the verbs provider?



For completeness:
MLNX_OFED_LINUX-4.6-1.0.1.1-rhel7.6-x86_64 OFED
CentOS 7.6.1810 (kernel 3.10.0-957.21.3.el7.x86_64)
Intel Parallel Studio 19.0.4.243
InfiniBand controller: Mellanox Technologies MT27800 Family [ConnectX-5]





Thanks!



Eric

--
Eric J. Walter

College of William and Mary
IT/High Performance Computing Group
ISC 1271
P.O. Box 8795
Williamsburg, VA  23187-8795
email:    ejwalt at wm.edu
phone:  (757) 221-1886
fax:        (757) 221-1321


-------------- next part --------------
A non-text attachment was scrubbed...
Name: bt-mz.E.x_700.out.gz
Type: application/gzip
Size: 9635 bytes
Desc: bt-mz.E.x_700.out.gz
URL: <http://lists.openfabrics.org/pipermail/libfabric-users/attachments/20190913/0bc9300a/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bt-mz.E.x_200_out.gz
Type: application/gzip
Size: 12694 bytes
Desc: bt-mz.E.x_200_out.gz
URL: <http://lists.openfabrics.org/pipermail/libfabric-users/attachments/20190913/0bc9300a/attachment-0003.gz>

