[libfabric-users] IPoIB and verbs provider

Matthew Alexandrakis m.alexandrakis at qmul.ac.uk
Wed Sep 11 05:22:15 PDT 2019


Hello Arun,

Both protocols you mention do appear in the `fi_info` output, multiple times, courtesy of our two network cards. According to `ibstat`, `mlx4_0` is the InfiniBand card and `mlx4_1` is the Ethernet card.

I ran the utility twice for each of the three IP configurations: once statically (on its own) and once as part of an MPI job. As the logs are pretty lengthy, I include only the static `fi_info -p verbs` output in the email body and have linked the rest.

`$ export FI_LOG_LEVEL=info` (included in the attached logs)
`$ fi_info -p verbs`
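
For anyone who wants to repeat the two commands above across several nodes or configurations, here is a minimal sketch of a wrapper. It assumes only that `fi_info` is on the `PATH`; the function names are hypothetical and not part of libfabric. Both output streams are returned, since the log lines may not end up on stdout.

```python
import os
import subprocess


def fi_info_invocation(provider="verbs", log_level="info", base_env=None):
    """Return (argv, env) for running fi_info with libfabric logging enabled."""
    # Copy the environment so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["FI_LOG_LEVEL"] = log_level      # same effect as `export FI_LOG_LEVEL=info`
    argv = ["fi_info", "-p", provider]   # same as `fi_info -p verbs`
    return argv, env


def run_fi_info(provider="verbs", log_level="info"):
    """Run fi_info and return (stdout, stderr) as text."""
    argv, env = fi_info_invocation(provider, log_level)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
    return result.stdout, result.stderr
```

Calling `run_fi_info()` once per IP configuration and saving both strings would give the same kind of logs as the ones linked further down.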


  *   IPv4

```
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_0
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
```


  *   IPv6 link local address

```
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
```

  *   IPv6 global address

```
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_0
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_1-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
```
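
Since the three listings above are long and nearly identical, a small script can help spot what differs between them. The following is a minimal sketch that assumes the listing format shown above (a `provider:` line followed by indented `key: value` lines); the function names are hypothetical.

```python
from collections import Counter


def parse_fi_info(text):
    """Parse `fi_info` listing text into a list of dicts, one per entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values keep any later colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "provider":
            entries.append({"provider": value})   # a provider line starts a new entry
        elif entries:
            entries[-1][key] = value
    return entries


def summarize(entries):
    """Count entries per (provider, domain, protocol) triple."""
    return Counter(
        (e.get("provider", ""), e.get("domain", ""), e.get("protocol", ""))
        for e in entries
    )


def missing(reference, other):
    """Triples present in `reference` but absent from `other`."""
    return sorted(set(summarize(reference)) - set(summarize(other)))
```

Reading the listings above this way, the IPv4 and IPv6 global outputs contain the same set of (provider, domain, protocol) triples, while the IPv6 link-local output has no `verbs;ofi_rxm` entry for `mlx4_0`.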

IPv4: https://ybin.me/p/fa6701249f3a42a3#a6YE8SSuJgkaC90bvi0AyTucEbRNlwU4CWx7kMT6fEk=
IPv6 link local: https://ybin.me/p/da52563897dff075#npBkL2Q+w9qyZoK9omYH+LHaLEy0zWrEL4nuZv9eSvw=
IPv6 global: https://ybin.me/p/15fc05690e01a3f2#znGVvfg3RGSzJAiX5sqIfSyJ/qtu9upSeyBbK14pFVM=

IPv4 MPI: https://ybin.me/p/c0c1956845ba3786#6nkO0yzY2d7N9tNksJ3EuyuYnSlCfnPVPK2qAYOmmAQ=
IPv6 local MPI: https://ybin.me/p/4bd66c185e84f2f9#9+pAMTCR2tCo8S/ezIU1xRI6BNm1zGzotamSXXBZAlE=
IPv6 global MPI: https://ybin.me/p/26d2ab2f36ac3262#YcB69yXWz1fOitWS92IyfUDkPS2KfcECJm8mXNEAmTw=

IPv4 seems to work correctly over InfiniBand; IPv6 link-local falls back to sockets(?) and runs over Ethernet; IPv6 global hangs.

Thanks a lot,
Regards,
Matthew


----

Matthaios Alexandrakis

Research Software Engineer, IT Services

Queen Mary University of London

Queen's Building CB204

Email: m.alexandrakis at qmul.ac.uk

________________________________
From: Ilango, Arun <arun.ilango at intel.com>
Sent: 10 September 2019 20:00
To: Matthew Alexandrakis <m.alexandrakis at qmul.ac.uk>; libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: RE: IPoIB and verbs provider


Hi Matt,



> Unfortunately, this doesn't work - it fails with an IP address not found



Can you try running the fi_info utility that ships with libfabric? “fi_info -p verbs” should list entries for the verbs provider with protocol: FI_PROTO_RDMA_CM_IB_RC and FI_PROTO_RXM. If it doesn’t show them, can you re-run with FI_LOG_LEVEL=info and share the logs?



> Furthermore, when we do assign a globally routable IP address,

> an IP based transport is used, rather than the RDMA that should

> give better performance.



Please run fi_info for this case as well.



Thanks,

Arun.


From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> On Behalf Of Matthew Alexandrakis
Sent: Friday, September 06, 2019 6:12 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] IPoIB and verbs provider



Hello,



We have been configuring OFI for the 2019 Intel Compiler. More specifically, we used the verbs provider (Linux plus InfiniBand), which requires an IPoIB interface, something that OFA did not previously require.



We have several Infiniband islands - though it's possible in the future we could join them together. The easiest way for us to configure IPoIB would be to use an IPv6 Link local address (if we then joined islands together it would just work). Unfortunately, this doesn't work - it fails with an IP address not found. Furthermore, when we do assign a globally routable IP address, an IP based transport is used, rather than the RDMA that should give better performance.
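
As background for the configurations discussed here: IPv6 link-local addresses live in `fe80::/10` and are only meaningful on a single link, so they are usually written with a zone suffix naming the interface. Whether that has anything to do with the failure described above is not established. The sketch below just classifies an address string into the three cases at hand using Python's standard `ipaddress` module; the function name and the `ib0` interface name are hypothetical.

```python
import ipaddress


def classify(addr):
    """Classify an address string as 'ipv4', 'ipv6-link-local', or 'ipv6-other'."""
    host = addr.split("%", 1)[0]   # drop a zone suffix such as "%ib0" (hypothetical name)
    ip = ipaddress.ip_address(host)
    if ip.version == 4:
        return "ipv4"
    return "ipv6-link-local" if ip.is_link_local else "ipv6-other"
```

For example, `classify("fe80::1%ib0")` falls in the link-local case, while any routable IPv6 address assigned to the IPoIB interface falls under `ipv6-other`.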



Going through the latest release notes, I noticed that IPv6 functionality was mentioned in v1.7.0, under Core: "Enhance IPv6 addressing support for AVs" and Sockets: "Add support for IPv6". Indeed, Sockets did work with IPv6 enabled. Verbs, on the other hand, defaulted to running over Ethernet instead. Is IPv6 meant to work with verbs, or is it Sockets-only?



We tried both the Intel-internal libfabric 1.7.2a-impi and libfabric 1.8.0, with the same results.



Thanks,

Matt


