[libfabric-users] IPoIB and verbs provider
Matthew Alexandrakis
m.alexandrakis at qmul.ac.uk
Wed Sep 11 05:22:15 PDT 2019
Hello Arun,
Both protocols you mention do appear in the `fi_info` output, multiple times, courtesy of two network cards. According to `ibstat`, `mlx4_0` is the InfiniBand card and `mlx4_1` is the Ethernet card.
I ran the utility twice for each of the 3 IP configurations: once on its own (statically) and once as part of an MPI job. As the logs are pretty lengthy, I include only the static `fi_info -p verbs` output in the email body and have linked the rest below.
`$ export FI_LOG_LEVEL=info` (included in the attached logs)
`$ fi_info -p verbs`
* IPv4
```
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_0
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
```
* IPv6 link local address
```
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
```
* IPv6 global address
```
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_0
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_1-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
```
IPv4: https://ybin.me/p/fa6701249f3a42a3#a6YE8SSuJgkaC90bvi0AyTucEbRNlwU4CWx7kMT6fEk=
IPv6 link local: https://ybin.me/p/da52563897dff075#npBkL2Q+w9qyZoK9omYH+LHaLEy0zWrEL4nuZv9eSvw=
IPv6 global: https://ybin.me/p/15fc05690e01a3f2#znGVvfg3RGSzJAiX5sqIfSyJ/qtu9upSeyBbK14pFVM=
IPv4 MPI: https://ybin.me/p/c0c1956845ba3786#6nkO0yzY2d7N9tNksJ3EuyuYnSlCfnPVPK2qAYOmmAQ=
IPv6 local MPI: https://ybin.me/p/4bd66c185e84f2f9#9+pAMTCR2tCo8S/ezIU1xRI6BNm1zGzotamSXXBZAlE=
IPv6 global MPI: https://ybin.me/p/26d2ab2f36ac3262#YcB69yXWz1fOitWS92IyfUDkPS2KfcECJm8mXNEAmTw=
IPv4 seems to work correctly over InfiniBand; IPv6 link local falls back to sockets(?) and runs over Ethernet; IPv6 global hangs.
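To make the three listings easier to compare, here is a minimal sketch that parses `fi_info` text output and reports which (domain, type, protocol) combinations are present in one listing but absent from another. It assumes the plain `key: value` layout shown above, with each entry starting at a `provider:` line; the function names and the two trimmed sample strings are illustrative only and not part of libfabric. Fed the full IPv4 and IPv6 link local listings above, it should flag the `mlx4_0` `FI_EP_RDM` entry as present for IPv4 only.

```python
from collections import Counter

# Field names as they appear in the `fi_info` text output above.
FIELDS = {"provider", "fabric", "domain", "version", "type", "protocol"}

# Trimmed excerpts of the listings above (first entries only), for illustration.
IPV4_SAMPLE = """\
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_0
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
"""

LINK_LOCAL_SAMPLE = """\
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx4_1
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
"""


def parse_fi_info(text):
    """Return one dict per entry; assumes each entry starts with 'provider:'."""
    entries = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # skip log lines and anything that is not a known field
        if key == "provider":
            entries.append({})
        if entries:
            entries[-1][key] = value.strip()
    return entries


def summarize(entries):
    """Count entries per (domain, endpoint type, protocol) combination."""
    return Counter(
        (e.get("domain", ""), e.get("type", ""), e.get("protocol", ""))
        for e in entries
    )


def missing_entries(reference, other):
    """Combinations found in `reference` but not in `other`, sorted."""
    ref, oth = summarize(reference), summarize(other)
    return sorted(key for key in ref if key not in oth)


if __name__ == "__main__":
    ipv4 = parse_fi_info(IPV4_SAMPLE)
    link_local = parse_fi_info(LINK_LOCAL_SAMPLE)
    for combo in missing_entries(ipv4, link_local):
        print("only in IPv4 listing:", combo)
```

Because unknown lines are skipped, the same parser can also be pointed at output captured with `FI_LOG_LEVEL=info`, provided the log lines do not themselves begin with one of the six field names.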
Thanks a lot,
Regards,
Matthew
----
Matthaios Alexandrakis
Research Software Engineer, IT Services
Queen Mary University of London
Queen's Building CB204
Email: m.alexandrakis at qmul.ac.uk
________________________________
From: Ilango, Arun <arun.ilango at intel.com>
Sent: 10 September 2019 20:00
To: Matthew Alexandrakis <m.alexandrakis at qmul.ac.uk>; libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: RE: IPoIB and verbs provider
Hi Matt,
> Unfortunately, this doesn't work - it fails with an IP address not found
Can you try running the fi_info utility that ships with libfabric? “fi_info -p verbs” should list entries for the verbs provider with protocol: FI_PROTO_RDMA_CM_IB_RC and FI_PROTO_RXM. If it doesn’t show them, can you re-run with FI_LOG_LEVEL=info and share the logs?
> Furthermore, when we do assign a globally routable IP address,
> an IP based transport is used, rather than the RDMA that should
> give better performance.
Please run fi_info for this case as well.
Thanks,
Arun.
From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> On Behalf Of Matthew Alexandrakis
Sent: Friday, September 06, 2019 6:12 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] IPoIB and verbs provider
Hello,
We have been configuring OFI for the 2019 Intel Compiler. More specifically, we used the verbs provider (Linux plus InfiniBand), which requires IPoIB, something that OFA did not previously require.
We have several Infiniband islands - though it's possible in the future we could join them together. The easiest way for us to configure IPoIB would be to use an IPv6 Link local address (if we then joined islands together it would just work). Unfortunately, this doesn't work - it fails with an IP address not found. Furthermore, when we do assign a globally routable IP address, an IP based transport is used, rather than the RDMA that should give better performance.
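For reference, the distinction between the address classes discussed here can be checked with Python's standard `ipaddress` module. This is only a sketch: the `classify` helper is hypothetical and the addresses are placeholders (the IPv6 one from the documentation prefix), not the ones configured on our nodes.

```python
import ipaddress


def classify(addr):
    """Label an address by IP version and whether it is link-local."""
    ip = ipaddress.ip_address(addr)
    scope = "link-local" if ip.is_link_local else "non-link-local"
    return f"IPv{ip.version} {scope}"


if __name__ == "__main__":
    # Placeholder addresses, chosen for illustration only.
    for addr in ("192.168.0.10", "fe80::1", "2001:db8::10"):
        print(addr, "->", classify(addr))
```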
Going through the latest release notes, I noticed that IPv6 functionality was mentioned in v1.7.0, under Core: "Enhance IPv6 addressing support for AVs" and Sockets: "Add support for IPv6". Indeed, Sockets did work with IPv6 enabled. Verbs, on the other hand, defaulted to running over Ethernet instead. Is IPv6 meant to work with verbs, or is it Sockets-only?
Both the internal Intel libfabric 1.7.2a-impi and 1.8.0 were used, with the same results.
Thanks,
Matt
----
Matthaios Alexandrakis
Research Software Engineer, IT Services
Queen Mary University of London