[libfabric-users] using infiniband on summit
Biddiscombe, John A.
biddisco at cscs.ch
Thu Aug 29 03:17:50 PDT 2019
I compiled libfabric for the first time on Summit today, and with verbs/rxm enabled (everything else disabled) I see:
login3:~/build/libfabric$ ./util/fi_info
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_2-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_2-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_2
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_3-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_3-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_3
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_1-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_1-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
The blurb tells me that: “The ConnectX-5 has two 100 Gbit/s Infiniband ports. Each port connects to a different top-of-rack switch. IBM says each switch card supplies a compute node with peak bandwidth of 25 GigaBytes/second. The shared PCIe 4.0 x16 port can handle a peak of 32 GB/s, so each node coupled has plenty of bandwidth to support peak InfiniBand EDR rates.”
My question is: in order to use the (rxm over verbs) InfiniBand interface effectively, will I need a single endpoint per node (since I see only one ofi_rxm provider, I'm guessing a single endpoint is all that's required if I use that), or do I need to manage multiple endpoints (say 2, one per port)? I've never used a card with multiple ports before, and I'm surprised to see mlx5_0/1/2/3, which looks like 4 separate domains - are these fundamentally different devices, or essentially windows onto the same device? If there is a page on the wiki anywhere that discusses this, please point me to it; I'd like to read up on how multiple ports are managed.
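For concreteness, here is roughly what I have in mind for pinning the rxm-over-verbs endpoint to one HCA/port. This is only a sketch (untested, assuming libfabric 1.8 headers, and borrowing the mlx5_0 domain name from the fi_info output above):

/* Sketch: request the rxm-over-verbs RDM endpoint on one specific domain.
 * The domain name "mlx5_0" is taken from the fi_info output above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_errno.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;

    hints->ep_attr->type = FI_EP_RDM;                    /* rxm exposes RDM endpoints */
    hints->fabric_attr->prov_name = strdup("verbs;ofi_rxm");
    hints->domain_attr->name = strdup("mlx5_0");         /* pin to one HCA/port */
    hints->caps = FI_MSG | FI_RMA;                       /* what my code needs */

    int ret = fi_getinfo(FI_VERSION(1, 8), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        return EXIT_FAILURE;
    }

    for (struct fi_info *cur = info; cur; cur = cur->next)
        printf("matched: %s / %s\n",
               cur->fabric_attr->prov_name, cur->domain_attr->name);

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return EXIT_SUCCESS;
}

Is setting domain_attr->name like this the intended way to choose which port/HCA the rxm endpoint sits on, or is there a better mechanism?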
If there were two separate NIC cards on the node, then I'd need two endpoints, yes? Will libfabric ever take care of that for me?
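To make that second question concrete, my assumption today is that the application has to open one fabric/domain/endpoint per fi_info entry itself, roughly as in the sketch below (error handling and CQ/AV binding omitted; the function name is just for illustration):

#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

/* Sketch: open one fabric/domain/endpoint per fi_info entry returned by
 * fi_getinfo, e.g. one for mlx5_0 and one for mlx5_2 if both match. */
static void open_one_ep_per_domain(struct fi_info *info_list)
{
    for (struct fi_info *cur = info_list; cur; cur = cur->next) {
        struct fid_fabric *fabric;
        struct fid_domain *domain;
        struct fid_ep *ep;

        fi_fabric(cur->fabric_attr, &fabric, NULL);
        fi_domain(fabric, cur, &domain, NULL);
        fi_endpoint(domain, cur, &ep, NULL);
        /* ...bind a CQ and AV, fi_enable(ep), then drive traffic on this ep... */
    }
}

i.e. splitting traffic across the two ports/NICs would be entirely up to my code - if libfabric can ever hide that loop from me, so much the better.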
Flyby question: are there any plans for libfabric/UCX networking meetups, BoFs, or other get-togethers at SC19?
thanks
JB
--
Dr. John Biddiscombe, email: biddisco at cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland | Fax: +41 (91) 610.82.82