[libfabric-users] using infiniband on summit

Biddiscombe, John A. biddisco at cscs.ch
Thu Aug 29 03:17:50 PDT 2019


I compiled libfabric for the first time on Summit today, and with verbs/rxm enabled (everything else disabled) I see:

login3:~/build/libfabric$ ./util/fi_info
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_2-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_2-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_2
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_3-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_3-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_3
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_1-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_1-xrc
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_1
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM

The blurb tells me that “The ConnectX-5 has two 100 Gbit/s Infiniband ports. Each port connects to a different top-of-rack switch. IBM says each switch card supplies a compute node with peak bandwidth of 25 GigaBytes/second. The shared PCIe 4.0 x16 port can handle a peak of 32 GB/s, so each node coupled has plenty of bandwidth to support peak InfiniBand EDR rates.”

My question is: in order to use the (rxm over verbs) InfiniBand interface effectively, will I need a single endpoint per node (since I see only one ofi_rxm provider, I'm guessing a single endpoint would be required if I use that), or do I need to manage multiple endpoints (say 2, one per port)? I've never used a card with multiple ports before, and I'm surprised to see mlx5_0/1/2/3, which looks like 4 domains: are these fundamentally different devices, or essentially windows onto the same device? If there is a page on the wiki anywhere that discusses this, please point me to it; I'd like to read up on how multiple ports are managed.
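
To make the question concrete, here is a minimal sketch of how I imagine selecting the rxm-over-verbs provider for one domain. The provider and domain names are taken from the fi_info output above; the API version, capability flags, and overall structure are just my assumption of typical usage, not something I've verified on Summit:

#include <stdio.h>
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

int main(void)
{
    struct fi_info *hints, *info;
    struct fid_fabric *fabric;
    struct fid_domain *domain;
    struct fid_ep *ep;
    int ret;

    hints = fi_allocinfo();
    hints->ep_attr->type = FI_EP_RDM;                  /* the rxm endpoint type shown above */
    hints->caps = FI_MSG | FI_RMA;                     /* assumed capabilities */
    hints->fabric_attr->prov_name = strdup("verbs;ofi_rxm");
    hints->domain_attr->name = strdup("mlx5_0");       /* one domain from the fi_info list */

    ret = fi_getinfo(FI_VERSION(1, 8), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo failed: %d\n", ret);
        return 1;
    }

    /* Open fabric -> domain -> endpoint for the first matching entry. */
    ret = fi_fabric(info->fabric_attr, &fabric, NULL);
    if (!ret)
        ret = fi_domain(fabric, info, &domain, NULL);
    if (!ret)
        ret = fi_endpoint(domain, info, &ep, NULL);
    /* ... bind CQs/AV and fi_enable(ep) before actually using it ... */

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return ret;
}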

If there were two separate NIC cards on the node, then I'd need two endpoints, yes? Will libfabric ever take care of that for me?
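
Continuing the sketch above (reusing the info list from fi_getinfo, but this time without pinning domain_attr->name), this is my guess at what managing the rails by hand would look like; again, purely an assumption on my part, ignoring cleanup and error handling:

/* Open one endpoint per fi_info entry (i.e. one per verbs domain, such as
 * mlx5_0 and mlx5_2), on the assumption that libfabric will not stripe
 * across NICs for me and I must spread traffic over the rails myself. */
static int open_endpoint_per_domain(struct fi_info *info)
{
    struct fi_info *cur;
    int opened = 0;

    for (cur = info; cur; cur = cur->next) {
        struct fid_fabric *fab;
        struct fid_domain *dom;
        struct fid_ep *ep;

        if (fi_fabric(cur->fabric_attr, &fab, NULL) ||
            fi_domain(fab, cur, &dom, NULL) ||
            fi_endpoint(dom, cur, &ep, NULL))
            continue;   /* skip entries that fail to open (sketch: leaks on partial failure) */

        printf("opened endpoint on domain %s\n", cur->domain_attr->name);
        opened++;
        /* ... keep (fab, dom, ep) per rail; bind CQs, enable, and use each one ... */
    }
    return opened;
}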

Flyby question: are there any plans for libfabric/ucx networking meetups, BOFs, or other events at SC19?

thanks

JB

--
Dr. John Biddiscombe,                    email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
