[libfabric-users] Verbs provider not permitting FI_EP_MSG

Philip Davis philip.e.davis at rutgers.edu
Thu Jan 16 09:48:05 PST 2020


Hi Steve,

Here is the requested output:

[]$ ibv_devinfo -v
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.36.5120
        node_guid:                      248a:0703:0077:9520
        sys_image_guid:                 248a:0703:0077:9523
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111019
        phys_port_cnt:                  2
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0xfffffe00
        max_qp:                         131000
        max_qp_wr:                      16351
        device_cap_flags:               0x057e9c76
                                        BAD_PKEY_CNTR
                                        BAD_QKEY_CNTR
                                        AUTO_PATH_MIG
                                        CHANGE_PHY_PORT
                                        UD_AV_PORT_ENFORCE
                                        PORT_ACTIVE_EVENT
                                        SYS_IMAGE_GUID
                                        RC_RNR_NAK_GEN
                                        MEM_WINDOW
                                        UD_IP_CSUM
                                        XRC
                                        MEM_MGT_EXTENSIONS
                                        MEM_WINDOW_TYPE_2B
                                        RAW_IP_CSUM
                                        Unknown flags: 0x488000
        max_sge:                        32
        max_sge_rd:                     30
        max_cq:                         65408
        max_cqe:                        4194303
        max_mr:                         524032
        max_pd:                         32764
        max_qp_rd_atom:                 16
        max_ee_rd_atom:                 0
        max_res_rd_atom:                2096000
        max_qp_init_rd_atom:            128
        max_ee_init_rd_atom:            0
        atomic_cap:                     ATOMIC_HCA (1)
        max_ee:                         0
        max_rdd:                        0
        max_mw:                         0
        max_raw_ipv6_qp:                0
        max_raw_ethy_qp:                0
        max_mcast_grp:                  8192
        max_mcast_qp_attach:            248
        max_total_mcast_qp_attach:      2031616
        max_ah:                         2147483647
        max_fmr:                        0
        max_srq:                        65472
        max_srq_wr:                     16383
        max_srq_sge:                    31
        max_pkeys:                      128
        local_ca_ack_delay:             15
        general_odp_caps:
        rc_odp_caps:
                                        NO SUPPORT
        uc_odp_caps:
                                        NO SUPPORT
        ud_odp_caps:
                                        NO SUPPORT
        completion timestamp_mask:                      0x0000ffffffffffff
        hca_core_clock:                 427000kHZ
        device_cap_flags_ex:            0x57E9C76
        tso_caps:
        max_tso:                        0
        rss_caps:
                max_rwq_indirection_tables:                     0
                max_rwq_indirection_table_size:                 0
                rx_hash_function:                               0x0
                rx_hash_fields_mask:                            0x0
        max_wq_type_rq:                 0
        packet_pacing_caps:
                qp_rate_limit_min:      0kbps
                qp_rate_limit_max:      0kbps
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x0259486a
                        max_vl_num:             8 (4)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            128
                        subnet_timeout:         18
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           14.0 Gbps (16)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:248a:0703:0077:9521

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x02594868
                        max_vl_num:             8 (4)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            128
                        subnet_timeout:         0
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           2.5 Gbps (1)
                        phys_state:             POLLING (2)
                        GID[  0]:               fe80:0000:0000:0000:248a:0703:0077:9522


[]$ lsmod | grep ib
ib_isert               50770  0
iscsi_target_mod      302966  1 ib_isert
ib_iser                47813  0
libiscsi               57233  1 ib_iser
scsi_transport_iscsi    99909  2 ib_iser,libiscsi
ib_srpt                48170  0
target_core_mod       367918  3 iscsi_target_mod,ib_srpt,ib_isert
ib_srp                 48454  0
scsi_transport_srp     20993  1 ib_srp
ib_ipoib              110142  0
ib_ucm                 22589  0
ib_uverbs              64636  2 ib_ucm,rdma_ucm
ib_umad                22080  4
rdma_cm                54426  4 rpcrdma,ib_iser,rdma_ucm,ib_isert
ib_cm                  47287  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
mlx4_ib               159474  0
ib_core               211874  14 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
libcrc32c              12644  4 xfs,raid456,nf_nat,nf_conntrack
mlx4_core             309354  2 mlx4_en,mlx4_ib
libahci                31992  1 ahci
libata                238896  2 ahci,libahci
devlink                30193  3 mlx4_en,mlx4_ib,mlx4_core

On Jan 16, 2020, at 12:28 PM, Steve Welch <swelch at systemfabricworks.com> wrote:



On Jan 16, 2020, at 11:03 AM, Philip Davis <philip.e.davis at rutgers.edu> wrote:

Hi Steve,

Thanks for the quick response.

I am expecting to use the rxm provider for verbs, but in fi_info I do not see an FI_EP_MSG-type verbs provider.

Could you provide the output for “ibv_devinfo -v” and “lsmod | grep ib”?

Steve


provider: tcp;ofi_rxm
    fabric: TCP-IP
    domain: tcp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
    fabric: TCP-IP
    domain: tcp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
    fabric: TCP-IP
    domain: tcp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
    fabric: TCP-IP
    domain: tcp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
    fabric: TCP-IP
    domain: tcp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
    fabric: TCP-IP
    domain: tcp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxd
    fabric: IB-0xfe80000000000000
    domain: mlx4_0-dgram
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx4_0-dgram
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_IB_UD
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.1
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.1
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.1
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.1
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.1
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.1
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: sockets
    fabric: 10.1.0.0/16
    domain: em1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.1.0.0/16
    domain: em1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.1.0.0/16
    domain: em1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.157.14.0/24
    domain: em2
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.157.14.0/24
    domain: em2
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.157.14.0/24
    domain: em2
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: fe80::/64
    domain: em1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: fe80::/64
    domain: em1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: fe80::/64
    domain: em1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: fe80::/64
    domain: em2
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: fe80::/64
    domain: em2
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: fe80::/64
    domain: em2
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: ::1/128
    domain: lo
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: ::1/128
    domain: lo
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: ::1/128
    domain: lo
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: tcp
    fabric: TCP-IP
    domain: tcp
    version: 0.1
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: tcp
    fabric: TCP-IP
    domain: tcp
    version: 0.1
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: tcp
    fabric: TCP-IP
    domain: tcp
    version: 0.1
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: tcp
    fabric: TCP-IP
    domain: tcp
    version: 0.1
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: tcp
    fabric: TCP-IP
    domain: tcp
    version: 0.1
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: tcp
    fabric: TCP-IP
    domain: tcp
    version: 0.1
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: shm
    fabric: shm
    domain: shm
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SHM

Thanks,
Philip

On Jan 16, 2020, at 11:03 AM, Steve Welch <swelch at systemfabricworks.com> wrote:

Hi Philip,

Since you are specifying FI_EP_RDM in your hints, I assume you want to utilize the RXM provider on top of the Verbs core provider (i.e. ofi_rxm;verbs). The Verbs provider does not offer native FI_EP_RDM support. To use XRC (or an FI_EP_RDM endpoint) you will have to use RXM, but I am unaware of any IB provider that supports XRC and does not also support RC.

If you issue a 'fi_info -p verbs -v’ it will list all the verbs domains supported and the underlying protocol and you could verify if RC should be supported (via RXM for FI_EP_RDM). If you issue 'fi_info -p “ofi_rxm;verbs”', you should see multiple domains for the “ofi_rxm;verbs” provider combination. XRC domains have the “-xrc” suffix.

If you must use XRC and the RXM/Verbs combination then you will need to set the environment variable FI_OFI_RXM_USE_SRX=1 and RXM will handle the shared RX details.

Steve


On Jan 16, 2020, at 8:56 AM, Philip Davis <philip.e.davis at rutgers.edu> wrote:

Hello,

I am working with a user who is running on an older InfiniBand cluster. Using libfabric with the following hints:

hints->caps = FI_MSG | FI_SEND | FI_RECV | FI_REMOTE_READ |
              FI_REMOTE_WRITE | FI_RMA | FI_READ | FI_WRITE;
hints->mode = FI_CONTEXT | FI_LOCAL_MR | FI_CONTEXT2 | FI_MSG_PREFIX |
              FI_ASYNC_IOV | FI_RX_CQ_DATA;
hints->domain_attr->mr_mode = FI_MR_BASIC;
hints->domain_attr->control_progress = FI_PROGRESS_AUTO;
hints->domain_attr->data_progress = FI_PROGRESS_AUTO;
hints->ep_attr->type = FI_EP_RDM;


No verbs providers are found. Looking through the debug output, I suspect this is the crucial line:

libfabric:verbs:fabric:fi_ibv_get_matching_info():1213<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints

I take it that the underlying hardware is only compatible with the FI_PROTO_RDMA_CM_IB_XRC protocol for MSG endpoints, and it looks like I need to have FI_SHARED_CONTEXT enabled for these endpoints to be supported. I’m having some trouble understanding the implications of using FI_SHARED_CONTEXT. If I only ever use one endpoint, is there any functional or performance impact to setting this? I’d rather not change to using shared contexts unconditionally, so is there a good way for me to detect this situation other than doing a maximally permissive fi_getinfo and iterating through the verbs results?

Thanks,
Philip
_______________________________________________
Libfabric-users mailing list
Libfabric-users at lists.openfabrics.org
https://lists.openfabrics.org/mailman/listinfo/libfabric-users