[libfabric-users] Verbs provider not permitting FI_EP_MSG
Philip Davis
philip.e.davis at rutgers.edu
Thu Jan 16 09:48:05 PST 2020
Hi Steve,
Here is the requested output:
[]$ ibv_devinfo -v
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.36.5120
node_guid: 248a:0703:0077:9520
sys_image_guid: 248a:0703:0077:9523
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111019
phys_port_cnt: 2
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffe00
max_qp: 131000
max_qp_wr: 16351
device_cap_flags: 0x057e9c76
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
UD_AV_PORT_ENFORCE
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
UD_IP_CSUM
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
RAW_IP_CSUM
Unknown flags: 0x488000
max_sge: 32
max_sge_rd: 30
max_cq: 65408
max_cqe: 4194303
max_mr: 524032
max_pd: 32764
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 2096000
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 248
max_total_mcast_qp_attach: 2031616
max_ah: 2147483647
max_fmr: 0
max_srq: 65472
max_srq_wr: 16383
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 15
general_odp_caps:
rc_odp_caps:
NO SUPPORT
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
NO SUPPORT
completion timestamp_mask: 0x0000ffffffffffff
hca_core_clock: 427000kHZ
device_cap_flags_ex: 0x57E9C76
tso_caps:
max_tso: 0
rss_caps:
max_rwq_indirection_tables: 0
max_rwq_indirection_table_size: 0
rx_hash_function: 0x0
rx_hash_fields_mask: 0x0
max_wq_type_rq: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x0259486a
max_vl_num: 8 (4)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 128
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 14.0 Gbps (16)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:248a:0703:0077:9521
port: 2
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x02594868
max_vl_num: 8 (4)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 128
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 2.5 Gbps (1)
phys_state: POLLING (2)
GID[ 0]: fe80:0000:0000:0000:248a:0703:0077:9522
[]$ lsmod | grep ib
ib_isert 50770 0
iscsi_target_mod 302966 1 ib_isert
ib_iser 47813 0
libiscsi 57233 1 ib_iser
scsi_transport_iscsi 99909 2 ib_iser,libiscsi
ib_srpt 48170 0
target_core_mod 367918 3 iscsi_target_mod,ib_srpt,ib_isert
ib_srp 48454 0
scsi_transport_srp 20993 1 ib_srp
ib_ipoib 110142 0
ib_ucm 22589 0
ib_uverbs 64636 2 ib_ucm,rdma_ucm
ib_umad 22080 4
rdma_cm 54426 4 rpcrdma,ib_iser,rdma_ucm,ib_isert
ib_cm 47287 5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
mlx4_ib 159474 0
ib_core 211874 14 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
libcrc32c 12644 4 xfs,raid456,nf_nat,nf_conntrack
mlx4_core 309354 2 mlx4_en,mlx4_ib
libahci 31992 1 ahci
libata 238896 2 ahci,libahci
devlink 30193 3 mlx4_en,mlx4_ib,mlx4_core
On Jan 16, 2020, at 12:28 PM, Steve Welch <swelch at systemfabricworks.com> wrote:
On Jan 16, 2020, at 11:03 AM, Philip Davis <philip.e.davis at rutgers.edu> wrote:
Hi Steve,
Thanks for the quick response.
I am expecting to use the rxm provider for verbs, but in the fi_info output I do not see an FI_EP_MSG-type verbs provider.
Could you provide the output for “ibv_devinfo -v” and “lsmod | grep ib”?
Steve
provider: tcp;ofi_rxm
fabric: TCP-IP
domain: tcp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
fabric: TCP-IP
domain: tcp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
fabric: TCP-IP
domain: tcp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
fabric: TCP-IP
domain: tcp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
fabric: TCP-IP
domain: tcp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: tcp;ofi_rxm
fabric: TCP-IP
domain: tcp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxd
fabric: IB-0xfe80000000000000
domain: mlx4_0-dgram
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
fabric: UDP-IP
domain: udp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
fabric: UDP-IP
domain: udp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
fabric: UDP-IP
domain: udp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
fabric: UDP-IP
domain: udp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
fabric: UDP-IP
domain: udp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: UDP;ofi_rxd
fabric: UDP-IP
domain: udp
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx4_0-dgram
version: 1.0
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: UDP
fabric: UDP-IP
domain: udp
version: 1.1
type: FI_EP_DGRAM
protocol: FI_PROTO_UDP
provider: UDP
fabric: UDP-IP
domain: udp
version: 1.1
type: FI_EP_DGRAM
protocol: FI_PROTO_UDP
provider: UDP
fabric: UDP-IP
domain: udp
version: 1.1
type: FI_EP_DGRAM
protocol: FI_PROTO_UDP
provider: UDP
fabric: UDP-IP
domain: udp
version: 1.1
type: FI_EP_DGRAM
protocol: FI_PROTO_UDP
provider: UDP
fabric: UDP-IP
domain: udp
version: 1.1
type: FI_EP_DGRAM
protocol: FI_PROTO_UDP
provider: UDP
fabric: UDP-IP
domain: udp
version: 1.1
type: FI_EP_DGRAM
protocol: FI_PROTO_UDP
provider: sockets
fabric: 10.1.0.0/16
domain: em1
version: 2.0
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 10.1.0.0/16
domain: em1
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 10.1.0.0/16
domain: em1
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 10.157.14.0/24
domain: em2
version: 2.0
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 10.157.14.0/24
domain: em2
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 10.157.14.0/24
domain: em2
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: fe80::/64
domain: em1
version: 2.0
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: fe80::/64
domain: em1
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: fe80::/64
domain: em1
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: fe80::/64
domain: em2
version: 2.0
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: fe80::/64
domain: em2
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: fe80::/64
domain: em2
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 127.0.0.0/8
domain: lo
version: 2.0
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 127.0.0.0/8
domain: lo
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: 127.0.0.0/8
domain: lo
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: ::1/128
domain: lo
version: 2.0
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: ::1/128
domain: lo
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_SOCK_TCP
provider: sockets
fabric: ::1/128
domain: lo
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_SOCK_TCP
provider: tcp
fabric: TCP-IP
domain: tcp
version: 0.1
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: tcp
fabric: TCP-IP
domain: tcp
version: 0.1
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: tcp
fabric: TCP-IP
domain: tcp
version: 0.1
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: tcp
fabric: TCP-IP
domain: tcp
version: 0.1
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: tcp
fabric: TCP-IP
domain: tcp
version: 0.1
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: tcp
fabric: TCP-IP
domain: tcp
version: 0.1
type: FI_EP_MSG
protocol: FI_PROTO_SOCK_TCP
provider: shm
fabric: shm
domain: shm
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_SHM
Thanks,
Philip
On Jan 16, 2020, at 11:03 AM, Steve Welch <swelch at systemfabricworks.com> wrote:
Hi Philip,
Since you are specifying FI_EP_RDM in your hints, I assume you want to utilize the RXM provider on top of the verbs core provider (i.e. ofi_rxm;verbs). The verbs provider does not offer native FI_EP_RDM support. To use either XRC or FI_EP_RDM endpoints you will have to use RXM, but I am unaware of any IB device that supported XRC without also supporting RC.
If you issue 'fi_info -p verbs -v', it will list all the verbs domains supported and the underlying protocol, so you can verify whether RC is supported (via RXM for FI_EP_RDM). If you issue 'fi_info -p "ofi_rxm;verbs"', you should see multiple domains for the "ofi_rxm;verbs" provider combination; XRC domains have the "-xrc" suffix.
If you must use XRC with the RXM/verbs combination, then you will need to set the environment variable FI_OFI_RXM_USE_SRX=1, and RXM will handle the shared RX details.
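For reference, the probing steps described above might look like the following on the affected node (the exact domain names reported will vary by system; the grep filter is just one convenient way to narrow the output):

```shell
# List the verbs domains and their wire protocols; XRC-only
# domains carry the "-xrc" suffix.
fi_info -p verbs -v | grep -E 'domain|protocol'

# Check what the RXM-over-verbs combination exposes.
fi_info -p "ofi_rxm;verbs"

# If only XRC domains exist, let RXM manage the shared RX context.
export FI_OFI_RXM_USE_SRX=1
```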
Steve
On Jan 16, 2020, at 8:56 AM, Philip Davis <philip.e.davis at rutgers.edu> wrote:
Hello,
I am working with a user who is running on an older InfiniBand cluster, using libfabric with the following hints:
hints->caps = FI_MSG | FI_SEND | FI_RECV | FI_REMOTE_READ |
FI_REMOTE_WRITE | FI_RMA | FI_READ | FI_WRITE;
hints->mode = FI_CONTEXT | FI_LOCAL_MR | FI_CONTEXT2 | FI_MSG_PREFIX |
FI_ASYNC_IOV | FI_RX_CQ_DATA;
hints->domain_attr->mr_mode = FI_MR_BASIC;
hints->domain_attr->control_progress = FI_PROGRESS_AUTO;
hints->domain_attr->data_progress = FI_PROGRESS_AUTO;
hints->ep_attr->type = FI_EP_RDM;
No verbs providers are found. Looking through the debug output, I suspect this is the crucial line:
libfabric:verbs:fabric:fi_ibv_get_matching_info():1213<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
I take it that the underlying hardware only supports the FI_PROTO_RDMA_CM_IB_XRC protocol for MSG endpoints, and it looks like I need FI_SHARED_CONTEXT enabled for these endpoints to be supported. I’m having some trouble understanding the implications of using FI_SHARED_CONTEXT. If I only ever use one endpoint, is there any functional or performance impact to setting this? I’d rather not switch to shared contexts unconditionally, so is there a good way for me to detect this situation other than doing a maximally permissive fi_getinfo and iterating through the verbs results?
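A minimal sketch of one detection approach implied by the question above: try fi_getinfo with the plain hints first, and only fall back to advertising a shared receive context when that fails. This assumes libfabric headers and library are installed; the capability bits are trimmed and error handling is minimal for brevity, so treat it as an illustration rather than a drop-in implementation.

```c
/* Sketch: retry fi_getinfo with FI_SHARED_CONTEXT when the verbs
 * provider only exposes XRC-based endpoints. Requires linking
 * against libfabric (-lfabric). */
#include <string.h>
#include <rdma/fabric.h>

static struct fi_info *get_rdm_info(void)
{
    struct fi_info *hints, *info = NULL;
    int ret;

    hints = fi_allocinfo();
    if (!hints)
        return NULL;

    hints->caps = FI_MSG | FI_RMA;
    hints->ep_attr->type = FI_EP_RDM;
    /* Assumed provider string; drop it to match any provider. */
    hints->fabric_attr->prov_name = strdup("ofi_rxm;verbs");

    ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret) {
        /* First pass failed: advertise willingness to use a shared
         * RX context, which XRC-only hardware requires. */
        hints->ep_attr->rx_ctx_cnt = FI_SHARED_CONTEXT;
        ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    }

    fi_freeinfo(hints);
    return ret ? NULL : info;
}
```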
Thanks,
Philip
_______________________________________________
Libfabric-users mailing list
Libfabric-users at lists.openfabrics.org
https://lists.openfabrics.org/mailman/listinfo/libfabric-users