[Users] LNet and MLX5
Amir Shehata
amir.shehata.whamcloud at gmail.com
Tue Sep 8 16:09:17 PDT 2015
Hello all,
We are running Lustre over MLX5 and were trying to increase o2iblnd's
peer_credits to 32. Here is the problematic code:
init_qp_attr->event_handler = kiblnd_qp_event;
init_qp_attr->qp_context = conn;
init_qp_attr->cap.max_send_wr = IBLND_SEND_WRS(version);
init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(version);
init_qp_attr->cap.max_send_sge = 1;
init_qp_attr->cap.max_recv_sge = 1;
init_qp_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
init_qp_attr->qp_type = IB_QPT_RC;
init_qp_attr->send_cq = cq;
init_qp_attr->recv_cq = cq;
rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
#define IBLND_SEND_WRS(v)     ((IBLND_RDMA_FRAGS(v) + 1) * \
                               IBLND_CONCURRENT_SENDS(v))

#define IBLND_RDMA_FRAGS(v)   ((v) == IBLND_MSG_VERSION_1 ? \
                               IBLND_MAX_RDMA_FRAGS : \
                               IBLND_CFG_RDMA_FRAGS)

#define IBLND_CFG_RDMA_FRAGS  (*kiblnd_tunables.kib_map_on_demand != 0 ? \
                               *kiblnd_tunables.kib_map_on_demand : \
                               IBLND_MAX_RDMA_FRAGS) /* max # of fragments configured by user */

#define IBLND_MAX_RDMA_FRAGS  LNET_MAX_IOV           /* max # of fragments supported */

/** limit on the number of fragments in discontiguous MDs */
#define LNET_MAX_IOV 256
Basically, with map_on_demand left at 0, IBLND_RDMA_FRAGS resolves to
LNET_MAX_IOV = 256, so setting peer_credits to 32 gives:

init_qp_attr->cap.max_send_wr = (256 + 1) * 32 = 8224
[root at wt-2-00 ~]# ibv_devinfo -v | grep max_qp_wr
max_qp_wr: 16384
rdma_create_qp() returns -12 (ENOMEM).

peer_credits = 16, which gives max_send_wr = 4112, seems to work.
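For reference, here is a tiny stand-alone sketch of that arithmetic (it
assumes, as the numbers above suggest, that IBLND_CONCURRENT_SENDS resolves
to the peer_credits value):

#include <stdio.h>

/* Mirrors the o2iblnd macros above with map_on_demand = 0, so
 * IBLND_RDMA_FRAGS == LNET_MAX_IOV == 256.  IBLND_CONCURRENT_SENDS is
 * assumed here to end up equal to peer_credits. */
#define LNET_MAX_IOV 256

static int send_wrs(int peer_credits)
{
        return (LNET_MAX_IOV + 1) * peer_credits;
}

int main(void)
{
        printf("peer_credits=16 -> max_send_wr=%d\n", send_wrs(16)); /* 4112 */
        printf("peer_credits=32 -> max_send_wr=%d\n", send_wrs(32)); /* 8224 */
        return 0;
}

Both values are well below the max_qp_wr of 16384 reported above, which is
why the ENOMEM is surprising.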
We're running MOFED 3.0.

Is there a limitation on the MLX5 side that we're hitting? As far as I
know, MLX4 works with peer_credits set to 32.
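In case it helps narrow this down, below is a rough, untested user-space
sketch (plain libibverbs, nothing LNet-specific) that binary-searches the
largest max_send_wr an RC QP with a single SGE will accept on the first
device. If mlx5 caps out well below max_qp_wr here as well, the limit would
seem to be on the provider side rather than in o2iblnd:

#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
        struct ibv_device **devs = ibv_get_device_list(NULL);
        if (!devs || !devs[0]) {
                fprintf(stderr, "no RDMA devices found\n");
                return 1;
        }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
        struct ibv_cq *cq = ctx ? ibv_create_cq(ctx, 256, NULL, NULL, 0) : NULL;
        if (!ctx || !pd || !cq) {
                fprintf(stderr, "failed to set up device context\n");
                return 1;
        }

        struct ibv_device_attr attr;
        ibv_query_device(ctx, &attr);
        printf("%s reports max_qp_wr = %d\n",
               ibv_get_device_name(devs[0]), attr.max_qp_wr);

        /* Binary-search the largest max_send_wr that ibv_create_qp()
         * accepts for an RC QP with one send/recv SGE, mirroring the
         * o2iblnd QP settings. */
        int lo = 1, hi = attr.max_qp_wr, best = 0;
        while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                struct ibv_qp_init_attr qia = {
                        .send_cq = cq,
                        .recv_cq = cq,
                        .cap = {
                                .max_send_wr  = mid,
                                .max_recv_wr  = 1,
                                .max_send_sge = 1,
                                .max_recv_sge = 1,
                        },
                        .qp_type    = IBV_QPT_RC,
                        .sq_sig_all = 0,
                };
                struct ibv_qp *qp = ibv_create_qp(pd, &qia);
                if (qp) {
                        best = mid;
                        ibv_destroy_qp(qp);
                        lo = mid + 1;
                } else {
                        hi = mid - 1;
                }
        }
        printf("largest max_send_wr accepted = %d\n", best);

        ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
}

(Build with: gcc probe_qp_wr.c -o probe_qp_wr -libverbs; the file name is
just a placeholder.)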
Full device info:
[wt2user1 at wildcat2 ~]$ ibv_devinfo -v
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 12.100.6440
node_guid: e41d:2d03:0060:7652
sys_image_guid: e41d:2d03:0060:7652
vendor_id: 0x02c9
vendor_part_id: 4115
hw_ver: 0x0
board_id: MT_2180110032
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 262144
max_qp_wr: 16384
device_cap_flags: 0x40509c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
XRC
Unknown flags: 0x40408000
device_cap_exp_flags: 0x5020007100000000
EXP_DC_TRANSPORT
EXP_MEM_MGT_EXTENSIONS
EXP_CROSS_CHANNEL
EXP_MR_ALLOCATE
EXT_ATOMICS
EXT_SEND NOP
EXP_UMR
max_sge: 30
max_sge_rd: 0
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA_REPLY_BE (64)
log atomic arg sizes (mask) 3c
max fetch and add bit boundary 64
log max atomic inline 5
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 48
max_total_mcast_qp_attach: 100663296
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 16383
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
hca_core_clock: 0
max_klm_list_size: 65536
max_send_wqe_inline_klms: 20
max_umr_recursion_depth: 4
max_umr_stride_dimension: 1
general_odp_caps:
rc_odp_caps:
NO SUPPORT
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
NO SUPPORT
dc_odp_caps:
NO SUPPORT
xrc_odp_caps:
NO SUPPORT
raw_eth_odp_caps:
NO SUPPORT
max_dct: 262144
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 19
port_lid: 1
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x2651e848
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 8
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:e41d:2d03:0060:7652
thanks
amir