[ofa-general] Infiniband Problems

David Robb DavidRobb at comsci.co.uk
Thu Jun 21 12:00:38 PDT 2007


Hi Folks,

I have inherited responsibility for the comms subsystem on a 28-node 
high-performance signal processing cluster interconnected with InfiniBand.

Being new to this technology, I have been trying to read and learn as 
much as possible, but I am running into a few specific problems. Any help 
or pointers in the right direction would be greatly appreciated.

1. I sometimes observe RDMA data transfer stalls of ~1.0 second

I have written an RDMA transfer unit test that transfers 10,000 packets 
from one node to another and times the performance. Each loop iteration 
normally takes on the order of 30 us, but occasionally I observe times of 
500,000 to 1,100,000 us for a single packet. I don't think it is a 
problem with our queuing layer (if I remove the call to 
ibv_post_send(...) then no stall is observed). I also don't think it is 
the CPU stalling, as a separate worker thread that does unrelated work 
and times its own loop does not exhibit any stalls. Any suggestions on 
where to look next?
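
For reference, the kind of per-iteration timing I mean is sketched below. 
This is a simplified stand-in, not our actual unit test: qp, cq, mr, buf, 
remote_addr and rkey are placeholders for our real handles, and it assumes 
an RC QP already connected and a registered MR. The idea is to time the 
ibv_post_send() call and the completion wait separately, so that a ~1 
second stall can be attributed to one phase or the other.

/* Sketch only: placeholder handles, not the real test harness. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000;
}

void timed_rdma_writes(struct ibv_qp *qp, struct ibv_cq *cq,
                       struct ibv_mr *mr, void *buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey, int iters)
{
    for (int i = 0; i < iters; i++) {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = (uint32_t)len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
            .wr_id      = (uint64_t)i,
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_RDMA_WRITE,
            .send_flags = IBV_SEND_SIGNALED,  /* one completion per write */
        };
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;

        struct ibv_send_wr *bad_wr = NULL;
        uint64_t t0 = now_us();
        if (ibv_post_send(qp, &wr, &bad_wr)) {
            fprintf(stderr, "ibv_post_send failed at iteration %d\n", i);
            return;
        }
        uint64_t t1 = now_us();

        /* Busy-poll for the completion of this write. */
        struct ibv_wc wc;
        int n;
        do {
            n = ibv_poll_cq(cq, 1, &wc);
        } while (n == 0);
        uint64_t t2 = now_us();

        if (n < 0 || wc.status != IBV_WC_SUCCESS) {
            fprintf(stderr, "bad completion at iteration %d\n", i);
            return;
        }
        /* Report iterations slower than 1 ms, split by phase. */
        if (t2 - t0 > 1000)
            printf("iter %d: post %llu us, completion %llu us\n", i,
                   (unsigned long long)(t1 - t0),
                   (unsigned long long)(t2 - t1));
    }
}

It builds against libibverbs (link with -libverbs; older glibc may also 
need -lrt for clock_gettime).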


2. Creation of a Queue Pair is rejected when I have mapped a region of 
memory greater than about 1.35 GB.

Ideally, we would like to be able to write anywhere within a 2 GB (or 
larger) shared memory segment. However, when I attempt to do this, the 
call fails with REJ. Further reading around the subject suggests that 
this may be due to the VPTT (Virtual to Physical Translation Table) 
resources required for mapping such a large memory area. Can anyone 
confirm this hypothesis? Even if we do get this to work, will we suffer 
performance problems from using such a large memory area? Are there any 
workarounds?
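
One workaround I am considering is sketched below, though it is untested 
on our setup and I don't know whether it actually lifts the limit on 
mthca with OFED 1.1: back the buffer with huge pages from a hugetlbfs 
mount so that far fewer translation entries are needed for the same 2 GB 
region. The mount point and file name are placeholders, and enough huge 
pages would have to be reserved beforehand (vm.nr_hugepages).

/* Sketch only: hugetlbfs-backed 2 GB registration, unverified. */
#include <infiniband/verbs.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (2UL * 1024 * 1024 * 1024)   /* 2 GB */

struct ibv_mr *register_big_region(struct ibv_pd *pd, void **out_buf)
{
    /* Assumes hugetlbfs is mounted at /mnt/huge (placeholder path). */
    int fd = open("/mnt/huge/rdma_buf", O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open hugetlbfs file");
        return NULL;
    }

    void *buf = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);
    if (buf == MAP_FAILED) {
        perror("mmap huge pages");
        return NULL;
    }

    /* Register the whole region for local and remote RDMA writes. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, REGION_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        munmap(buf, REGION_SIZE);
        return NULL;
    }

    *out_buf = buf;
    return mr;
}

If registering a single 2 GB region turns out to be impossible on this 
HCA, the fallback would presumably be to split the segment across several 
smaller MRs, at the cost of tracking multiple rkeys.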

Many thanks,

David Robb


Device and Environment Information follows:-

OS Kernel

bash-3.00$ uname -a
Linux qinetiq01 2.6.20.1-clustervision-142_cvos #1 SMP Tue Mar 6 
00:19:24 GMT 2007 x86_64 x86_64 x86_64 GNU/Linux

OFED library version 1.1

ibv_devinfo -v output:-
hca_id: mthca0
fw_ver: 1.1.0
node_guid: 0002:c902:0023:a1d8
sys_image_guid: 0002:c902:0023:a1db
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: MT_03B0140002
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 64512
max_qp_wr: 16384
device_cap_flags: 0x00001c76
max_sge: 30
max_sge_rd: 0
max_cq: 65408
max_cqe: 131071
max_mr: 131056
max_pd: 32764
max_qp_rd_atom: 4
max_ee_rd_atom: 0
max_res_rd_atom: 258048
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 8
max_total_mcast_qp_attach: 65536
max_ah: 0
max_fmr: 0
max_srq: 960
max_srq_wr: 16384
max_srq_sge: 30
max_pkeys: 64
local_ca_ack_delay: 15
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
max_msg_sz: 0x80000000
port_cap_flags: 0x02510a6a
max_vl_num: 3
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 64
gid_tbl_len: 32
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 5.0 Gbps (2)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c902:0023:a1d9

Switches are Mellanox MT47396 InfiniScale-III.



