[ofa-general] Infiniband Problems
David Robb
DavidRobb at comsci.co.uk
Thu Jun 21 12:00:38 PDT 2007
Hi Folks,
I have inherited responsibility for the comms subsystem on a 28-node
high-performance signal processing cluster interconnected with InfiniBand.
Being new to this technology, I have been trying to read and learn as
much as possible, but I am having a few specific problems. Any help or
pointers in the right direction would be greatly appreciated.
1. I sometimes observe RDMA data transfer stalls of ~1.0 second.
I have written an RDMA transfer unit test that transfers 10000 packets
from one node to another and times the performance. Mostly each loop
iteration takes on the order of 30 us, but occasionally I observe times
of 500,000 to 1,100,000 us for one packet. I don't think it's a problem
with our queuing layer (if I remove the call to ibv_post_send(...) then
no stall is observed). I also don't think it is a problem with the CPU
stalling, as a separate worker thread that does other work and times its
own loop does not exhibit any stalls. Any suggestions on where to look
next?
2. Creation of a Queue Pair is rejected when I have mapped a region of
memory greater than about 1.35 GB.
Ideally, we would like to be able to write anywhere within a 2 GB (or
larger) shared memory segment. However, when I attempt to do this, the
call fails with REJ. Further reading around the subject suggests that
this may be due to the VPTT (Virtual to Physical Translation Table)
resources required for mapping such a large memory area. Can anyone
confirm this hypothesis? Even if we get this to work, will we suffer
performance problems by using such a large memory area? Are there any
workarounds?
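For reference, the registration we are attempting is essentially the
following (a simplified sketch with placeholder names, not the actual
code). A 2 GB region at 4 KB pages implies roughly 524,288 translation
entries, which is why I suspect the translation table resources:

/* Simplified sketch of registering the large shared-memory segment
 * (placeholder names, error handling trimmed). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_big_region(struct ibv_pd *pd, size_t len)
{
    /* In the real system this is a shared memory segment; an anonymous
     * mmap() stands in for it here. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    /* 2 GB at 4 KB pages is ~524,288 entries the HCA must translate. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr)
        perror("ibv_reg_mr");
    return mr;
}

The registration itself appears to succeed; it is the subsequent
connection establishment that is rejected once the region exceeds
roughly 1.35 GB.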
Many thanks,
David Robb
Device and Environment Information follows:-
OS Kernel
bash-3.00$ uname -a
Linux qinetiq01 2.6.20.1-clustervision-142_cvos #1 SMP Tue Mar 6
00:19:24 GMT 2007 x86_64 x86_64 x86_64 GNU/Linux
OFED library version 1.1
ibv_devinfo -v output:-
hca_id: mthca0
fw_ver: 1.1.0
node_guid: 0002:c902:0023:a1d8
sys_image_guid: 0002:c902:0023:a1db
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: MT_03B0140002
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 64512
max_qp_wr: 16384
device_cap_flags: 0x00001c76
max_sge: 30
max_sge_rd: 0
max_cq: 65408
max_cqe: 131071
max_mr: 131056
max_pd: 32764
max_qp_rd_atom: 4
max_ee_rd_atom: 0
max_res_rd_atom: 258048
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 8
max_total_mcast_qp_attach: 65536
max_ah: 0
max_fmr: 0
max_srq: 960
max_srq_wr: 16384
max_srq_sge: 30
max_pkeys: 64
local_ca_ack_delay: 15
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
max_msg_sz: 0x80000000
port_cap_flags: 0x02510a6a
max_vl_num: 3
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 64
gid_tbl_len: 32
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 5.0 Gbps (2)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c902:0023:a1d9
Switches are "MT47396 Infiniscale-III Mellanox Technologies".