[ofa-general] ibcheckerrors give error 5691 within OFED 1.3.1
Wen Hao Wang
wangwhao at cn.ibm.com
Wed Sep 17 01:25:00 PDT 2008
Hi all:
I had an IB cluster of eight IBM HS21 blades running a mix of RHEL 5.2 Server
and SLES 10 SP2, all connected to one IB switch. opensm was running as the
subnet manager on one blade, and ibcheckerrors completed without errors.
Last week I got another eight IBM LS21 blades connected to a second IB
switch. After I connected the two switches and brought up all the IB
adapters on the new blades, ibcheckerrors reported the following error:
[root@gaia-07 ~]# ibcheckerrors
#warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1
Error check on lid 3 (gaia-07 HCA-1) port 1: FAILED
## Summary: 19 nodes checked, 0 bad nodes found
## 46 ports checked, 1 ports have errors beyond threshold
[root@gaia-07 ~]# ibv_devinfo
hca_id: mlx4_0
        fw_ver:             2.3.000
        node_guid:          0002:c903:0001:3370
        sys_image_guid:     0002:c903:0001:3373
        vendor_id:          0x02c9
        vendor_part_id:     25418
        hw_ver:             0xA0
        board_id:           IBM08A0000001
        phys_port_cnt:      2
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         15
                        port_lid:       3
                        port_lmc:       0x00
                port:   2
                        state:          PORT_DOWN (1)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
[root@gaia-07 ~]# ibcheckport 3 1
[root@gaia-07 ~]# echo $?
0
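In the transcript, ibcheckerrors compares the RcvErrors counter on lid 3 port 1 (5691) against a threshold (10) and marks the port FAILED. Below is a minimal Python sketch of that comparison for saved ibcheckerrors output. It assumes the "#warn" line format shown in the transcript, and it assumes a counter is flagged only when it is strictly greater than its threshold. The helpers parse_warnings and over_threshold are hypothetical and are not part of infiniband-diags. The sketch reads text only and does not query the fabric.

```python
import re

# Hypothetical parser for the "#warn" lines printed by ibcheckerrors,
# assuming the format seen in the transcript:
#   #warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1
WARN_RE = re.compile(
    r"#warn: counter (?P<counter>\w+) = (?P<value>\d+) "
    r"\(threshold (?P<threshold>\d+)\) lid (?P<lid>\d+) port (?P<port>\d+)"
)


def parse_warnings(output):
    """Return (counter, value, threshold, lid, port) tuples, one per #warn line."""
    results = []
    for line in output.splitlines():
        m = WARN_RE.search(line)
        if m:
            results.append((
                m.group("counter"),
                int(m.group("value")),
                int(m.group("threshold")),
                int(m.group("lid")),
                int(m.group("port")),
            ))
    return results


def over_threshold(warnings):
    """Keep counters whose value is strictly greater than the threshold (assumed rule)."""
    return [w for w in warnings if w[1] > w[2]]


# Output copied from the transcript above.
sample = """#warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1
Error check on lid 3 (gaia-07 HCA-1) port 1: FAILED"""

for counter, value, threshold, lid, port in over_threshold(parse_warnings(sample)):
    print(f"lid {lid} port {port}: {counter} = {value} (threshold {threshold})")
```

Run on the sample, it should print one line for lid 3 port 1. Non-"#warn" lines such as the "Error check ... FAILED" summary are skipped.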
I have disabled the embedded subnet manager on both IB switches. The issue
persists even after I moved the subnet manager to another machine. The ib0
interface on gaia-07 can still communicate with the other machines. All
installed IB adapters are ConnectX 4X SDR, and both switches are Topspin
switches. Can anyone offer some advice on this issue? Thanks in
advance!
Wen Hao Wang
Email: wangwhao at cn.ibm.com