<html><body>
<p>Hi all:<br>
<br>
I had one IB cluster of eight IBM HS21 blades running a mix of RHEL5.2 Server and SLES10 SP2. All of them were connected to one IB switch, and opensm was running as the subnet manager on one blade. The ibcheckerrors command finished cleanly. Last week I got another eight IBM LS21 blades, connected to a second IB switch. After I connected the two switches and turned on all the IB adapters on the new blades, ibcheckerrors reported this error:<br>
<br>
[root@gaia-07 ~]# ibcheckerrors<br>
#warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1<br>
Error check on lid 3 (gaia-07 HCA-1) port 1: FAILED<br>
<br>
## Summary: 19 nodes checked, 0 bad nodes found<br>
## 46 ports checked, 1 ports have errors beyond threshold<br>
[root@gaia-07 ~]# ibv_devinfo<br>
hca_id: mlx4_0<br>
fw_ver: 2.3.000<br>
node_guid: 0002:c903:0001:3370<br>
sys_image_guid: 0002:c903:0001:3373<br>
vendor_id: 0x02c9<br>
vendor_part_id: 25418<br>
hw_ver: 0xA0<br>
board_id: IBM08A0000001<br>
phys_port_cnt: 2<br>
port: 1<br>
state: PORT_ACTIVE (4)<br>
max_mtu: 2048 (4)<br>
active_mtu: 2048 (4)<br>
sm_lid: 15<br>
port_lid: 3<br>
port_lmc: 0x00<br>
<br>
port: 2<br>
state: PORT_DOWN (1)<br>
max_mtu: 2048 (4)<br>
active_mtu: 2048 (4)<br>
sm_lid: 0<br>
port_lid: 0<br>
port_lmc: 0x00<br>
[root@gaia-07 ~]# ibcheckport 3 1<br>
[root@gaia-07 ~]# echo $?<br>
0<br>
<br>
I have disabled the embedded subnet manager on both IB switches. The issue persists even after I moved the subnet manager to another machine. ib0 on gaia-07 can still communicate with the other machines. All installed IB adapters are ConnectX 4x SDR, and both switches are Topspin switches. Could anyone give some advice on this issue? Thanks in advance!<br>
<br>
Wen Hao Wang<br>
Email: wangwhao@cn.ibm.com<br>
</body></html>