[Users] IPoIB failing

Michael Robbert mrobbert at mines.edu
Thu Apr 24 16:35:31 PDT 2014


I have a host that is running CentOS 6.4 with kernel 
2.6.32-358.el6.x86_64 and the OFED stack that shipped with that kernel. 
It has a Mellanox ConnectX-3 HCA which is configured with IPoIB. The 
only thing this host is doing is routing IP packets between the IPoIB 
interface and a bonded interface containing two 10 Gbps Ethernet ports 
(Solarflare NIC, if that matters). The node has been running fine for 
some time now while I've been preparing the systems on either side. 
Recently we put this into production, allowing real users to run jobs, 
and three times in the past week the IPoIB interface has become 
unresponsive around the time that we see this message:

Apr 24 15:39:09 ibrtr-ct2 kernel: ib0: failed send event (status=1, wrid=13 vend_err 69)

In each case the status is always 1, the wrid varies, and the vend_err 
is always 69. No other errors are seen, and it appears that lower-level 
IB functions still work fine, i.e. ibstatus shows the port as active and 
ibhosts sees the host. Also, ibchecknet doesn't show any problems. I have 
been able to fix the problem by taking the interface down and bringing 
it back up with ifdown and ifup.
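
To track how often this happens and whether the fields really stay 
constant, something like the following could pull the values out of 
syslog. This is only a rough sketch: the function names and the regular 
expression are my own, written against the one message quoted above, and 
vend_err is kept as a string because the message alone does not say 
whether it is decimal or hex.

```python
import re
from collections import Counter

# Pattern written against the single log line quoted above; other kernel
# versions may format the message differently (assumption).
FAILED_SEND = re.compile(
    r"(?P<iface>\w+): failed send event \(status=(?P<status>\d+),\s*"
    r"wrid=(?P<wrid>\d+)\s+vend_err\s+(?P<vend_err>\w+)\)"
)

SAMPLE = ("Apr 24 15:39:09 ibrtr-ct2 kernel: ib0: failed send event "
          "(status=1, wrid=13 vend_err 69)")


def parse_failed_send(line):
    """Return the fields of a 'failed send event' line, or None if no match."""
    m = FAILED_SEND.search(line)
    if m is None:
        return None
    return {
        "iface": m.group("iface"),
        "status": int(m.group("status")),
        "wrid": int(m.group("wrid")),
        # Kept as text: the radix of vend_err is not stated in the message.
        "vend_err": m.group("vend_err"),
    }


def summarize(lines):
    """Count events per (interface, status, vend_err) combination."""
    counts = Counter()
    for line in lines:
        event = parse_failed_send(line)
        if event is not None:
            counts[(event["iface"], event["status"], event["vend_err"])] += 1
    return counts


if __name__ == "__main__":
    print(parse_failed_send(SAMPLE))
```

Feeding the lines of /var/log/messages to summarize() would show at a 
glance whether any status/vend_err pair other than 1/69 ever turns up.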

Has anybody seen these symptoms, or does anybody know specifically what 
the error message means? Any thoughts on where the problem lies or how 
to find that out?
My next planned step is to upgrade the kernel to a later CentOS 6.4 
kernel. If that doesn't help, I may try replacing the HCA with an old 
InfiniHost III card. We have another one of these boxes in another 
building that still has its old InfiniHost III card. It isn't seeing 
this problem, even though it is seeing the other side of all of this 
traffic.

Thanks,
Mike Robbert
