[Users] mthca lockup

Coulter, Susan K skc at lanl.gov
Mon Sep 30 08:27:27 PDT 2013


On Sep 23, 2013, at 9:37 AM, Orion Poplawski <orion at cora.nwra.com> wrote:

> Sep 21 12:52:11 castor kernel: ib0: transmit timeout: latency 1997 msecs
> Sep 21 12:52:11 castor kernel: ib0: queue stopped 1, tx_head 2265490, tx_tail 2265362

We see these periodically - even on the newer mlx4_0 devices.

If it is happening a lot and always on the same node, the HCA probably needs replacing.

How often is it happening?
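
If it helps to put a number on that, something like the following could tally the timeouts per node, interface, and day. This is only a rough sketch: it assumes the traditional syslog line format seen in the log excerpt, and count_timeouts is a made-up helper name, not part of any OFED or kernel tool.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Sep 21 12:52:11 castor kernel: ib0: transmit timeout: latency 1997 msecs
# Assumption: "Mon DD HH:MM:SS host kernel: ..." layout; adjust for other formats.
TIMEOUT_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\S+\s+(?P<host>\S+)\s+kernel:.*?"
    r"\b(?P<iface>ib\d+): transmit timeout: latency (?P<latency>\d+) msecs"
)


def count_timeouts(lines):
    """Count IPoIB transmit timeouts per (host, interface, day)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            day = f"{m['month']} {int(m['day'])}"
            counts[(m["host"], m["iface"], day)] += 1
    return counts
```

One way to use it would be to pass an open log file, for example count_timeouts(open("/var/log/messages")), where the path is an assumption and depends on the distribution's syslog setup. A node that shows up far more often than its neighbors is the one to look at first.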

====================================

Susan Coulter
HPC-3 Network/Infrastructure
505-667-8425
Increase the Peace...
An eye for an eye leaves the whole world blind
====================================
