[Users] mthca lockup
Coulter, Susan K
skc at lanl.gov
Mon Sep 30 08:27:27 PDT 2013
On Sep 23, 2013, at 9:37 AM, Orion Poplawski <orion at cora.nwra.com> wrote:
Sep 21 12:52:11 castor kernel: ib0: transmit timeout: latency 1997 msecs
Sep 21 12:52:11 castor kernel: ib0: queue stopped 1, tx_head 2265490, tx_tail 2265362
We see these periodically - even on the newer mlx4_0 devices.
If it is happening a lot and always on the same node, the HCA probably needs replacing.
How often is it happening?
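To put a number on "how often" and "always on the same node", something like the rough Python sketch below could tally the timeouts per host and interface from syslog. It assumes the exact line format quoted above; the function names (count_timeouts, tx_backlog) are made up for illustration. It also prints tx_head minus tx_tail, which is 128 for the quoted line. That might mean the send queue was full when it stopped, but the log itself does not state the queue size, so treat that reading as an assumption.

```python
import re
from collections import Counter

# Matches the syslog layout quoted in this thread; other layouts need a different pattern.
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"(?P<dev>ib\d+): transmit timeout: latency (?P<ms>\d+) msecs"
)
QUEUE_RE = re.compile(r"tx_head (?P<head>\d+), tx_tail (?P<tail>\d+)")


def count_timeouts(lines):
    """Count 'transmit timeout' messages per (host, interface)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group("host"), m.group("dev"))] += 1
    return counts


def tx_backlog(line):
    """Return tx_head - tx_tail from a 'queue stopped' line, or None if absent."""
    m = QUEUE_RE.search(line)
    if m is None:
        return None
    return int(m.group("head")) - int(m.group("tail"))


# The two lines from the original report, used here as sample input.
sample = [
    "Sep 21 12:52:11 castor kernel: ib0: transmit timeout: latency 1997 msecs",
    "Sep 21 12:52:11 castor kernel: ib0: queue stopped 1, tx_head 2265490, tx_tail 2265362",
]

for (host, dev), n in count_timeouts(sample).items():
    print(f"{host} {dev}: {n} timeout(s)")
print("outstanding sends:", tx_backlog(sample[1]))
```

Feeding it the collected logs from all nodes (for example, the lines of a central syslog file) would show whether one node stands out.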
====================================
Susan Coulter
HPC-3 Network/Infrastructure
505-667-8425
Increase the Peace...
An eye for an eye leaves the whole world blind
====================================