[Users] IPoIB on CentOS 6.5

Mehmet Soysal mehmet.soysal at kit.edu
Tue Mar 17 07:54:02 PDT 2015


Hi,
did you solve the problem?
We have had a similar issue since upgrading to RHEL 6.5 or higher.

On our nodes, IPoIB stops working after an OpenSM failover occurs.
We have several nodes from different vendors. All Red Hat machines are
affected, while the SUSE machines keep working fine after an OpenSM failover.

We did not notice the issue at first, because after a reboot IPoIB works
fine and then suddenly stops working on all nodes. Everything else, such as
MPI communication or Lustre, keeps working fine. But if a client needs to
reconnect to a Lustre server because of a Lustre failover, the reconnect is
initially done over IP (IPoIB).
It took us a long time to pin the issue down to an OpenSM failover.

Our RHEL nodes also have ConnectX-3 cards.
Updating to RHEL 6.6 does not solve the issue.
We opened a case with Red Hat and are waiting for a fix or a solution.



best regards
M.Soysal
