[ofw] [PATCH 0/4] Avoid the SM

Fab Tillier ftillier at windows.microsoft.com
Thu Aug 21 09:48:49 PDT 2008


This patch series updates IPoIB, IBAT, and the NetworkDirect CM proxy to avoid path queries, so that the fabric stays functional and stable when the node count exceeds 64.

There are two problems.  The first is that OpenSM isn't stable enough to handle the load of a 64-node cluster.  The second is that OpenSM can't sustain the query rates that occur in a 64-node or larger cluster.  The ripple effects are that IPoIB fails, and that connection establishment for NetworkDirect fails.

The failure in IPoIB is phased, and starts with the attempt to respond to an ARP request.  To send the unicast ARP response, IPoIB currently queries the SA for a path record to the source of the ARP request.  This path query times out because the SA cannot respond in time.  IPoIB then flags this error to NDIS by reporting that it is hung, and NDIS resets the adapter.  During the reset, IPoIB queries the SA for its local port info (in order to report the link speed and whatnot), and that query fails too because the SM is stressed.  This second failure puts IPoIB in a 'cable disconnected' state from which it never emerges on its own.  It does write an event log entry stating that the SA query timed out, but does nothing else.  An SM reregister event will bring it out of this state, but since the SM hasn't necessarily died and was merely slow to respond, a reregister event never happens.  Disabling and re-enabling IPoIB also resets it.
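
The failure sequence above can be sketched as a small state machine.  This is only a toy model for following the phases, not the driver's actual code: the class, method, and state names below are hypothetical, and it assumes that a reregister event or a disable/re-enable always succeeds in bringing the port back up.

```python
from enum import Enum, auto


class LinkState(Enum):
    UP = auto()            # port is usable
    RESETTING = auto()     # IPoIB reported a hang, NDIS is resetting the adapter
    DISCONNECTED = auto()  # 'cable disconnected' state


class IpoibPortModel:
    """Hypothetical model of the phased IPoIB failure (illustration only)."""

    def __init__(self):
        self.state = LinkState.UP
        self.event_log = []

    def on_arp_request(self, path_query_ok):
        """Phase 1: the unicast ARP response needs a path record from the SA.

        Returns True if the ARP response could be sent.
        """
        if self.state is not LinkState.UP:
            return False
        if path_query_ok:
            return True
        # The path query timed out: IPoIB reports itself hung and NDIS
        # resets the adapter.
        self.state = LinkState.RESETTING
        return False

    def on_ndis_reset(self, port_info_query_ok):
        """Phase 2: the reset queries the SA for the local port info."""
        if self.state is not LinkState.RESETTING:
            return
        if port_info_query_ok:
            self.state = LinkState.UP
        else:
            # Assumption: the event log entry is written at this point.
            self.event_log.append("SA query timed out")
            self.state = LinkState.DISCONNECTED

    def on_sm_reregister(self):
        """An SM reregister event brings the port out of the stuck state."""
        if self.state is LinkState.DISCONNECTED:
            self.state = LinkState.UP

    def on_disable_enable(self):
        """Disabling and re-enabling IPoIB resets it from any state."""
        self.state = LinkState.UP


def simulate_stressed_sm():
    """Run both phases against an SM that is too slow to answer either query."""
    port = IpoibPortModel()
    replied = port.on_arp_request(path_query_ok=False)
    port.on_ndis_reset(port_info_query_ok=False)
    return replied, port.state, port.event_log
```

In this model nothing moves the port out of DISCONNECTED except on_sm_reregister() or on_disable_enable(), which mirrors why a slow but still running SM leaves the port stuck.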

NetworkDirect (and any IBAT client such as WSD) connection establishment fails too, because IBAT cannot resolve the destination GID when it does not get an ARP response within a reasonable (to NDIS) time period.

The end result is that the fabric ends up in a bad state, one that requires disabling/re-enabling all IPoIB instances (or restarting the SM), just to have the whole failure happen again the next time a large MPI job is run.


