[ofw] [PATCH 0/4] Avoid the SM

Mon Aug 25 09:16:49 PDT 2008

Hi Fab,

I applied your patch against the latest trunk and found a problem
with it.

My configuration was two machines connected back to back, with opensm
running on one of them.
It seems that after a reboot, pings no longer got through. The computer
that wasn't running opensm received the ARP request and answered it, but
the reply never reached the other side.

[I did notice that sometimes both machines had the remote side's MAC
address in the ARP table, but even then packets still didn't arrive.]

I also tried the same thing without the Voltaire partition patch, and
the same problem occurred.

Can you please try this scenario and see if things work for you? (You
might need to reboot one of the machines more than once to hit the
problem.)

Thanks
Tzachi

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> Sent: Thursday, August 21, 2008 7:49 PM
> To: ofw at lists.openfabrics.org
> Subject: [ofw] [PATCH 0/4] Avoid the SM
> 
> This patch series updates IPoIB, IBAT, and the NetworkDirect 
> CM proxy to avoid path queries in order to get a functional 
> and stable fabric when node counts exceed 64 nodes.
> 
> There are two problems.  The first is that OpenSM isn't stable 
> enough to handle the load of a 64-node cluster.  The second 
> is that OpenSM can't sustain the query rates that occur in a 
> 64-node or larger cluster.  The ripple effects are that IPoIB 
> fails, and connection establishment for NetworkDirect fails.
> 
> The failure in IPoIB is phased, and starts with the attempt 
> to respond to an ARP request.  To send the unicast ARP 
> response, IPoIB currently queries the SA for a path record to 
> the source of the ARP request.  This path query times out due 
> to the SA not being able to respond in time.  IPoIB then 
> flags this error to NDIS by reporting that it is hung, and 
> NDIS resets the adapter.  IPoIB queries the SA for its local 
> port info (in order to report the link speed and whatnot), 
> and that query fails too due to the SM being stressed.  This 
> second failure puts IPoIB in a 'cable disconnected' state 
> from which it never emerges on its own.  It does write an 
> event log entry stating that the SA query timed out, but 
> nothing else.  An SM reregister event will cause it to come 
> out of this state, but since the SM doesn't necessarily die 
> but rather was just slow to respond, a reregister event never 
> happens.  Disabling and re-enabling IPoIB also resets it.
> 
> NetworkDirect (and any IBAT client such as WSD) connection 
> establishment fails too, because IBAT cannot resolve the 
> destination GID when it does not get an ARP response within a 
> reasonable (to NDIS) time period.
> 
> The end result is that the fabric ends up in a bad state, one 
> that requires disabling/re-enabling all IPoIB instances (or 
> restarting the SM), just to have the whole failure happen 
> again the next time a large MPI job is run.