[openib-general] SM Bad Port Handling

Hal Rosenstock halr at voltaire.com
Thu Apr 14 03:32:02 PDT 2005


On Thu, 2005-04-14 at 02:29, Eitan Zahavi wrote:
> The point is that the real "bad" ports are not the ones that are
> killing 100% of packets (since they will simply have a "DOWN" state
> and vanish).
> 
> The real bad ports are the ones that pass < 25% (as we use a retry
> of 4) of the packets that go through them.
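One way to read the 25% figure, assuming "retry of 4" means four total attempts and that each attempt through the port is dropped independently with the same probability (both are assumptions, not stated in the thread): if the port passes a fraction p of packets, the number of attempts until the first success is geometric, so its mean reaches the 4-attempt budget exactly at p = 0.25.

```latex
P_{\text{success}}(p) = 1 - (1 - p)^{4}, \qquad
\mathbb{E}[\text{attempts until success}] = \frac{1}{p}

p = 0.25:\quad P_{\text{success}} = 1 - 0.75^{4} = 1 - 0.31640625 \approx 0.684, \qquad \frac{1}{p} = 4

p = 0.10:\quad P_{\text{success}} = 1 - 0.90^{4} = 1 - 0.6561 \approx 0.344
```

Under these assumptions, a port that passes fewer than a quarter of its packets fails a MAD outright more often than not as p drops, yet still answers often enough that it never goes DOWN.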

When the SM sends a directed route MAD, it saves the port GUID (and port
number) in the madw context, so that when there is a reply or a timeout
you can find the port directly. That means you don't have to walk the
entire DR path to find the unhealthy port: a timeout means that the peer
port (the one through which we arrived at the bad port) is unhealthy.
Does this address your concern?

-- Hal



