[openib-general] SM Bad Port Handling

Eitan Zahavi eitan at mellanox.co.il
Thu Apr 14 04:13:18 PDT 2005


> 
> When the SM sends a direct route MAD it saves the port guid (and port
> num) in the madw context, so that when there is a reply or timeout you
> can easily find the port. That means you don't have to walk the entire DR
> path to find the unhealthy port. That means that the peer port (from
> which we arrived at the bad port) is unhealthy. Does this address your
> concern?
> 
[EZ] Not at all. Although the target port is known, the flaky link that
fails the MAD might be anywhere along the path to that port. So if you mark
the target port as bad, you might be marking the wrong port!

 [EZ] Let me clarify with an example:
SM=HCA1/P1 -> SW1/P1....SW1/P2->SW2/P1..SW2/P2->SW3/P1....SW3/P3->HCA2/P1
 
\..SW4/P4->SW3/P4..SW3/P5->SW3/P2../
                           
If the flaky link is between SW2/P2 and SW3/P1, then a packet sent to HCA2
using DR: [0][1][1][2][3] might fail. If you mark HCA2/P1 as bad, you
will actually lose that HCA for no good reason, since another path from the
SM to HCA2 exists.
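
To make the point concrete, here is a minimal sketch in Python of one way
the failing hop could be localized instead of blaming the target port. It
is not how OpenSM does it: the node-name path model and the names
first_failing_link and probe are hypothetical, and the probe is a
simulation rather than a real MAD send. It also assumes the link fails
every time, whereas a truly flaky link would need repeated probes.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a directed-route path is the list of node names
# visited in order, starting at the SM. A link is a pair of adjacent nodes.
Link = Tuple[str, str]


def links_on_path(nodes: List[str]) -> List[Link]:
    """Return the links crossed when walking the nodes in order."""
    return list(zip(nodes, nodes[1:]))


def first_failing_link(nodes: List[str],
                       probe: Callable[[List[str]], bool]) -> Optional[Link]:
    """Probe successively longer prefixes of the path.

    Returns the first link whose addition makes the probe fail, or None
    if every prefix (including the full path) gets an answer.
    """
    for end in range(2, len(nodes) + 1):
        if not probe(nodes[:end]):
            return (nodes[end - 2], nodes[end - 1])
    return None


# Primary path from the example, reduced to node names (ports omitted).
primary = ["HCA1", "SW1", "SW2", "SW3", "HCA2"]

# Assumption for the simulation: this link always drops the packet.
flaky = {("SW2", "SW3")}


def probe(prefix: List[str]) -> bool:
    """Simulated MAD: succeeds unless the prefix crosses a flaky link."""
    return not any(link in flaky for link in links_on_path(prefix))


suspect = first_failing_link(primary, probe)
```

With this model the suspect is the SW2-SW3 link, not HCA2, so HCA2 stays
reachable through the other path.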

EZ