[openib-general] SM Bad Port Handling
Eitan Zahavi
eitan at mellanox.co.il
Tue Apr 19 08:05:32 PDT 2005
Yes, the algorithm looks reasonable.
I would make the number of packets required to qualify each port along the path
a configurable parameter, with a default value of 10 or 20 (certainly not 4).
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, April 19, 2005 5:00 PM
> To: Eitan Zahavi
> Cc: 'shaharf'; openib-general at openib.org
> Subject: RE: [openib-general] SM Bad Port Handling
>
> On Thu, 2005-04-14 at 07:13, Eitan Zahavi wrote:
> > [EZ] Not at all. Although the target port is known, the flaky link
> > that causes the MAD to fail might be anywhere along the path to the
> > port. So, if you mark the target port as bad, you might be marking the
> > wrong port!
>
> OpenSM does look for traps 128-131, which include Local Link Integrity
> (129), which is likely to come from these noisy ports, right?
>
> > [EZ] Let me clarify with an example:
> > SM=HCA1/P1 ->
> > SW1/P1....SW1/P2->SW2/P1..SW2/P2->SW3/P1....SW3/P3->HCA2/P1
> >
> > \..SW4/P4->SW3/P4..SW3/P5->SW3/P2../
> >
> > If the flaky link is between SW2/P2 and SW3/P1, then the packet sent to
> > HCA2 using the DR path [0][1][1][2][3] might fail. If you mark HCA2/P1 as
> > bad, you will actually lose that HCA for no good reason, since
> > another path from the SM to HCA2 exists.
>
> OK. To be really sure about the failed port, one could then walk the
> entire DR path from the SM to the perceived non-responding port, and if
> the same port along the path doesn't respond some number of times (say
> 4) in a row, that port's peer port can be marked as unhealthy. Is this
> algorithm acceptable?
>
> -- Hal
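The hop-by-hop qualification Hal proposes, with the retry count made a parameter as Eitan suggests, might look roughly like the following. This is a minimal Python sketch under stated assumptions, not OpenSM code: `locate_bad_hop`, `probe`, and `DEFAULT_FAIL_THRESHOLD` are hypothetical names, and `probe` stands in for sending one directed-route SMP along a path prefix and reporting whether a response came back.

```python
from typing import Callable, Optional, Sequence

# Hypothetical default; Eitan suggests 10 or 20 rather than 4.
DEFAULT_FAIL_THRESHOLD = 10


def locate_bad_hop(
    dr_path: Sequence[int],
    probe: Callable[[Sequence[int]], bool],
    fail_threshold: int = DEFAULT_FAIL_THRESHOLD,
) -> Optional[int]:
    """Walk a directed-route path outward from the SM, one hop at a time.

    dr_path: DR port list including the leading 0, e.g. [0, 1, 1, 2, 3];
             dr_path[k] is the exit port used on hop k.
    probe:   hypothetical callback that sends one SMP along the given
             prefix and returns True if a response arrives.
    Returns the hop count k of the first prefix dr_path[:k + 1] that failed
    fail_threshold times in a row, or None if every hop answered.
    """
    if fail_threshold < 1:
        raise ValueError("fail_threshold must be >= 1")
    for hops in range(1, len(dr_path)):
        prefix = dr_path[: hops + 1]
        for _ in range(fail_threshold):
            if probe(prefix):
                break  # this hop answered, so it qualifies; go one hop further
        else:
            # fail_threshold consecutive failures: the suspect link is the one
            # leaving exit port dr_path[hops] of the node reached after hops-1 hops
            return hops
    return None


if __name__ == "__main__":
    # Simulated flaky SW2/P2 <-> SW3/P1 link from the quoted example:
    # every SMP that must cross the third hop is lost.
    def flaky_probe(prefix: Sequence[int]) -> bool:
        return len(prefix) - 1 < 3

    print(locate_bad_hop([0, 1, 1, 2, 3], flaky_probe))  # prints 3
```

Returning the hop rather than the target port is the point of Eitan's example: with hop 3 flagged, the suspect link leaves exit port 2 of the node reached after two hops (SW2/P2 in the diagram), not HCA2/P1. A larger threshold also makes the check less sensitive to short bursts: in this sketch, five consecutive lost SMPs at one hop flag it with a threshold of 4 but not with 10.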