[openib-general] SM Bad Port Handling

Wed Apr 13 23:29:08 PDT 2005

Hi Shahar,

> 	Your analysis is not completely accurate. The SM configures the
> subnet using direct mads only, and it builds a spanning tree of direct
> routes. What I want to say is that it doesn't matter why exactly a
> port is unreachable. Once a port can not be reached, you can
> retry the entire heavy sweep process, but if the problem repeats itself
> (X times) on the same port, you have no alternative other than to disable
> it.
The point is that the real "bad" ports are not the ones that are killing
100% of packets (since they will simply have a "DOWN" state and vanish).

The real bad ports are the ones that pass < 25% of the packets that go
through them (as we use a retry count of 4).
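
A minimal sketch of the arithmetic behind that kind of threshold. It assumes
every attempt is dropped independently with the same probability, which is a
simplification and not something established in this thread; the function
name is made up for illustration.

```python
def prob_all_attempts_fail(pass_rate: float, attempts: int = 4) -> float:
    """Chance that every attempt of one MAD transaction is lost.

    Assumes independent drops with a fixed per-attempt pass rate
    (an illustrative model, not something OpenSM measures).
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be within [0, 1]")
    return (1.0 - pass_rate) ** attempts


for rate in (0.9, 0.5, 0.25, 0.1):
    p_fail = prob_all_attempts_fail(rate)
    print(f"pass rate {rate:.2f}: transaction fails with p = {p_fail:.4f}")
```

Under this model a port can pass well under half of its packets and still
let most transactions succeed within 4 attempts, which is why such a port
never shows up as plainly "DOWN".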

When such a port happens to be on a switch, it will normally cause other ports
to appear to be "bad" - NOT ITSELF !
The reason is that the number of packets sent through a switch port
(not a leaf switch port) is much larger than the number of packets that
deal with the discovery of the port itself. All the traffic to the ports
"behind" the switch port will go through that port. And there is a much
higher chance for ALL the packets that go to one end-port to be dropped than
for ALL the packets that go through the switch port to be dropped.
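
The effect can be illustrated with a rough calculation. This is only a sketch
under the assumption of independent drops; the 40% pass rate and the
transaction counts are hypothetical numbers, not measurements.

```python
def prob_transaction_fails(pass_rate: float, attempts: int = 4) -> float:
    """Chance that one MAD routed through the flaky port loses every attempt."""
    return (1.0 - pass_rate) ** attempts


def prob_some_target_blamed(pass_rate: float, transactions: int,
                            attempts: int = 4) -> float:
    """Chance that at least one of `transactions` MADs exhausts its retries."""
    ok = 1.0 - prob_transaction_fails(pass_rate, attempts)
    return 1.0 - ok ** transactions


# Hypothetical case: the flaky port passes 40% of packets, its own discovery
# takes 1 transaction, and 50 transactions target ports behind it.
own = prob_some_target_blamed(0.4, transactions=1)
behind = prob_some_target_blamed(0.4, transactions=50)
print(f"flaky port itself appears bad:   {own:.3f}")
print(f"some port behind it appears bad: {behind:.3f}")
```

With these numbers the flaky port itself is rarely the one that fails to
answer, while it is almost certain that some port behind it will.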

So if you implement the feature the way it was proposed, you will end up
disconnecting end-ports and not the real bad port.

Why is it bad? It is bad since in a tree topology the end-ports always have an
alternate path to the SM. If you could find the real flaky bad port - you
could still communicate with all the end-ports.

So how do we find the bad port/cable that causes other ports to appear bad?
We have had many long internal discussions on this topic. The algorithm is
not fully developed yet. But several things are clear:
1. One needs to track the number of successful and bad packets flowing
through each port, so that a failure rate can be obtained for each port.
2. Topology-based analysis should be used to find the common point that is
first to have a high drop rate on the directed route tree.
3. Alternate directed routes might be used to invalidate "suspicious" ports.
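
Points 1 and 2 could be sketched roughly as follows. This is not OpenSM code
and not the algorithm under discussion (which is not fully developed); the
class, the port names and the 0.5 threshold are all hypothetical, and point 3
(probing alternate routes) is left out.

```python
from collections import defaultdict


class RouteStats:
    """Per-port delivered/lost counters for directed-route MADs (illustrative)."""

    def __init__(self):
        self.sent = defaultdict(int)
        self.lost = defaultdict(int)

    def record(self, route, delivered, count=1):
        """Charge `count` MAD attempts to every port on the directed route."""
        for port in route:
            self.sent[port] += count
            if not delivered:
                self.lost[port] += count

    def failure_rate(self, port):
        sent = self.sent.get(port, 0)
        return self.lost.get(port, 0) / sent if sent else 0.0

    def first_suspect(self, route, threshold=0.5):
        """Walk the route from the SM outward; return the first port whose
        failure rate exceeds the threshold, or None if there is none."""
        for port in route:
            if self.failure_rate(port) > threshold:
                return port
        return None


# Hypothetical fabric; port names and the 0.5 threshold are made up.
route_a = ["sm:1", "sw1:2", "sw2:1"]   # to hcaA, through the flaky sw1:2
route_b = ["sm:1", "sw1:2", "sw2:2"]   # to hcaB, through the flaky sw1:2
route_c = ["sm:1", "sw1:3"]            # to hcaC, healthy path

stats = RouteStats()
stats.record(route_a, delivered=False, count=4)
stats.record(route_b, delivered=False, count=3)
stats.record(route_b, delivered=True, count=1)
stats.record(route_c, delivered=True, count=12)

print(stats.first_suspect(route_a))
print(stats.first_suspect(route_c))
```

The port shared by every route (sm:1 here) also collects failures, but
healthy traffic on other branches dilutes its rate, so the walk from the SM
outward stops at the first port where the drop rate actually jumps.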

In any case, I was not proposing relying on traps. I was suggesting to use
the "healthy" bit on physical ports as the way to carry the information about
"bad" ports (once we correctly find them) into the rest of the algorithms
used by the SM.

Regarding the need to "disconnect" a bad HCA "end-port" - I still have not
seen any log showing OpenSM going through infinite "polling" of bad ports.
As I know the code - I can not believe this is possible - so unless you have
a log that shows this phenomenon (and not another one) please do not change
this path.

One last word: I would highly recommend using the management simulator to
inject arbitrary (random) packet drops and test any algorithm you might
think of.
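
As a stand-in for that kind of experiment (this is a toy model, not the
management simulator and not its API), random drops can be injected on one
port and a candidate policy checked against them. The fabric, the 30% pass
rate and the "no response means bad" policy below are hypothetical.

```python
import random


def send_with_retries(route, pass_rate, rng, attempts=4):
    """Return True if any attempt survives every port on the route."""
    for _ in range(attempts):
        if all(rng.random() < pass_rate.get(port, 1.0) for port in route):
            return True
    return False


def naive_blame_counts(routes, pass_rate, sweeps, seed=0):
    """Count how often a 'no response means bad' policy blames each target."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    blamed = {target: 0 for target in routes}
    for _ in range(sweeps):
        for target, route in routes.items():
            if not send_with_retries(route, pass_rate, rng):
                blamed[target] += 1
    return blamed


# Hypothetical fabric: only switch port "sw1:2" is flaky (30% pass rate).
routes = {
    "sw1:2": ["sw1:2"],
    "hcaA": ["sw1:2", "sw2:1"],
    "hcaB": ["sw1:2", "sw2:2"],
    "hcaC": ["sw1:3"],
}
counts = naive_blame_counts(routes, {"sw1:2": 0.3}, sweeps=1000)
print(counts)
```

In this toy setup the healthy end-ports behind sw1:2 are blamed about as
often as the flaky port itself, so taken together they account for most of
the "bad" marks, while hcaC on the healthy branch is never blamed.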

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: shaharf [mailto:shaharf at voltaire.com]
> Sent: Wednesday, April 13, 2005 5:03 PM
> To: Eitan Zahavi; Hal Rosenstock
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] SM Bad Port Handling
> 
> Eitan,
> 
> 	Your analysis is not completely accurate. The SM configures the
> subnet using direct mads only, and it builds a spanning tree of direct
> routes. What I want to say is that it doesn't matter why exactly a
> port is unreachable. Once a port can not be reached, you can
> retry the entire heavy sweep process, but if the problem repeats itself
> (X times) on the same port, you have no alternative other than to disable
> it. If the SM had an alternative method of building direct paths,
> then such an alternative path could be attempted. Currently it is not
> relevant. Speaking of "statistical analysis", what are the odds that a
> port will behave well when it is queried directly, but start to lose
> packets when a direct route is routed through it, and behave
> consistently during all retries? Again, even if this is the case (and, as
> an understatement, I am not sure how frequent it is), the port behind it is
> unreachable and therefore "bad".
> 
> The current unhealthy port mechanism is not redundant to this "bad" port
> mechanism because it does not handle the same case. Both mechanisms are
> required. The issue if they can share the same status bit is really an
> implementation issue.
> 
> Relying on traps is very problematic in some cases, particularly in the
> initial bring up sweep when the SM lid is not even configured (remember
> VTEC?).
> 
> Shahar
> 
> 
> ________________________________________
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Eitan Zahavi
> Sent: Wednesday, April 13, 2005 11:21 AM
> To: Hal Rosenstock; Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] SM Bad Port Handling
> 
> I probably did not make my point very clear:
> It is bad (not to say wrong) to disqualify a port and mark it as a bad
> port if it did not respond to queries.
> The cause of the issue might be a flaky link on the directed route to
> the port.
> If the SM were able to find that flaky link port it would avoid
> marking the wrong ports. Moreover, the port that was almost marked as
> bad by the simplistic algorithm you propose will be discovered and
> operational as there are many other paths to reach it - walking around the
> real bad port !
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Wednesday, April 13, 2005 12:00 PM
> > To: Eitan Zahavi
> > Cc: openib-general at openib.org
> > Subject: RE: [openib-general] SM Bad Port Handling
> >
> > On Wed, 2005-04-13 at 01:28, Eitan Zahavi wrote:
> > > [EZ] This is true. Currently there is only one cause for the
> > > un-healthy bits to be set - which are exactly as you point - these
> > > traps. The point I was trying to make was that this bit is the
> > > mechanism for flagging a port status is bad.
> > >
> > > What I did recommend was to write a "statistical" analysis of Directed
> > > Route packet drop - such that we can find the ports with a high drop
> > > rate and mark them as un-healthy. If you mark every port that does not
> > > respond to a MAD as un-healthy you can suffer from flaky links
> > > somewhere on the route to that port. Only analysis of the number of
> > > good packets vs. dropped packets can lead you to the right bad port.
> >
> > The original proposal on this said the following:
> >
> > "The OpenSM will implement a configurable policy (some number of
> > consecutive lack of responses to SM requests). At the point of
> > exhaustion of the timeout/retry strategy, that port will be marked as
> > "bad" by OpenSM."
> >
> > Any idea on what might make a good default threshold (for consecutive
> > retries) ? Do you think there is no sufficient default ?
> >
> > If a link is flaky and MADs can't get through, should it be used for non
> > MAD traffic ?
> >
> > Also note that the proposal also said:
> >
> > "Also, there could also be a periodic "ping" at a slower rate to check
> > if the "bad" ports revive."
> >
> > In terms of analysis of good v. errored and dropped packets (along the
> > path to that node), there are OpenIB diagnostic tools to help with this.
> >
> > -- Hal