[openib-general] SM Bad Port Handling

Eitan Zahavi eitan at mellanox.co.il
Tue Apr 12 22:28:28 PDT 2005


> -- Hal 
> In looking at the unhealthy code, it appears to me that the unhealthy
> bit is only set if the SM receives traps 129-131 and not if the SMA does
> not respond to SM MADs so these ports will not be detected and hence not
> bypassed.
> 
[EZ] This is true. Currently there is only one cause for the unhealthy bit
to be set: exactly as you point out, these traps. The point I was trying to
make was that this bit is the mechanism for flagging that a port's status
is bad.
What I did recommend was to write a "statistical" analysis of Directed Route
packet drops, so that we can find the ports with a high drop rate and mark
them as unhealthy. If you mark every port that does not respond to a MAD as
unhealthy, you will wrongly blame healthy ports whenever a flaky link
somewhere on the route to that port drops the packet. Only an analysis of the
number of good packets vs. dropped packets can lead you to the right bad port.
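
To make the idea concrete, here is a minimal sketch in Python of one way such
an analysis could work. It is not OpenSM code, and every name in it
(find_lossy_hops, max_drop_rate, min_samples) is hypothetical. It assumes the
SM logs, for each Directed Route MAD, the route as a tuple of exit port
numbers and whether a response arrived, that it sends MADs to the intermediate
switches as well as to the end ports, and that drops on different hops are
independent. Under those assumptions, the success rate of the last hop of a
route is roughly the success rate of the full route divided by the success
rate of the route one hop shorter.

```python
def find_lossy_hops(observations, max_drop_rate=0.2, min_samples=10):
    """Estimate which hop of which directed route is dropping packets.

    observations: iterable of (path, responded) pairs, where path is a
    sequence of exit port numbers from the SM to the destination (an
    assumed representation) and responded is True if the MAD was answered.
    Returns {path: estimated drop rate of the LAST hop of that path}.
    """
    good = {}
    total = {}
    for path, responded in observations:
        path = tuple(path)
        total[path] = total.get(path, 0) + 1
        if responded:
            good[path] = good.get(path, 0) + 1

    def success_rate(path):
        if not path:
            return 1.0  # zero hops: nothing on the route can drop the MAD
        if total.get(path, 0) < min_samples:
            return None  # too few packets to say anything
        return good.get(path, 0) / total[path]

    lossy = {}
    for path in total:
        here = success_rate(path)
        upstream = success_rate(path[:-1])
        if here is None or not upstream:
            continue  # no estimate possible for this hop
        # Drops already explained by the upstream route are divided out.
        hop_drop_rate = 1.0 - min(1.0, here / upstream)
        if hop_drop_rate > max_drop_rate:
            lossy[path] = hop_drop_rate
    return lossy


# Made-up fabric: the link behind route (1, 2) is flaky, and the two end
# ports behind it, (1, 2, 1) and (1, 2, 2), are healthy.
observations = (
    [((1,), True)] * 20
    + [((1, 1), True)] * 20
    + [((1, 1, 1), True)] * 20
    + [((1, 2), True)] * 10 + [((1, 2), False)] * 10
    + [((1, 2, 1), True)] * 10 + [((1, 2, 1), False)] * 10
    + [((1, 2, 2), True)] * 9 + [((1, 2, 2), False)] * 11
)
print(find_lossy_hops(observations))
```

In this made-up data, marking every non-responding port would blame the end
ports behind the flaky link, since about half of their MADs go unanswered.
The ratio attributes those drops to the single hop they share. A real
implementation would also have to handle routes that change between sweeps
and ports reachable over several routes, which this sketch ignores.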
