[openib-general] SM Bad Port Handling

Grant Grundler iod00d at hp.com
Thu Apr 7 13:12:41 PDT 2005


On Thu, Apr 07, 2005 at 03:32:24PM -0400, Hal Rosenstock wrote:
...
> Assumption:
> 
> The proposed solution assumes that the ignore GUIDs file option of
> OpenSM only impacts the routing algorithm (path counting) and should not
> be extended for bad port handling.
> 
> Proposed Solution:
> 
> The OpenSM will implement a configurable policy (some number of
> consecutive lack of responses to SM requests). At the point of
> exhaustion of the timeout/retry strategy, that port will be marked as
> "bad" by OpenSM.

Generally speaking, separating recovery "policy" from "detection"
is a good thing.
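
To make the split concrete, here is a minimal sketch of what it could
look like, assuming a per-port counter of consecutive unanswered SM
requests. Every name in it (BadPortPolicy, PortMonitor,
max_consecutive_timeouts) is hypothetical and is not taken from the
OpenSM source; the default threshold of 3 is likewise an arbitrary
placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class BadPortPolicy:
    """Policy only: how many consecutive misses are tolerated (hypothetical knob)."""
    max_consecutive_timeouts: int = 3


@dataclass
class PortMonitor:
    """Detection only: counts consecutive unanswered SM requests per port GUID."""
    policy: BadPortPolicy
    misses: Dict[int, int] = field(default_factory=dict)
    bad_ports: Set[int] = field(default_factory=set)

    def record_response(self, port_guid: int) -> None:
        # Any response breaks the run of consecutive misses.
        # Whether it should also clear a "bad" mark is a policy question
        # this sketch leaves open, so the mark is kept.
        self.misses[port_guid] = 0

    def record_timeout(self, port_guid: int) -> bool:
        # Count the miss, then ask the policy whether the port is now "bad".
        count = self.misses.get(port_guid, 0) + 1
        self.misses[port_guid] = count
        if count >= self.policy.max_consecutive_timeouts:
            self.bad_ports.add(port_guid)
        return port_guid in self.bad_ports
```

The monitor never decides what "too many" means; changing the
threshold only touches the policy object.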

...
> Is there a need to store these "bad" ports persistently (and ignore them
> on startup) ?

If opensm can see the physical link is ok, I would think it should save
any state. It's possible a system just hasn't loaded whatever
SW is necessary to talk to the SM and might require operator
intervention to kick that off (e.g. none of my systems auto-reboot
unless I'm testing a specific customer environment).

I expect it's a separate policy on how long to save information
after the physical link has been dropped - similar to DHCP.
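
A rough sketch of that DHCP-lease-like retention, assuming the saved
state is kept while the link is up and dropped only once the link has
been down for a configurable period. PortStateCache and
retention_seconds are hypothetical names, not OpenSM options, and
timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PortStateCache:
    """Keeps per-port state; forgets it some time after the link drops."""
    retention_seconds: float  # hypothetical policy knob, like a lease time
    state: Dict[int, str] = field(default_factory=dict)
    link_down_since: Dict[int, float] = field(default_factory=dict)

    def save(self, port_guid: int, info: str) -> None:
        self.state[port_guid] = info

    def link_down(self, port_guid: int, now: float) -> None:
        # Remember only the first time the link was seen down.
        self.link_down_since.setdefault(port_guid, now)

    def link_up(self, port_guid: int) -> None:
        # Link is back: stop the retention clock, keep the saved state.
        self.link_down_since.pop(port_guid, None)

    def expire(self, now: float) -> List[int]:
        # Drop state for ports whose link has been down long enough.
        expired = [guid for guid, since in self.link_down_since.items()
                   if now - since >= self.retention_seconds]
        for guid in expired:
            del self.link_down_since[guid]
            self.state.pop(guid, None)
        return expired
```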


grant


