[openib-general] SM Bad Port Handling
Hal Rosenstock
halr at voltaire.com
Fri Apr 8 12:06:44 PDT 2005
On Fri, 2005-04-08 at 11:55, Eitan Zahavi wrote:
> Hi Hal,
>
> This is a physical port attribute so the file is osm_port.h and the
> structure is osm_physp_t.
> From the doc on the structure:
> *
> * healthy
> * Tracks the health of the port. Normally should be TRUE but
> * might change as a result of incoming traps indicating the port
> * healthy is questionable.
> *
Yup. It's definitely there in the gen2 code base.
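As a rough illustration of the flag described in that comment, here is a minimal sketch in C. The type and function names (sketch_physp_t, sketch_physp_on_trap, and so on) are made up for this example and are not the actual osm_physp_t definition or OpenSM API; the only thing taken from the doc comment is that the flag normally holds TRUE and may be cleared when an incoming trap makes the port's health questionable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for osm_physp_t: only the field discussed here. */
typedef struct sketch_physp {
    uint8_t port_num;
    bool healthy; /* normally true; cleared when a trap casts doubt on the port */
} sketch_physp_t;

/* A freshly discovered port is assumed healthy. */
static void sketch_physp_init(sketch_physp_t *p, uint8_t port_num)
{
    p->port_num = port_num;
    p->healthy = true;
}

/* Assumed reaction to a trap that reports trouble on this port. */
static void sketch_physp_on_trap(sketch_physp_t *p)
{
    p->healthy = false;
}

/* Callers would check this before routing through or polling the port. */
static bool sketch_physp_usable(const sketch_physp_t *p)
{
    return p != NULL && p->healthy;
}
```

In this sketch nothing ever sets the flag back to true; when and how a port is requalified is a policy decision the comment above does not address.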
> I have been trying my best to find how a port that does not respond
> could cause OpenSM to poll it continuously. This cannot happen, so
> unless you can explain how it does, please do not contaminate the
> code with unneeded changes.
This part of the code has not been touched. I've put all meaningful
patches and ideas on how things might change out on this list.
> The only thing that comes to mind is the case of failure to "Set" some
> attributes of devices in the fabric.
I dug out Ron's emails on this. I don't think that was what was going
on. It was a SubnGet(NodeInfo) which failed. See
http://openib.org/pipermail/openib-general/2005-February/009125.html
for more details.
-- Hal
> This happens only after discovery is completed and only after the
> validity of the database is verified (i.e. each node has ports, each
> port has a node ...)
>
> In that case of failure to set some attributes (e.g. LFT, PortInfo, etc.)
> OpenSM will output a clear error message: "Errors in initialization"
> and will restart a full sweep of the fabric.
>
> This is the only way one gets infinite polling of the entire subnet.
> In general, it might make sense to improve how OpenSM qualifies each
> fabric port using statistics on the number of dropped packets versus
> good packets that passed through it. Note that this is complex because
> a port can affect any packet that goes through it, and there is no way
> to know at which hop on the path a packet was dropped.
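
To make the attribution problem concrete, here is a minimal sketch in C of one possible bookkeeping scheme. It is an assumption made for illustration, not something OpenSM does: since the hop where a packet was dropped is unknown, every port on the path of a failed transaction is counted as "suspect", and every port on the path of a successful one is counted as "good". All names (sketch_port_stats_t, sketch_record_path, SKETCH_MAX_PORTS) are hypothetical, and ports are identified by a plain array index.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 64 /* arbitrary table size for this sketch */

/* Per-port counters; a "suspect" hit only means the port was on a failed path. */
typedef struct sketch_port_stats {
    uint32_t good;
    uint32_t suspect;
} sketch_port_stats_t;

/* Charge every port on the path, because the failing hop cannot be identified. */
static void sketch_record_path(sketch_port_stats_t *stats, const size_t *path,
                               size_t hops, int delivered)
{
    for (size_t i = 0; i < hops; i++) {
        if (path[i] >= SKETCH_MAX_PORTS)
            continue; /* ignore indices outside the table */
        if (delivered)
            stats[path[i]].good++;
        else
            stats[path[i]].suspect++;
    }
}

/* Fraction of observed transactions through this port that failed somewhere. */
static double sketch_suspect_ratio(const sketch_port_stats_t *s)
{
    uint32_t total = s->good + s->suspect;
    return total == 0 ? 0.0 : (double)s->suspect / (double)total;
}
```

A port shared by many failed paths ends up with a higher ratio than ports that merely appeared on one of them, but a healthy port sitting behind a bad one is penalized as well, which is exactly the complication noted above.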