[openib-general] SM Bad Port Handling

Hal Rosenstock halr at voltaire.com
Fri Apr 8 12:06:44 PDT 2005


On Fri, 2005-04-08 at 11:55, Eitan Zahavi wrote: 
> Hi Hal,
> 
> This is a physical port attribute so the file is osm_port.h and the
> structure is osm_physp_t.
> From the doc on the structure:
> *
> *  healthy
> *     Tracks the health of the port. Normally should be TRUE but 
> *     might change as a result of incoming traps indicating the port
> *     healthy is questionable.
> *

Yup. It's definitely there in the gen2 code base.
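
For readers without the tree handy, here is a minimal sketch of what such a
per-port health flag looks like. Every name in it (physp_sketch_t,
physp_sketch_init, physp_sketch_on_trap, physp_sketch_is_usable) is
hypothetical and chosen for illustration; this is not the real osm_physp_t
or its accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a physical-port record; NOT the real osm_physp_t. */
typedef struct physp_sketch {
    unsigned port_num;
    bool healthy; /* normally true; cleared when a trap casts doubt on the port */
} physp_sketch_t;

/* A newly discovered port starts out healthy. */
static void physp_sketch_init(physp_sketch_t *p, unsigned port_num)
{
    assert(p != NULL);
    p->port_num = port_num;
    p->healthy = true;
}

/* Assumed reaction to an incoming trap that questions the port's health. */
static void physp_sketch_on_trap(physp_sketch_t *p)
{
    assert(p != NULL);
    p->healthy = false;
}

/* Callers check the flag before relying on the port. */
static bool physp_sketch_is_usable(const physp_sketch_t *p)
{
    assert(p != NULL);
    return p->healthy;
}
```

The point is only that the flag lives on the physical port object and
defaults to TRUE, as the structure documentation quoted above says.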

> I have been trying my best to find how it can happen that a port that
> does not respond will cause OpenSM to continuously poll it. This cannot
> happen, so unless you can explain how it happens, please do not
> contaminate the code with unneeded code.

This part of the code has not been touched. I've put all meaningful
patches and ideas on how things might change out on this list.

> The only thing that comes to mind is the case of failure to "Set" some
> attributes of devices in the fabric. 

I dug out Ron's emails on this. I don't think that was what was going
on. It was a SubnGet(NodeInfo) which failed. See
http://openib.org/pipermail/openib-general/2005-February/009125.html
for more details.

-- Hal

> This happens only after discovery is completed and only after the
> validity of the data base is verified (i.e. each node has ports, each
> port has a node ...)
> 
> In that case of failure to set some attributes (e.g. LFT, PortInfo etc)
> OpenSM will output a clear error message: "Errors in initialization"
> and will restart a full sweep of the fabric. 
> 
> This is the only way one gets infinite polling on the entire subnet.
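
The failure path described in the quoted text can be pictured as a loop. The
following is a minimal sketch under stated assumptions, not OpenSM code: the
names sweep_sketch, set_attrs_fn, always_fails and succeeds_on_third are
hypothetical, and the max_sweeps cap exists only so the simulation
terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback: returns true when every Set request in this
 * sweep succeeded, false when at least one failed. */
typedef bool (*set_attrs_fn)(unsigned sweep_no);

/* Repeats full sweeps until one completes without Set errors or the cap
 * is reached. Returns the number of sweeps performed. */
static unsigned sweep_sketch(set_attrs_fn set_attrs, unsigned max_sweeps)
{
    unsigned sweeps = 0;

    assert(set_attrs != NULL);
    while (sweeps < max_sweeps) {
        sweeps++;
        /* Discovery and database validation would run here. */
        if (set_attrs(sweeps))
            break; /* all attributes set: the subnet is up */
        fprintf(stderr, "sweep %u: Set errors, restarting full sweep\n",
                sweeps);
    }
    return sweeps;
}

/* A device that never accepts its Set: every sweep fails. */
static bool always_fails(unsigned sweep_no)
{
    (void)sweep_no;
    return false;
}

/* A transient problem that clears by the third sweep. */
static bool succeeds_on_third(unsigned sweep_no)
{
    return sweep_no >= 3;
}
```

With always_fails the loop only stops at the cap, which is the "infinite
polling" case; with succeeds_on_third it stops after three sweeps.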

> In general, it might make sense to try and improve how OpenSM
> qualifies each fabric port for the statistics of the number of packet
> drops versus good packets it passed through. Note this is complex due
> to the fact that a port might affect packets that go through it, and
> there is no way to know at which hop on the path the packet was
> dropped. 
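
One way to picture the difficulty is a sketch in which every port on a path
is charged for a drop, since the failing hop is unknown. This is purely
illustrative and assumes hypothetical names throughout (port_stats_sketch_t,
record_path_sketch, most_suspect_sketch, MAX_PORTS_SKETCH); nothing like it
is claimed to exist in OpenSM.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS_SKETCH 8 /* hypothetical fabric size for the example */

/* Per-port tallies: packets seen delivered vs. packets lost somewhere on
 * a path that included this port. */
typedef struct port_stats_sketch {
    unsigned good;
    unsigned suspect;
} port_stats_sketch_t;

/* Charge one packet outcome to every port on its path. A drop cannot be
 * pinned on a single hop, so all hops on the path become suspects. */
static void record_path_sketch(port_stats_sketch_t stats[],
                               const unsigned path[], size_t hops,
                               int delivered)
{
    for (size_t i = 0; i < hops; i++) {
        assert(path[i] < MAX_PORTS_SKETCH);
        if (delivered)
            stats[path[i]].good++;
        else
            stats[path[i]].suspect++;
    }
}

/* Return the index of the port with the most suspect counts; ties go to
 * the lowest index. */
static unsigned most_suspect_sketch(const port_stats_sketch_t stats[],
                                    size_t n)
{
    unsigned worst = 0;

    for (size_t i = 1; i < n; i++) {
        if (stats[i].suspect > stats[worst].suspect)
            worst = (unsigned)i;
    }
    return worst;
}
```

A port shared by several failing paths accumulates suspicion faster than
its neighbours, but a single drop on its own says nothing about which hop
was at fault.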




