[openib-general] SM Bad Port Handling

Eitan Zahavi eitan at mellanox.co.il
Fri Apr 8 08:55:39 PDT 2005


Hi Hal,

This is a physical port attribute so the file is osm_port.h and the
structure is osm_physp_t.
From the doc on the structure:
*
*  healthy
*     Tracks the health of the port. Normally should be TRUE but 
*     might change as a result of incoming traps indicating the port
*     healthy is questionable.
*
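
To make the intent of that field concrete, here is a minimal sketch of how such a flag could be kept and consulted. It is not the OpenSM code: example_physp_t is a hypothetical stand-in for osm_physp_t that carries only the healthy field, and every example_* helper is an assumed name used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for osm_physp_t; only the field discussed here. */
typedef struct example_physp {
    bool healthy; /* normally true; cleared when traps cast doubt on the port */
} example_physp_t;

/* A newly created physical port starts out healthy. */
static void example_physp_init(example_physp_t *p_physp)
{
    assert(p_physp != NULL);
    p_physp->healthy = true;
}

/* Record the outcome of, e.g., an incoming trap about this port. */
static void example_physp_set_health(example_physp_t *p_physp, bool is_healthy)
{
    assert(p_physp != NULL);
    p_physp->healthy = is_healthy;
}

static bool example_physp_is_healthy(const example_physp_t *p_physp)
{
    assert(p_physp != NULL);
    return p_physp->healthy;
}

/* Count the ports that are still considered usable. */
static size_t example_count_usable(const example_physp_t *ports, size_t n)
{
    size_t usable = 0;
    for (size_t i = 0; i < n; i++) {
        if (example_physp_is_healthy(&ports[i]))
            usable++;
    }
    return usable;
}
```

Code that has to skip a bad port would test the flag instead of keeping a separate list of bad ports.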

I have tried my best to find a way in which a port that does not respond
could cause OpenSM to poll it continuously, and I could not find one. This
cannot happen, so unless you can explain how it happens, please do not
clutter the code with unneeded handling.

The only thing that comes to mind is the case of a failure to "Set" some
attributes of devices in the fabric. This happens only after discovery is
completed and only after the validity of the database is verified (i.e.
each node has ports, each port has a node ...).
If setting some attributes (e.g. LFT, PortInfo) fails, OpenSM outputs a
clear error message, "Errors in initialization", and restarts a full sweep
of the fabric.
This is the only way one can get infinite polling of the entire subnet.
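
The flow described above can be sketched roughly as follows. This is a simplified model, not the OpenSM state machine: all example_* names are hypothetical, the "Set" failure is simulated by a counter, and the max_sweeps bound exists only so that the sketch terminates (the behaviour described above has no such bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fabric model: only what is needed to simulate "Set" failures. */
typedef struct example_fabric {
    int set_failures_left; /* how many more "Set" passes will fail */
} example_fabric_t;

/* Stand-ins for discovery and database validation; they always succeed here. */
static bool example_discover(example_fabric_t *f) { (void)f; return true; }
static bool example_db_is_valid(const example_fabric_t *f) { (void)f; return true; }

/* Stand-in for setting LFT, PortInfo, etc. on the fabric devices. */
static bool example_set_attributes(example_fabric_t *f)
{
    if (f->set_failures_left > 0) {
        f->set_failures_left--;
        return false;
    }
    return true;
}

/*
 * Returns the number of full sweeps needed to configure the fabric,
 * or -1 if max_sweeps was reached first.
 */
static int example_sweep_until_configured(example_fabric_t *f, int max_sweeps)
{
    assert(f != NULL && max_sweeps > 0);
    for (int sweep = 1; sweep <= max_sweeps; sweep++) {
        /* "Set" is attempted only after discovery and validation pass. */
        if (!example_discover(f) || !example_db_is_valid(f))
            continue;
        if (example_set_attributes(f))
            return sweep;
        /* A failed "Set" is reported and triggers another full sweep. */
        fprintf(stderr, "Errors in initialization\n");
    }
    return -1;
}
```

If the "Set" never succeeds, every pass ends in the error branch, so the whole subnet is swept again and again; a single silent port is not polled on its own in this flow.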

In general, it might make sense to improve how OpenSM qualifies each
fabric port, using statistics on the number of packets dropped versus the
number of good packets that passed through it. Note that this is complex,
because a port may affect any packet that goes through it, and there is no
way to know at which hop on the path a packet was dropped.
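
As an illustration of why the attribution is hard, here is a small sketch of per-port bookkeeping. Everything in it is an assumption made for the example (the example_* names, the fixed port count, charging a drop to every port on the path); OpenSM does not keep such counters as far as this mail is concerned.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_PORTS 8

/* Hypothetical per-port counters. */
typedef struct example_port_stats {
    unsigned long good;    /* packets known to have passed through this port */
    unsigned long suspect; /* drops on paths that include this port */
} example_port_stats_t;

/*
 * Record one packet sent along `path` (an array of port indices).
 * A delivered packet is credited to every port on the path. A dropped
 * packet can only be charged to every port on the path, because the
 * hop that dropped it is unknown.
 */
static void example_record_path(example_port_stats_t *stats,
                                const size_t *path, size_t hops,
                                int delivered)
{
    for (size_t i = 0; i < hops; i++) {
        assert(path[i] < EXAMPLE_MAX_PORTS);
        if (delivered)
            stats[path[i]].good++;
        else
            stats[path[i]].suspect++;
    }
}

/* Fraction of this port's traffic that ended in a drop somewhere on the path. */
static double example_suspect_ratio(const example_port_stats_t *s)
{
    unsigned long total = s->good + s->suspect;
    return total ? (double)s->suspect / (double)total : 0.0;
}
```

With such counters a good port that merely shares paths with a bad one also collects "suspect" counts, so a port can be singled out only by comparing many different paths.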

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Friday, April 08, 2005 6:24 PM
> To: Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] SM Bad Port Handling
> 
> Hi Eitan,
> 
> On Thu, 2005-04-07 at 16:02, Eitan Zahavi wrote:
> > [EZ] It is better to use the "un-healthy" bit of the physical port -
> > which OpenSM is already maintaining.
> 
> What is the name of this bit and in what structure does it appear ?
> 
> Thanks.
> 
> -- Hal