[openib-general] SM Bad Port Handling
Eitan Zahavi
eitan at mellanox.co.il
Fri Apr 8 13:05:57 PDT 2005
Hi Hal,
I have looked up the mail thread.
I could not find a log file indicating there was a repetitive query of the
bad port.
I know the code, and I cannot find a reason for it to query a port repetitively.
Can you explain how this can happen?
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Friday, April 08, 2005 10:07 PM
> To: Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] SM Bad Port Handling
>
> On Fri, 2005-04-08 at 11:55, Eitan Zahavi wrote:
> > Hi Hal,
> >
> > This is a physical port attribute so the file is osm_port.h and the
> > structure is osm_physp_t.
> > From the doc on the structure:
> > *
> > * healthy
> > * Tracks the health of the port. Normally should be TRUE but
> > * might change as a result of incoming traps indicating the port
> > * healthy is questionable.
> > *
>
> Yup. It's definitely there in the gen2 code base.
>
> > I have been trying my best to find how it can happen that a port that
> > does not respond will cause OpenSM to continuously poll it. This can
> > not happen so unless you can explain how it happens please do not
> > contaminate the code with un-needed code.
>
> This part of the code has not been touched. I've put all meaningful
> patches and ideas on how things might change out on this list.
>
> > The only thing that comes to mind is the case of failure to "Set" some
> > attributes of devices in the fabric.
>
> I dug out Ron's emails on this. I don't think that was what was going
> on. It was a SubnGet(NodeInfo) which failed. See
> http://openib.org/pipermail/openib-general/2005-February/009125.html
> for more details.
>
> -- Hal
>
> > This happens only after discovery is completed and only after the
> > validity of the data base is verified (i.e. each node has ports, each
> > port has a node ...)
> >
> > In that case of failure to set some attributes (aka LFT, PortInfo etc)
> > OpenSM will output a clear error message: "Errors in intialization"
> > and will restart a full sweep of the fabric.
> >
> > This is the only way one gets infinite polling of the entire subnet.
>
> > In general, it might make sense to try and improve how OpenSM
> > qualifies each fabric port for the statistics of the number of packet
> > drops versus good packets it passed through. Note this is complex due
> > to the fact a port might affect packets that go through it. And
> > there is no way to know on which hop on the path the packet was
> > dropped.
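
To make the last point concrete, here is a minimal sketch of per-port drop statistics. It is not OpenSM code: port_stats_t, record_path and suspect_ratio are hypothetical names, and the counters are assumed to live outside osm_physp_t. Because the hop at which a packet was dropped is unknown, a drop can only be charged to every port on the path, which is exactly why the resulting ratio is a hint and not a verdict.

```c
#include <stddef.h>

#define MAX_PORTS 8 /* hypothetical fabric size, for the sketch only */

/* Hypothetical per-port counters; these are NOT fields of osm_physp_t. */
typedef struct port_stats {
    unsigned good;    /* packets delivered over a path through this port */
    unsigned suspect; /* packets dropped somewhere on a path through this port */
} port_stats_t;

/*
 * Record the outcome of one packet sent along a path of port indices.
 * A drop is charged to every port on the path, since the failing hop
 * cannot be identified from the sender's side.
 */
static void record_path(port_stats_t stats[], const int path[],
                        size_t hops, int delivered)
{
    for (size_t i = 0; i < hops; i++) {
        if (delivered)
            stats[path[i]].good++;
        else
            stats[path[i]].suspect++;
    }
}

/* Fraction of the traffic through this port that ended in a drop. */
static double suspect_ratio(const port_stats_t *s)
{
    unsigned total = s->good + s->suspect;
    return total ? (double)s->suspect / total : 0.0;
}
```

A port that appears on many dropped paths and few delivered ones gets a high ratio, but so does every healthy port that happens to share those paths with the bad one. Any threshold applied to this ratio would therefore have to compare ports against their neighbours on the same paths, not judge each port in isolation.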