[openib-general] question on opensm error
Hal Rosenstock
halr at voltaire.com
Tue Feb 15 05:09:55 PST 2005
Hi,
A couple more things on this:
On Tue, 2005-02-15 at 06:50, Hal Rosenstock wrote:
> Hi Ron,
>
> On Mon, 2005-02-14 at 15:59, Ronald G. Minnich wrote:
> > formerly working opensm starts to get these:
>
> So the OpenSM was up and running and these messages appeared in the log.
> Did anything change in the subnet ?
>
> > [1108414727:000284173][411FF970] -> umad_receiver: send completed with
> > error(method=1 attr=11) -- dropping.
> > [1108414727:000384171][411FF970] -> umad_receiver: send completed with
> > error(method=1 attr=11) -- dropping.
> > [1108414727:000484169][411FF970] -> umad_receiver: send completed with
> > error(method=1 attr=11) -- dropping.
>
> These are failures of OpenSM to send an SM Get(NodeInfo), which is
> used during the periodic subnet sweeps.
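For reference, the method and attr fields in those log lines can be
decoded mechanically. Below is a minimal Python sketch, assuming both
fields are printed in hexadecimal (which matches the reading above:
method 0x01 is Get, attribute 0x0011 is NodeInfo). The function name,
the regular expression, and the assumption that the wrapped log line
has been rejoined into a single line are illustrative only and are not
part of OpenSM.

```python
import re

# A few common SMP methods and attribute IDs (InfiniBand management).
SMP_METHODS = {0x01: "Get", 0x02: "Set", 0x81: "GetResp"}
SMP_ATTRS = {
    0x0010: "NodeDescription",
    0x0011: "NodeInfo",
    0x0012: "SwitchInfo",
    0x0015: "PortInfo",
}

# Assumed message layout, after rejoining the line wrapped by the mailer.
LOG_RE = re.compile(
    r"umad_receiver: send completed with\s+error"
    r"\(method=(?P<method>[0-9a-fA-F]+) attr=(?P<attr>[0-9a-fA-F]+)\)"
)

def decode_send_error(line):
    """Return e.g. 'Get(NodeInfo)' for a matching log line, else None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    # Assumption: both fields are hexadecimal without a 0x prefix.
    method = int(m.group("method"), 16)
    attr = int(m.group("attr"), 16)
    return "{}({})".format(
        SMP_METHODS.get(method, hex(method)),
        SMP_ATTRS.get(attr, hex(attr)),
    )

line = ("[1108414727:000284173][411FF970] -> umad_receiver: send completed "
        "with error(method=1 attr=11) -- dropping.")
print(decode_send_error(line))  # prints: Get(NodeInfo)
```

Unknown method or attribute values fall back to their hex form, so the
helper can be run over a whole log without failing on other MAD types.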
The SM does two kinds of sweeps: a periodic light sweep, in which it
queries every switch for SwitchInfo and checks the port state change
bit, and a heavy sweep, which is triggered rather than periodic. The
NodeInfo query is part of the heavy sweep.
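The relationship between the two sweeps can be pictured with a toy
model. This is only a sketch under stated assumptions: the dictionary
layout, the field name port_state_change, and both function names are
hypothetical, and the assumption that a set change bit is what triggers
the heavy sweep is an illustration rather than a description of the
OpenSM code.

```python
def light_sweep(switches):
    """Return the switches whose (hypothetical) change bit is set."""
    return [name for name, info in switches.items()
            if info["port_state_change"]]

def sweep_once(switches):
    """Run one light sweep and report whether a heavy sweep follows."""
    changed = light_sweep(switches)
    if changed:
        # Assumed trigger: any changed switch leads to a heavy sweep,
        # which is where the Get(NodeInfo) queries would be sent.
        return "heavy sweep triggered by " + ", ".join(sorted(changed))
    return "light sweep only"

# Hypothetical subnet with two switches, one reporting a port change.
subnet = {
    "sw1": {"port_state_change": False},
    "sw2": {"port_state_change": True},
}
print(sweep_once(subnet))  # prints: heavy sweep triggered by sw2
```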
> I think the only way this error
> happens is if physical link is not present on the local link (e.g.
> logical link is not in init state or beyond).
ibstatus or ibstat will show the logical and physical state of the
local port.
> So was a cable pulled somewhere ?
NodeInfo should be answered even if there is no logical link. Such
errors may be the result of:
a. a bad cable
b. a bad HCA or bad firmware
c. a hung or crashed kernel.
> Is this problem intermittent ? Does it come and go for no apparent
> reason ? Does the subnet get out of this state or do you need to
> restart OpenSM ?
>
> Are there any other messages in the log around this which might be
> useful ?
It might be helpful to try running ibnetdiscover -e (to show the
errors). smpquery can also be used to query the bad link/host.
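The checks mentioned in this thread could be run in one pass with a
small wrapper such as the one below. It is a sketch, not a tested
procedure: the function names are hypothetical, the LID passed to
smpquery is a placeholder you would replace with the suspect port's
LID, and it assumes the diagnostic tools are installed and on PATH.

```python
import shutil
import subprocess

def diag_commands(lid=None):
    """Build the argv lists for the checks mentioned in this thread."""
    cmds = [["ibstat"], ["ibnetdiscover", "-e"]]
    if lid is not None:
        # Query NodeInfo on the suspect port; the LID is caller-supplied.
        cmds.append(["smpquery", "nodeinfo", str(lid)])
    return cmds

def run_diagnostics(lid=None, timeout=60):
    """Run each available tool and print its output."""
    for argv in diag_commands(lid):
        if shutil.which(argv[0]) is None:
            print("skipping {}: not found on PATH".format(argv[0]))
            continue
        print("$ " + " ".join(argv))
        try:
            result = subprocess.run(argv, capture_output=True,
                                    text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            print("timed out after {} seconds".format(timeout))
            continue
        print(result.stdout or result.stderr)
```

Calling run_diagnostics() runs only the local port and discovery
checks; passing a LID, for example run_diagnostics(lid=2) with 2 as a
made-up value, adds the smpquery step for that port.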
>
> Thanks.
>
> -- Hal
>
> >
> >
> >
> > what's a reasonable thing to look for, or should I just svn update and
> > hope for the best?
> >
> > thanks
> >
> > ron
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general