[ofa-general] Both opensms are in SMINFO_STANDBY and neither claims master
Hal Rosenstock
halr at voltaire.com
Thu May 31 11:57:47 PDT 2007
On Thu, 2007-05-31 at 15:12, Venkatesh Babu wrote:
> Hal Rosenstock wrote:
>
> >> I am seeing nonzero (0-10) VL15 drop counters. What is the
> >> significance and cause of these errors?
> >
> >This means that some VL15 packets arrive at the switch when no
> >VL15 buffer is available, so they are dropped. They could be any SM
> >packets (SMInfo is just one possibility).
> >
> >>How can I get rid of or correct them?
> >
> >You would need to contact your switch vendor to see whether the VL15
> >buffering can be reconfigured.
> >
> >I'm not sure whether this is related to your standby issue.
> >
> At least, opensm is not working correctly. Even though ibv_devinfo shows
> it as master, it is not responding to broadcast join operations, nor
> does it assign LIDs to other nodes.
ibv_devinfo only indicates the SM LID of the last master that claimed
this node, so it does not tell you whether there really is a current
master.
In this state there is no master, so no SA queries will be answered;
only an SM in the master state responds to them. So if some local node
thinks the SM is foo, and the SM on foo is not in the master state, it
will not respond.
This may be an OpenSM issue, or it may be some lower-level issue that
OpenSM is not handling well. I'm not sure which, as I cannot recreate
this and am not sure what is going on in your environment.
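To make the ibv_devinfo point concrete, here is a minimal Python sketch. It assumes a Linux host whose InfiniBand stack exposes per-port attributes under /sys/class/infiniband/<device>/ports/<port>/, including an sm_lid file printed in hex; the function name read_sm_lid and the device name in the example are hypothetical. The value it reads is the same SM LID that ibv_devinfo reports, i.e. the SM that last configured the port, which is not proof that an SM at that LID is master now.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: standard Linux sysfs layout for InfiniBand ports.
SYSFS_IB = Path("/sys/class/infiniband")


def read_sm_lid(device: str, port: int = 1, root: Path = SYSFS_IB) -> int:
    """Return the SM LID recorded for one port (what ibv_devinfo shows as sm_lid).

    This is only the LID of the SM that last configured the port; it says
    nothing about whether that SM is currently in the MASTER state.
    """
    text = (root / device / "ports" / str(port) / "sm_lid").read_text()
    # sysfs prints the LID in hex (e.g. "0x1"); base 0 honors the 0x prefix.
    return int(text.strip(), 0)
```

For example, read_sm_lid("mthca0") gives a LID that can be passed to sminfo from infiniband-diags (e.g. `sminfo 1`); its output includes the SM state, and if that is SMINFO_STANDBY rather than SMINFO_MASTER, SA queries from this node will go unanswered.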
> >Are you seeing any other errors on any of the ports?
> >
> I do see nonzero port_xmit_discards error counters on some ports.
>
> Could these errors be caused by bad cables or ports?
I would try swapping in known good cables and see what happens.
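One way to make "see what happens" measurable is to compare a port's error counters before and after the swap. The Python sketch below does that for host ports, assuming the Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/counters/ (which includes files such as port_xmit_discards and VL15_dropped); the function names are hypothetical. Switch port counters are not visible this way; for those, perfquery from infiniband-diags, run against the switch LID and port, is the usual tool.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: per-port counters exposed by the Linux InfiniBand stack.
SYSFS_IB = Path("/sys/class/infiniband")


def snapshot_counters(device: str, port: int = 1, root: Path = SYSFS_IB) -> dict[str, int]:
    """Read every counter file for one port into a {name: value} dict."""
    counters_dir = root / device / "ports" / str(port) / "counters"
    values: dict[str, int] = {}
    for f in sorted(counters_dir.iterdir()):
        try:
            values[f.name] = int(f.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable entries, subdirectories, non-numeric files
    return values


def counter_increases(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that grew between two snapshots, with the delta."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }
```

A typical run: take snapshot_counters("mlx4_0"), swap the cable and run traffic, take a second snapshot, then call counter_increases on the pair. If port_xmit_discards keeps climbing with a known-good cable, the cable is a less likely cause.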
-- Hal
> VBabu
>
> >-- Hal