[ofa-general] smpquery regression in 1.3-rc1

Sasha Khapyorsky sashak at voltaire.com
Thu Dec 20 09:13:18 PST 2007


On 08:49 Thu 20 Dec, Hal Rosenstock wrote:
> > >>>>
> > >>>> Anyone else seeing the same?
> > >>> -G requires an SA path record lookup, so this could be an issue with that
> > >>> timing out in some cases (assuming the port is active and the SM is
> > >>> operational).
> > >> I'm seeing the same problem.
> > >> Sometimes the query works, and sometimes it doesn't.
> > >> I also see that when the query fails, OpenSM doesn't receive a PathRecord query at all.
> > >>
> > >> Hal, can you elaborate on "that timing out in some cases" issue?
> > > 
> > > I just meant that the SM not responding (for an unknown reason right
> > > now) would yield this effect.
> > > 
> > >> Adding Jack for the libibmad issue:
> > >>
> > >> I see that the ib_path_query() in libibmad/sa.c sometimes fails
> > >> when calling safe_sa_call().
> > > 
> > > This could just be more detail on the same thing in terms of the
> > > (smpquery) client which is layered on top of libibmad: the SA path query
> > > timeout.
> > > I would suggest running OpenSM in verbose mode (both instances are with
> > > OpenSM) and seeing if it responds to the PathRecord query used by this
> > > form of smpquery and continue troubleshooting from there based on the
> > > result.
> > 
> > This is actually what I was saying here.
> > I have *debugged* smpquery and saw that the failing function is
> > ib_path_query() in libibmad/sa.c.
> > As I've mentioned, I did run it with OpenSM in verbose mode, and saw
> > that when smpquery fails, OpenSM log does not have any PathRecord request.
> > When smpquery passes, I see the PathRecord request and response in the
> > OpenSM log.
> 
> OK; that wasn't clear before but is now (that the failure appears to be
> a client and not SM issue) :-) FWIW, I don't know what has changed that
> would affect this so it could be a latent bug as opposed to a
> regression.

Right, there were no changes in this area during this period; most likely
the issue was just triggered now. I'm not sure, but I may have seen
something like this in the past and thought it was a cabling issue.

Yevgeny, Arthur, could you rerun smpquery with -dddd (for lots of debug
output)?

Sasha
