[ofa-general] smpquery regression in 1.3-rc1

Hal Rosenstock hrosenstock at xsigo.com
Thu Dec 20 06:07:39 PST 2007


On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote:
> Hal Rosenstock wrote:
> > On Wed, 2007-12-19 at 11:58 -0800, akepner at sgi.com wrote:
> >> We're seeing a regression in smpquery from alpha2 to rc1. 
> >>
> >> For example, with alpha2 I get:
> >> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
> >> # Node info: Lid 3
> >> BaseVers:........................1
> >> ClassVers:.......................1
> >> NodeType:........................Channel Adapter
> >> NumPorts:........................2
> >> SystemGuid:......................0x00066a009800737c
> >> Guid:............................0x00066a009800737c
> >> PortGuid:........................0x00066a01a000737c
> >> PartCap:.........................64
> >> DevId:...........................0x6278
> >> Revision:........................0x000000a0
> >> LocalPort:.......................2
> >> VendorId:........................0x00066a
> >> grommit:~ # 
> >>
> >>
> >> And with rc1, I get:
> >> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
> >> ibwarn: [5650] ib_path_query: sa call path_query failed
> >> smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
> >> grommit:~ #  
> >>
> >> But using a LID works fine:
> >> grommit:~ # smpquery nodeinfo 3
> >> # Node info: Lid 3
> >> BaseVers:........................1
> >> ClassVers:.......................1
> >> NodeType:........................Channel Adapter
> >> NumPorts:........................2
> >> SystemGuid:......................0x00066a009800737c
> >> Guid:............................0x00066a009800737c
> >> PortGuid:........................0x00066a01a000737c
> >> PartCap:.........................64
> >> DevId:...........................0x6278
> >> Revision:........................0x000000a0
> >> LocalPort:.......................2
> >> VendorId:........................0x00066a
> >> grommit:~ # 
> >>
> >> Strangest of all, running it under strace also works:
> >> grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c > /tmp/smpquery.out 
> >> .....
> >> grommit:~ # cat /tmp/smpquery.out
> >> # Node info: Lid 3
> >> BaseVers:........................1
> >> ClassVers:.......................1
> >> NodeType:........................Channel Adapter
> >> NumPorts:........................2
> >> SystemGuid:......................0x00066a009800737c
> >> Guid:............................0x00066a009800737c
> >> PortGuid:........................0x00066a01a000737c
> >> PartCap:.........................64
> >> DevId:...........................0x6278
> >> Revision:........................0x000000a0
> >> LocalPort:.......................2
> >> VendorId:........................0x00066a
> >> grommit:~ #
> >>
> >> Some weird race condition...
> >>
> >> Anyone else seeing the same?
> > 
> > -G requires an SA path record lookup, so this could be an issue with that
> > lookup timing out in some cases (assuming the port is active and the SM is
> > operational).
> 
> I'm seeing the same problem.
> Sometimes the query works, and sometimes it doesn't.
> I also see that when the query fails, OpenSM doesn't get the PathRecord query at all.
> 
> Hal, can you elaborate on "that timing out in some cases" issue?

I just meant that the SM not responding (for an unknown reason right
now) would yield this effect.

> Adding Jack for the libibmad issue:
> 
> I see that the ib_path_query() in libibmad/sa.c sometimes fails
> when calling safe_sa_call().

This could just be more detail on the same thing, seen from the
(smpquery) client, which is layered on top of libibmad: the SA path query
timing out.

I would suggest running OpenSM in verbose mode (both instances are with
OpenSM), checking whether it responds to the PathRecord query used by
this form of smpquery, and continuing to troubleshoot from there based
on the result.
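
Since the failure is intermittent, it may help to measure how often it
happens before and after any change. What follows is only a rough sketch,
not anything taken from the tools themselves: count_failures is a
hypothetical helper name, and it assumes smpquery exits with a non-zero
status when the path query fails.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print "failures/total".
# Assumes CMD signals failure through a non-zero exit status.
count_failures() {
    total=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$total" ]; do
        # Discard the command's output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$total"
}

# Example, using the GUID from the original report (needs a live fabric):
# count_failures 50 smpquery -G nodeinfo 0x66a01a000737c
```

Running the same count with the LID form (smpquery nodeinfo 3) gives a
baseline, since that form does not need the SA path record lookup.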

-- Hal

> -- Yevgeny
> 
> > -- Hal
> 


