[ofa-general] smpquery regression in 1.3-rc1

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Thu Dec 20 07:43:37 PST 2007


Hal Rosenstock wrote:
> On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote:
>> Hal Rosenstock wrote:
>>> On Wed, 2007-12-19 at 11:58 -0800, akepner at sgi.com wrote:
>>>> We're seeing a regression in smpquery from alpha2 to rc1. 
>>>>
>>>> For example, with alpha2 I get:
>>>> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
>>>> # Node info: Lid 3
>>>> BaseVers:........................1
>>>> ClassVers:.......................1
>>>> NodeType:........................Channel Adapter
>>>> NumPorts:........................2
>>>> SystemGuid:......................0x00066a009800737c
>>>> Guid:............................0x00066a009800737c
>>>> PortGuid:........................0x00066a01a000737c
>>>> PartCap:.........................64
>>>> DevId:...........................0x6278
>>>> Revision:........................0x000000a0
>>>> LocalPort:.......................2
>>>> VendorId:........................0x00066a
>>>> grommit:~ # 
>>>>
>>>>
>>>> And with rc1, I get:
>>>> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
>>>> ibwarn: [5650] ib_path_query: sa call path_query failed
>>>> smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
>>>> grommit:~ #  
>>>>
>>>> But using a LID works fine:
>>>> grommit:~ # smpquery nodeinfo 3
>>>> # Node info: Lid 3
>>>> BaseVers:........................1
>>>> ClassVers:.......................1
>>>> NodeType:........................Channel Adapter
>>>> NumPorts:........................2
>>>> SystemGuid:......................0x00066a009800737c
>>>> Guid:............................0x00066a009800737c
>>>> PortGuid:........................0x00066a01a000737c
>>>> PartCap:.........................64
>>>> DevId:...........................0x6278
>>>> Revision:........................0x000000a0
>>>> LocalPort:.......................2
>>>> VendorId:........................0x00066a
>>>> grommit:~ # 
>>>>
>>>> Strangest of all, running it under strace also works:
>>>> grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c > /tmp/smpquery.out 
>>>> .....
>>>> grommit:~ # cat /tmp/smpquery.out
>>>> # Node info: Lid 3
>>>> BaseVers:........................1
>>>> ClassVers:.......................1
>>>> NodeType:........................Channel Adapter
>>>> NumPorts:........................2
>>>> SystemGuid:......................0x00066a009800737c
>>>> Guid:............................0x00066a009800737c
>>>> PortGuid:........................0x00066a01a000737c
>>>> PartCap:.........................64
>>>> DevId:...........................0x6278
>>>> Revision:........................0x000000a0
>>>> LocalPort:.......................2
>>>> VendorId:........................0x00066a
>>>> grommit:~ #
>>>>
>>>> Some weird race condition...
>>>>
>>>> Anyone else seeing the same?
>>> -G requires an SA path record lookup so this could be an issue with that
>>> timing out in some cases (assuming the port is active and the SM is
>>> operational).
>> I'm seeing the same problem.
>> Sometimes the query works, and sometimes it doesn't.
>> I also see that when the query fails, OpenSM doesn't get a PathRecord query at all.
>>
>> Hal, can you elaborate on "that timing out in some cases" issue?
> 
> I just meant that the SM not responding (for an unknown reason right
> now) would yield this effect.
> 
>> Adding Jack for the libibmad issue:
>>
>> I see that the ib_path_query() in libibmad/sa.c sometimes fails
>> when calling safe_sa_call().
> 
> This could just be more detail on the same thing in terms of the
> (smpquery) client which is layered on top of libibmad: the SA path query
> timeout.
> I would suggest running OpenSM in verbose mode (both instances are with
> OpenSM) and seeing if it responds to the PathRecord query used by this
> form of smpquery and continue troubleshooting from there based on the
> result.

This is actually what I was saying here.
I have *debugged* smpquery and saw that the failing function is
ib_path_query() in libibmad/sa.c.
As I mentioned, I did run OpenSM in verbose mode, and saw that when
smpquery fails, the OpenSM log does not contain any PathRecord request.
When smpquery succeeds, I see the PathRecord request and response in the
OpenSM log.
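
To put a number on "sometimes", a loop along the following lines could
be used. This is only a sketch: count_runs is a made-up helper name (it
is not part of the diagnostics package), and it assumes that smpquery
exits with a non-zero status when the path query fails. The smpquery
invocation is the one from the original report and is left commented out.

```shell
#!/bin/sh
# count_runs N CMD [ARGS...]
# Hypothetical helper: run CMD N times, discard its output, and print
# how many runs exited with status 0 (pass) and how many did not (fail).
count_runs() {
    n=$1
    shift
    pass=0
    fail=0
    i=0
    while [ "$i" -lt "$n" ]; do
        if "$@" >/dev/null 2>&1; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "pass=$pass fail=$fail"
}

# Example, using the GUID from the report (assumes a non-zero exit
# status on "can't resolve destination port"):
# count_runs 100 smpquery -G nodeinfo 0x66a01a000737c
```

Comparing the fail count with the number of PathRecord requests in the
OpenSM log for the same time window should show whether every failure
corresponds to a request that never reached the SM.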

-- Yevgeny

> -- Hal
> 
>> -- Yevgeny
>>
>>> -- Hal
> 



