[ofa-general] smpquery regression in 1.3-rc1
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Thu Dec 20 03:42:40 PST 2007
Hal Rosenstock wrote:
> On Wed, 2007-12-19 at 11:58 -0800, akepner at sgi.com wrote:
>> We're seeing a regression in smpquery from alpha2 to rc1.
>>
>> For example, with alpha2 I get:
>> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
>> # Node info: Lid 3
>> BaseVers:........................1
>> ClassVers:.......................1
>> NodeType:........................Channel Adapter
>> NumPorts:........................2
>> SystemGuid:......................0x00066a009800737c
>> Guid:............................0x00066a009800737c
>> PortGuid:........................0x00066a01a000737c
>> PartCap:.........................64
>> DevId:...........................0x6278
>> Revision:........................0x000000a0
>> LocalPort:.......................2
>> VendorId:........................0x00066a
>> grommit:~ #
>>
>>
>> And with rc1, I get:
>> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
>> ibwarn: [5650] ib_path_query: sa call path_query failed
>> smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
>> grommit:~ #
>>
>> But using a LID works fine:
>> grommit:~ # smpquery nodeinfo 3
>> # Node info: Lid 3
>> BaseVers:........................1
>> ClassVers:.......................1
>> NodeType:........................Channel Adapter
>> NumPorts:........................2
>> SystemGuid:......................0x00066a009800737c
>> Guid:............................0x00066a009800737c
>> PortGuid:........................0x00066a01a000737c
>> PartCap:.........................64
>> DevId:...........................0x6278
>> Revision:........................0x000000a0
>> LocalPort:.......................2
>> VendorId:........................0x00066a
>> grommit:~ #
>>
>> Strangest of all, running it under strace also works:
>> grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c > /tmp/smpquery.out
>> .....
>> grommit:~ # cat /tmp/smpquery.out
>> # Node info: Lid 3
>> BaseVers:........................1
>> ClassVers:.......................1
>> NodeType:........................Channel Adapter
>> NumPorts:........................2
>> SystemGuid:......................0x00066a009800737c
>> Guid:............................0x00066a009800737c
>> PortGuid:........................0x00066a01a000737c
>> PartCap:.........................64
>> DevId:...........................0x6278
>> Revision:........................0x000000a0
>> LocalPort:.......................2
>> VendorId:........................0x00066a
>> grommit:~ #
>>
>> Some weird race condition...
>>
>> Anyone else seeing the same?
>
> -G requires an SA path record lookup so this could be an issue with that
> timing out in some cases (assuming the port is active and the SM is
> operational).
I'm seeing the same problem.
Sometimes the query works, and sometimes it doesn't.
I also see that when the query fails, OpenSM doesn't receive the PathRecord query at all.
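To put a number on "sometimes", a loop along these lines might help. This is only a rough sketch: count_failures is a made-up helper (not part of the diags), and it assumes smpquery exits with a non-zero status when the lookup fails.

```shell
#!/bin/sh
# Run a command N times and report how many runs failed (non-zero exit).
# count_failures is a hypothetical helper, not part of any OFED package.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed of $runs runs failed"
}

# Example with the GUID from the original report (left commented out so the
# snippet can be sourced on a machine without the diags installed):
# count_failures 20 smpquery -G nodeinfo 0x66a01a000737c
```

Running the same loop with the LID form (smpquery nodeinfo 3) should give a baseline for comparison, since that path does not need the SA lookup.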
Hal, can you elaborate on "that timing out in some cases" issue?
Adding Jack for the libibmad issue:
I see that ib_path_query() in libibmad/sa.c sometimes fails
when calling safe_sa_call().
-- Yevgeny
> -- Hal