[ofa-general] Re: Issues with combined routing in smpquery
Ira Weiny
weiny2 at llnl.gov
Tue Apr 28 20:55:25 PDT 2009
On Tue, 28 Apr 2009 20:27:36 -0700
Ira Weiny <weiny2 at llnl.gov> wrote:
> Sasha, Hal,
>
> I have some hardware on which the following query does not work.
>
> 18:40:54 > ./smpquery -c nodeinfo 243 0,1
> ibwarn: [22072] mad_rpc: _do_madrpc failed; dport (Lid 243 DR path slid 148; dlid 65535; 0,1)
> ./smpquery: iberror: failed: operation nodeinfo: node info query failed
>
> from the node I am running on.
>
> 20:08:46 > ibstat
> CA 'mlx4_0'
> CA type: MT25418
> Number of ports: 2
> Firmware version: 2.6.0
> Hardware version: a0
> Node GUID: 0x0002c9020025feb4
> System image GUID: 0x0002c9020025feb7
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 10
> Base lid: 148
> LMC: 2
> SM lid: 148
> Capability mask: 0x0251086a
> Port GUID: 0x0002c9020025feb5
> [snip]
>
> 19:12:10 > hostname
> hype137
>
>
> A query on the LID alone returns this.
>
> 18:41:20 > ./smpquery nodeinfo 243
> # Node info: Lid 243
> [snip]
> NodeType:........................Switch
> NumPorts:........................24
> SystemGuid:......................0x0008f10400400e69
> Guid:............................0x0008f10400400e69
> PortGuid:........................0x0008f10400400e69
> [snip]
>
> And iblinkinfo is.
>
> 18:41:26 > iblinkinfo.pl -S 0x0008f10400400e69
> Switch 0x0008f10400400e69 ISR9288 Voltaire sFB-12D:
> 243 1[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 646 10[ ] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> [snip]
>
>
> It looks like combined routing is not working at all except for this one
> query. (LID 37 is the switch which is connected to the HCA I am running
> on.)
>
> 18:53:18 > ./smpquery -c portinfo 37 0,1
> # Port info: Lid 37 DR path slid 148; dlid 65535; 0,1 port 0
> Mkey:............................0x0000000000000000
> GidPrefix:.......................0xfe80000000000000
> Lid:.............................148
> SMLid:...........................148
> [snip]
>
> All other combined routing queries I try fail. And even the one above is
> wrong: it returns the port info for the node on switch port 6 (LID 148, my
> own HCA), not the node on port 1. Look at the output from the local switch.
>
> 19:12:00 > iblinkinfo.pl -R -S 0x000b8cffff004663
> Switch 0x000b8cffff004663 MT47396 Infiniscale-III Mellanox Technologies:
> 37 1[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 108 1[ ] "hype132" ( )
> 37 2[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 528 1[ ] "hype133" ( )
> 37 3[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 296 1[ ] "hype134" ( )
> 37 4[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 92 1[ ] "hype135" ( )
> 37 5[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 144 1[ ] "hype136" ( )
>
> This is what is connected to LID 148...
> 37 6[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 148 1[ ] "hype137" ( )
>
> 37 7[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 540 1[ ] "hype138" ( )
> 37 8[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 212 1[ ] "hype139" ( )
> 37 9[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 532 1[ ] "hype140" ( )
> 37 10[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 60 1[ ] "hype141" ( )
> 37 11[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 192 1[ ] "hype142" ( )
> 37 12[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 312 1[ ] "hype143" ( )
> 37 13[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 647 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 14[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 641 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 15[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 643 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 16[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 653 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 17[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 637 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 18[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 610 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 19[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 655 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 20[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 645 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 21[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 635 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 22[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 651 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 23[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 639 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
> 37 24[ ] ==( 4X 2.5 Gbps Active / LinkUp)==> 649 13[12] "ISR9288/ISR9096 Voltaire sLB-24D" ( )
>
> Any idea what is going on? These were all run with a smpquery built from the
> current master tree.
>
> On my little test system this seems to work just fine... But not on this
> system. Did some older hardware not support combined DR routing?

Actually, I take this back. An older version of smpquery (1.3.6) works, but
the newer build from master (1.5.1_76524e3_dirty) does not. So I don't think
this is a hardware issue. :-(

20:54:47 > ./smpquery -c nodeinfo 14 0,10
ibwarn: [21947] _do_madrpc: send failed; Invalid argument
ibwarn: [21947] mad_rpc: _do_madrpc failed; dport (Lid 14 DR path slid 4; dlid 65535; 0,10)
./smpquery: iberror: failed: operation nodeinfo: node info query failed
20:54:52 > ./smpquery -V
./smpquery BUILD VERSION: 1.5.1_76524e3_dirty Build date: Apr 28 2009 20:47:10
20:54:55 > smpquery -c nodeinfo 14 0,10
# Node info: Lid 14 DR path 0,10
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Switch
NumPorts:........................24
SystemGuid:......................0x0008f10400411b19
Guid:............................0x0008f10400411b18
PortGuid:........................0x0008f10400411b18
PartCap:.........................8
DevId:...........................0x5a30
Revision:........................0x000001a1
LocalPort:.......................24
VendorId:........................0x0008f1
20:54:59 > smpquery -V
smpquery BUILD VERSION: 1.3.6 Build date: Oct 13 2008 12:20:42
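
For comparing the two builds, it may help to pull the combined-route fields out of the ibwarn lines above. The following is only a sketch: the regular expression is inferred from the two warning lines in this message, not from the libibmad source, and `parse_combined_route` and `PERMISSIVE_LID` are hypothetical names. The one outside fact it relies on is that 65535 (0xFFFF) is the InfiniBand permissive LID.

```python
import re

# 0xFFFF is the InfiniBand permissive LID; the "dlid 65535" in the
# warnings above carries this value.
PERMISSIVE_LID = 0xFFFF

# Pattern inferred from the ibwarn lines in this message (an assumption,
# not a format taken from the library source).
DPORT_RE = re.compile(
    r"Lid (?P<lid>\d+) DR path slid (?P<slid>\d+); "
    r"dlid (?P<dlid>\d+); (?P<path>\d+(?:,\d+)*)"
)


def parse_combined_route(line):
    """Return the route fields found in a dport string, or None if absent."""
    m = DPORT_RE.search(line)
    if m is None:
        return None
    return {
        "lid": int(m["lid"]),      # LID-routed part of the route
        "slid": int(m["slid"]),    # directed-route source LID
        "dlid": int(m["dlid"]),    # directed-route destination LID
        "path": [int(p) for p in m["path"].split(",")],  # directed-route hops
    }


warn = ("ibwarn: [21947] mad_rpc: _do_madrpc failed; dport "
        "(Lid 14 DR path slid 4; dlid 65535; 0,10)")
route = parse_combined_route(warn)

# The 1.3.6 header line has no slid/dlid fields, so it does not match.
old_header = "# Node info: Lid 14 DR path 0,10"
old_route = parse_combined_route(old_header)
```

The newer build prints slid/dlid in the dport string while the 1.3.6 output shows only the path, so the function returns None for the older format.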
Ira