[ofa-general] Re: Issues with combined routing in smpquery

Ira Weiny weiny2 at llnl.gov
Tue Apr 28 20:55:25 PDT 2009


On Tue, 28 Apr 2009 20:27:36 -0700
Ira Weiny <weiny2 at llnl.gov> wrote:

> Sasha, Hal,
> 
> I have some hardware on which the following query does not work.
> 
>    18:40:54 > ./smpquery -c nodeinfo 243 0,1
>    ibwarn: [22072] mad_rpc: _do_madrpc failed; dport (Lid 243 DR path slid 148; dlid 65535; 0,1)
>    ./smpquery: iberror: failed: operation nodeinfo: node info query failed
> 
> from the node I am running on.
> 
>    20:08:46 > ibstat
>    CA 'mlx4_0'
>         CA type: MT25418
>         Number of ports: 2
>         Firmware version: 2.6.0
>         Hardware version: a0
>         Node GUID: 0x0002c9020025feb4
>         System image GUID: 0x0002c9020025feb7
>         Port 1:
>                   State: Active
>                   Physical state: LinkUp
>                   Rate: 10
>                   Base lid: 148
>                   LMC: 2
>                   SM lid: 148
>                   Capability mask: 0x0251086a
>                   Port GUID: 0x0002c9020025feb5
>    [snip]
> 
>    19:12:10 > hostname
>    hype137
> 
> 
> A query on the LID alone returns this.
> 
>    18:41:20 > ./smpquery nodeinfo 243 
>    # Node info: Lid 243
>    [snip]
>    NodeType:........................Switch
>    NumPorts:........................24
>    SystemGuid:......................0x0008f10400400e69
>    Guid:............................0x0008f10400400e69
>    PortGuid:........................0x0008f10400400e69
>    [snip]
> 
> And iblinkinfo shows this.
> 
>    18:41:26 > iblinkinfo.pl -S 0x0008f10400400e69
>    Switch 0x0008f10400400e69 ISR9288 Voltaire sFB-12D:
>       243    1[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     646   10[  ] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>    [snip]
> 
> 
> It looks like combined routing is not working at all except for this one
> query.  (LID 37 is the switch which is connected to the HCA I am running
> on.)
> 
>    18:53:18 > ./smpquery -c portinfo 37 0,1
>    # Port info: Lid 37 DR path slid 148; dlid 65535; 0,1 port 0
>    Mkey:............................0x0000000000000000
>    GidPrefix:.......................0xfe80000000000000
>    Lid:.............................148
>    SMLid:...........................148
>    [snip]
> 
> All other combined routing queries I try fail.  And even the one above is
> wrong: it returns the data for port 6, not port 1.  Look at the output from
> the local switch.
> 
>    19:12:00 > iblinkinfo.pl -R -S 0x000b8cffff004663
>    Switch 0x000b8cffff004663 MT47396 Infiniscale-III Mellanox Technologies:
>       37    1[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     108    1[  ] "hype132" (  )
>       37    2[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     528    1[  ] "hype133" (  )
>       37    3[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     296    1[  ] "hype134" (  )
>       37    4[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>      92    1[  ] "hype135" (  )
>       37    5[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     144    1[  ] "hype136" (  )
> 
> This is what is connected to LID 148...
>       37    6[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     148    1[  ] "hype137" (  )
> 
>       37    7[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     540    1[  ] "hype138" (  )
>       37    8[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     212    1[  ] "hype139" (  )
>       37    9[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     532    1[  ] "hype140" (  )
>       37   10[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>      60    1[  ] "hype141" (  )
>       37   11[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     192    1[  ] "hype142" (  )
>       37   12[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     312    1[  ] "hype143" (  )
>       37   13[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     647   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   14[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     641   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   15[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     643   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   16[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     653   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   17[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     637   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   18[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     610   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   19[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     655   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   20[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     645   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   21[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     635   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   22[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     651   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   23[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     639   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
>       37   24[  ]  ==( 4X 2.5 Gbps Active /   LinkUp)==>     649   13[12] "ISR9288/ISR9096 Voltaire sLB-24D" (  )
> 
> Any idea what is going on?  These were all run with an smpquery built from
> the current master tree.
> 
> On my little test system this seems to work just fine...  But not on this
> system.  Did some older hardware not support combined DR routing?

Actually I take this back.  An older version of smpquery (1.3.6) works on this
system but the newer build (1.5.1_76524e3_dirty) does not.  So I don't think
this is a hardware issue.  :-(

   20:54:47 > ./smpquery -c nodeinfo 14 0,10
   ibwarn: [21947] _do_madrpc: send failed; Invalid argument
   ibwarn: [21947] mad_rpc: _do_madrpc failed; dport (Lid 14 DR path slid 4; dlid 65535; 0,10)
   ./smpquery: iberror: failed: operation nodeinfo: node info query failed

   20:54:52 > ./smpquery -V
   ./smpquery BUILD VERSION: 1.5.1_76524e3_dirty Build date: Apr 28 2009 20:47:10

   20:54:55 > smpquery -c nodeinfo 14 0,10
   # Node info: Lid 14 DR path 0,10
   BaseVers:........................1
   ClassVers:.......................1
   NodeType:........................Switch
   NumPorts:........................24
   SystemGuid:......................0x0008f10400411b19
   Guid:............................0x0008f10400411b18
   PortGuid:........................0x0008f10400411b18
   PartCap:.........................8
   DevId:...........................0x5a30
   Revision:........................0x000001a1
   LocalPort:.......................24
   VendorId:........................0x0008f1

   20:54:59 > smpquery -V
   smpquery BUILD VERSION: 1.3.6 Build date: Oct 13 2008 12:20:42
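
For comparing the two builds, here is a minimal sketch that pulls the
combined-route fields out of the port description the newer build prints
("Lid 14 DR path slid 4; dlid 65535; 0,10").  The function name, the
dictionary keys, and the regular expression are my own assumptions based
only on the output above; this is not how smpquery or libibmad parse
anything.  The one outside fact used is that 65535 is 0xFFFF, the
permissive LID.

```python
import re

PERMISSIVE_LID = 0xFFFF  # 65535, the InfiniBand permissive LID

# Matches the port description as printed by the newer build above, e.g.
# "Lid 243 DR path slid 148; dlid 65535; 0,1".  Hypothetical pattern,
# derived only from the output shown in this message.
_DPORT_RE = re.compile(
    r"Lid (?P<lid>\d+) DR path slid (?P<slid>\d+); "
    r"dlid (?P<dlid>\d+); (?P<path>\d+(?:,\d+)*)"
)


def parse_dport(text):
    """Return the combined-route fields found in text, or None if absent."""
    m = _DPORT_RE.search(text)
    if m is None:
        return None
    dlid = int(m.group("dlid"))
    return {
        "lid": int(m.group("lid")),        # LID-routed part of the address
        "dr_slid": int(m.group("slid")),   # printed as "slid"
        "dr_dlid": dlid,                   # printed as "dlid"
        "dr_dlid_is_permissive": dlid == PERMISSIVE_LID,
        # Directed-route part, one integer per comma-separated entry
        "path": [int(p) for p in m.group("path").split(",")],
    }


if __name__ == "__main__":
    new_build = ("ibwarn: [21947] mad_rpc: _do_madrpc failed; "
                 "dport (Lid 14 DR path slid 4; dlid 65535; 0,10)")
    old_build = "# Node info: Lid 14 DR path 0,10"
    print(parse_dport(new_build))
    print(parse_dport(old_build))
```

The older build's header ("Lid 14 DR path 0,10") carries no slid/dlid
fields, so the sketch returns None for it.  That difference in the printed
address is the only thing this shows; it does not by itself explain the
"send failed; Invalid argument" error.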

Ira



