[ofa-general] Combined DR path with empty DR path, what is the expected behavior?
Ira Weiny
weiny2 at llnl.gov
Tue Aug 25 17:55:43 PDT 2009
On Tue, 25 Aug 2009 19:15:19 -0400
Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
> On 8/24/09, Ira Weiny <weiny2 at llnl.gov> wrote:
>
> > If I send a combined DR path with a start lid but an empty (0 length) DR
> > path.
>
>
> Hop Count 0 ?
Yes
>
>
> > What is the expected behavior?
>
>
> Not sure what you mean by expected here. Are you referring to expectation
> based on the spec ?
>
yes
>
> > I know this could be specified with LID routing, but I don't see anywhere
> > in
> > the specification which says this is an error.
>
>
> I don't think it should be an error (certainly not for the form you are
> using LID routed part followed by a DR part) but a null DR part is a little
> funny/odd.
Yeah, I know. It turns out that the new iblinkinfo issues queries like this
when it recurses back from the last DR portion of the combined
route path. It only showed up as an error when using the -S <guid> option of
iblinkinfo with this new switch I have. It works fine with the old switches.
>
> > I do however seem to have 2
> > different implementations on 2 different switches. For example:
> >
> > I have Switch A (Lid 1) and Switch B (Lid 7). I attempt to query PortInfo
> > of
> > Port 1 of each switch using the LID followed by an empty DR path.
> >
> > 17:55:22 > ./smpquery -c portinfo 1 0 1
> > ibwarn: [21005] mad_rpc: _do_madrpc failed; dport (Lid 1)
> > ./smpquery: iberror: failed: operation portinfo: port info query failed
>
>
> Is this a timeout ?
yes
16:26:25 > ./smpquery -e -c portinfo 1 0 1
ibwarn: [27150] _do_madrpc: retry 1 (timeout 1000 ms)
ibwarn: [27150] _do_madrpc: retry 2 (timeout 1000 ms)
ibwarn: [27150] _do_madrpc: timeout after 3 retries, 3000 ms
ibwarn: [27150] mad_rpc: _do_madrpc failed; dport (Lid 1)
./smpquery: iberror: failed: operation portinfo: port info query failed
>
>
> > 17:55:31 > ./smpquery -c portinfo 7 0 1
> > # Port info: Lid 7 port 1
> > Mkey:............................0x0000000000000000
> > GidPrefix:.......................0x0000000000000000
> > ...
> > <normal output snipped>
> >
> > Detecting this special case in libibmad and turning the packet into a LID
> > routed one
>
>
> Ugh... Is this special case really needed ? I don't think the underlying
> issue is understood sufficiently yet.
Well I just did it to prove that what I was doing would work with a "simple"
lid routed packet. Like I said it might be that this portid which is being
specified to libibmad by libibnetdisc is not valid. If that is true then
libibnetdisc should detect when the DR path is empty and go back to LID routed
requests. That is a valid fix in my mind.
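
A minimal sketch of that fallback, assuming a simplified port identifier.
PortId and choose_routing are hypothetical names for illustration; they are
not the libibmad or libibnetdisc API, and dr_hops here holds only the real
hops of the DR part.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PortId:
    """Hypothetical stand-in for a port identifier (not the libibmad type)."""
    lid: int = 0                                       # 0 means no LID-routed part
    dr_hops: List[int] = field(default_factory=list)   # egress ports of the DR part


def choose_routing(portid: PortId) -> str:
    """Pick the request type to send for a given port identifier.

    Returns "lid" for a plain LID-routed request, "combined" for a
    LID-routed part followed by a DR part, and "dr" for a pure directed
    route (which includes the zero-hop local loopback case).
    """
    has_lid = portid.lid != 0
    has_dr = len(portid.dr_hops) > 0
    if has_lid and not has_dr:
        # Start LID with an empty DR part: fall back to LID routing
        # instead of sending a directed route SMP with Hop Count 0.
        return "lid"
    if has_lid and has_dr:
        return "combined"
    return "dr"


print(choose_routing(PortId(lid=1)))               # the failing query above
print(choose_routing(PortId(lid=1, dr_hops=[1])))
print(choose_routing(PortId()))
```

With this check in the caller, the query against Lid 1 with an empty DR path
would go out as an ordinary LID-routed request.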
>
> > succeeds but I wonder if this is an error in the SMI?
>
>
> Switch SMI ? Is this a proprietary implementation ?
>
Yes. I see the bug with 2 different vendors' switches. One is managed and the
other is not. My "old" switches (3 different vendors) do not show this
behavior. (Just to be clear, I now have 5 switches in my 5 node cluster!
;-)
>
>
> > I also notice this is an error on the HCA I am running from (lid 2).
>
>
> Is this HCA node OpenIB based ?
yes
>
> > 17:57:42 > ./smpquery -c portinfo 2 0 1
> > ibwarn: [21008] mad_rpc: _do_madrpc failed; dport (Lid 2)
> > ./smpquery: iberror: failed: operation portinfo: port info query failed
>
>
> Is this also a timeout ?
yes
>
> Also, does the result differ based on where you source these from
> (locally v. remotely)?
Same result local and remote.
>
>
>
> > Running with a simple DR path works,
>
>
> You're referring to the same DR path here that fails in the combined route
> examples above, right ?
>
No. The example below is a DR path with Hop Count == 0 but without the initial
LID routing.
>
> > I guess because this is the loopback case mentioned on page 805.
>
>
> Yes but that's the high level requirement rather than the SMI rules which
> make that work.
>
>
>
> > 17:58:16 > ./smpquery -D portinfo 0 1
> > # Port info: DR path slid 65535; dlid 65535; 0 port 1
> > Mkey:............................0x0000000000000000
> > GidPrefix:.......................0x2007000000000000
> > ...
> > <snip>
> >
> > I guess that the comment "Since each part may be empty, there are eight
> > combinations, although only four are really useful:" on line 36 Page 805
> > can
> > be interpreted to mean that only those 4 combinations need to be supported.
> > Is this true?
>
>
> Not all 4 combinations are supported/known to work. When this was added for
> ibportstate, the only combined routing form that was important was LID
> routed part followed by a DR part.
>
When you say "known to work", do you mean implemented with the diags? Or known
to work in all hardware?
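
For reference, the count of eight follows from three parts that can each be
present or empty. The sketch below just enumerates them; the part names are
my own labels (an assumption), not wording taken from the spec.

```python
from itertools import product

# Assumed labels for the three parts of a combined route.
PARTS = ("initial LID-routed part", "DR part", "final LID-routed part")


def combinations():
    """Return every present/empty combination of the three parts."""
    out = []
    for present in product((False, True), repeat=len(PARTS)):
        out.append([name for name, on in zip(PARTS, present) if on])
    return out


for combo in combinations():
    print(" + ".join(combo) if combo else "(all parts empty)")
```

The query that fails for me is the "initial LID-routed part" only entry,
expressed as a directed route SMP with an empty DR part.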
>
> > On the other hand I think strictly this should be supported.
>
>
> In an ideal world yes but are they all required or is it just the one form
> most heavily used ?
That is what I am unclear on. Does the spec require all 8 combinations
to work? I don't see a specific compliance statement which says that, and I
am not sure if C14-9 and C14-13 cover all 8 combinations.
>
> > Item 4 of C14-9
> > (line 24 page 810) requires the SMI to handle the packet if the HopPointer
> > equals HopCount +1, which it is in my case (HopCount == 0, HopPointer == 1)
>
>
> By handle, this means "The SMI *shall *output the packet on the port whose
> number is in the entry indexed by Hop Pointer in the Initial Path. If that
> port number is invalid, the SMI *shall *discard the SMP."
>
> Are you sure the Hop Pointer is 1 ? Where do you see this ?
No, I was wrong. I think I read the wrong madeye packet, as I see the packet
right before this one did have a hop pointer of 1. I added some debug prints
to mad_encode to get the following output:
17:26:10 > ./smpquery -e -c portinfo 1 0 1
trid 2a0f0cb5; HopCount 0; HopPointer 0; slid 0; dlid 0; 0, drpath->cnt 0
trid 2a0f0cb6; HopCount 0; HopPointer 0; slid 0; dlid 0; 0, drpath->cnt 0
trid 2a0f0cb7; HopCount 0; HopPointer 0; slid 2; dlid 65535; 0, drpath->cnt 0
ibwarn: [27322] _do_madrpc: recv failed: Connection timed out
ibwarn: [27322] mad_rpc: _do_madrpc failed; dport (Lid 1)
./smpquery: iberror: failed: operation portinfo: port info query failed
madeye for these packets:
Aug 25 17:28:03 woprjr0 Madeye:recv SMP
Aug 25 17:28:03 woprjr0 MAD version....0x1
Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:28:03 woprjr0 Class version..0x1
Aug 25 17:28:03 woprjr0 Method.........0x81 (Get response)
Aug 25 17:28:03 woprjr0 Status.........0x8000
Aug 25 17:28:03 woprjr0 Hop pointer....0x1
Aug 25 17:28:03 woprjr0 Hop counter....0x0
Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb5
Aug 25 17:28:03 woprjr0 Attr ID........0x11 (node info)
Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
Aug 25 17:28:03 woprjr0 Mkey...........0x0
Aug 25 17:28:03 woprjr0 DR SLID........0xffff
Aug 25 17:28:03 woprjr0 DR DLID........0xffff
Aug 25 17:28:03 woprjr0 Madeye:sent SMP
Aug 25 17:28:03 woprjr0 MAD version....0x1
Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:28:03 woprjr0 Class version..0x1
Aug 25 17:28:03 woprjr0 Method.........0x1 (Get)
Aug 25 17:28:03 woprjr0 Status.........0x00
Aug 25 17:28:03 woprjr0 Hop pointer....0x1
Aug 25 17:28:03 woprjr0 Hop counter....0x0
Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb5
Aug 25 17:28:03 woprjr0 Attr ID........0x11 (node info)
Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
Aug 25 17:28:03 woprjr0 Mkey...........0x0
Aug 25 17:28:03 woprjr0 DR SLID........0xffff
Aug 25 17:28:03 woprjr0 DR DLID........0xffff
Aug 25 17:28:03 woprjr0 Madeye:recv SMP
Aug 25 17:28:03 woprjr0 MAD version....0x1
Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:28:03 woprjr0 Class version..0x1
Aug 25 17:28:03 woprjr0 Method.........0x81 (Get response)
Aug 25 17:28:03 woprjr0 Status.........0x8000
Aug 25 17:28:03 woprjr0 Hop pointer....0x1
Aug 25 17:28:03 woprjr0 Hop counter....0x0
Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb6
Aug 25 17:28:03 woprjr0 Attr ID........0x15 (port info)
Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
Aug 25 17:28:03 woprjr0 Mkey...........0x0
Aug 25 17:28:03 woprjr0 DR SLID........0xffff
Aug 25 17:28:03 woprjr0 DR DLID........0xffff
Aug 25 17:28:03 woprjr0 Madeye:sent SMP
Aug 25 17:28:03 woprjr0 MAD version....0x1
Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:28:03 woprjr0 Class version..0x1
Aug 25 17:28:03 woprjr0 Method.........0x1 (Get)
Aug 25 17:28:03 woprjr0 Status.........0x00
Aug 25 17:28:03 woprjr0 Hop pointer....0x1
Aug 25 17:28:03 woprjr0 Hop counter....0x0
Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb6
Aug 25 17:28:03 woprjr0 Attr ID........0x15 (port info)
Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
Aug 25 17:28:03 woprjr0 Mkey...........0x0
Aug 25 17:28:03 woprjr0 DR SLID........0xffff
Aug 25 17:28:03 woprjr0 DR DLID........0xffff
Aug 25 17:28:03 woprjr0 Madeye:sent SMP
Aug 25 17:28:03 woprjr0 MAD version....0x1
Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:28:03 woprjr0 Class version..0x1
Aug 25 17:28:03 woprjr0 Method.........0x1 (Get)
Aug 25 17:28:03 woprjr0 Status.........0x00
Aug 25 17:28:03 woprjr0 Hop pointer....0x0
Aug 25 17:28:03 woprjr0 Hop counter....0x0
Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb7
Aug 25 17:28:03 woprjr0 Attr ID........0x15 (port info)
Aug 25 17:28:03 woprjr0 Attr modifier..0x0001
Aug 25 17:28:03 woprjr0 Mkey...........0x0
Aug 25 17:28:03 woprjr0 DR SLID........0x02
Aug 25 17:28:03 woprjr0 DR DLID........0xffff
No response is shown for trid 0x1b9d2a0f0cb7...
As an aside I see the hop pointer is set to 1 at a lower level since
mad_encode does not do it.
So I guess the proper case for C14-9 would be "3) If Hop Pointer is equal to
Hop Count". (They are both 0.)
>
> If so, what's the initial path at this point (or more specifically index 1
> of the initial path) ? I think that needs to be port 0 (if a switch) but
> this is a little weird as I would think it should be handed to the SMA which
> is different cases in the spec.
Yes, I think I was wrong on the case. But still, wouldn't the SMI detect that
this is the end of the DR path and simply hand it to the SMA?
>
>
> > Then after processing
>
>
> by the SMA and doing the required returning initialization
>
> > the SMI should return the packet as specified in C14-13
> > item 3 on line 9 page 812.
>
>
> I'm not sure it would use this case in the case of an empty DR path on
> return.
Actually, I think it will use this. C14-9 item 3) states "the Hop Pointer
shall be incremented by 1". Therefore, when the response is handed back to the
SMI, the Hop Pointer will be 1 and the Hop Count 0, and the SMI uses the
DrSLID to send the packet back to the requester.
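
A small sketch that traces this reasoning. It models only my reading of
items 3 and 4 of C14-9 as described in this message (an assumption, not spec
text), and it does not model what any particular switch actually does. The
function names are hypothetical.

```python
def c14_9_item(hop_ptr: int, hop_cnt: int) -> str:
    """Which C14-9 item applies to an outgoing SMP, per the reading above."""
    if hop_ptr == hop_cnt:
        return "item 3"        # end of the DR part
    if hop_ptr == hop_cnt + 1:
        return "item 4"
    return "not modeled"       # other items are outside this sketch


def trace_empty_dr_request(hop_ptr: int = 0, hop_cnt: int = 0) -> dict:
    """Follow a request through the SMI and report the response fields."""
    item = c14_9_item(hop_ptr, hop_cnt)
    if item == "item 3":
        # "the Hop Pointer shall be incremented by 1"
        hop_ptr += 1
    return {
        "request_item": item,
        "response_hop_ptr": hop_ptr,
        "response_hop_cnt": hop_cnt,
    }


print(trace_empty_dr_request())
```

For the empty DR path (Hop Pointer 0, Hop Count 0) this selects item 3 and
yields a response with Hop Pointer 1 and Hop Count 0, which is the state I
expect C14-13 to see on the way back.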
>
> > Am I wrong? In the end it does not matter as I have to make the software
> > work
> > for all the hardware I have; so I will change the software.
>
>
> IMO it does matter as to where the problem lies (SMI or otherwise) and how
> the layers are comprised in the implementation.
Agreed. I am mainly confused because I have 2 different implementations of
this. My "old" switches seem to handle this case just fine. My "new"
switches do not. So I am really wondering what is going on.
Here is the above output for the same query which works with an "old" switch.
17:28:04 > ./smpquery -e -c portinfo 7 0 1
...
trid 1a4329de; HopCount 0; HopPointer 0; slid 2; dlid 65535; 0, drpath->cnt 0
...
Aug 25 17:46:40 woprjr0 Madeye:sent SMP
Aug 25 17:46:40 woprjr0 MAD version....0x1
Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:46:40 woprjr0 Class version..0x1
Aug 25 17:46:40 woprjr0 Method.........0x1 (Get)
Aug 25 17:46:40 woprjr0 Status.........0x00
Aug 25 17:46:40 woprjr0 Hop pointer....0x0
Aug 25 17:46:40 woprjr0 Hop counter....0x0
Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
Aug 25 17:46:40 woprjr0 Mkey...........0x0
Aug 25 17:46:40 woprjr0 DR SLID........0x02
Aug 25 17:46:40 woprjr0 DR DLID........0xffff
Aug 25 17:46:40 woprjr0 Madeye:recv SMP
Aug 25 17:46:40 woprjr0 MAD version....0x1
Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
Aug 25 17:46:40 woprjr0 Class version..0x1
Aug 25 17:46:40 woprjr0 Method.........0x81 (Get response)
Aug 25 17:46:40 woprjr0 Status.........0x8000
Aug 25 17:46:40 woprjr0 Hop pointer....0x0
Aug 25 17:46:40 woprjr0 Hop counter....0x0
Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
Aug 25 17:46:40 woprjr0 Mkey...........0x0
Aug 25 17:46:40 woprjr0 DR SLID........0x02
Aug 25 17:46:40 woprjr0 DR DLID........0xffff
Hop Pointer and Count are both 0 and things work just fine...
>
> > However, I wonder
> > where exactly the spec falls on this, because I think it will influence
> > where
> > the fix resides. If the spec does not allow this then I think it is fine
> > to
> > have libibmad return an error since the user specified an invalid combined
> > DR
> > path. However, if this should be legal I think libibmad should work around
> > the bad hardware out there.
>
>
> Is it hardware or firmware that needs fixing ? I think it may depend on the
> specific workaround for this as to whether it is acceptable as it might harm
> something else or might violate the spec.
I agree, however, if the switch hardware needs fixing I fear it is too late
for the ones I have. Firmware might be upgradable although I have had issues
with un-managed switches in the past.
So where do we put the fix in software?
Ira
> -- Hal
>
>
> > Thoughts?
> > Ira
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > weiny2 at llnl.gov
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://*lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> > http://*openib.org/mailman/listinfo/openib-general
> >
>
--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2 at llnl.gov