[ofa-general] Combined DR path with empty DR path, what is the expected behavior?
Hal Rosenstock
hal.rosenstock at gmail.com
Wed Aug 26 07:55:41 PDT 2009
On 8/25/09, Ira Weiny <weiny2 at llnl.gov> wrote:
>
> On Tue, 25 Aug 2009 19:15:19 -0400
> Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>
> > On 8/24/09, Ira Weiny <weiny2 at llnl.gov> wrote:
> >
> > > If I send a combined DR path with a start lid but an empty (0 length)
> > > DR path.
> >
> >
> > Hop Count 0 ?
>
> Yes
>
> >
> >
> > > What is the expected behavior?
> >
> >
> > Not sure what you mean by expected here. Are you referring to expectation
> > based on the spec ?
> >
>
> yes
>
> >
> > > I know this could be specified with LID routing, but I don't see
> > > anywhere in the specification which says this is an error.
> >
> >
> > I don't think it should be an error (certainly not for the form you are
> > using LID routed part followed by a DR part) but a null DR part is a
> > little funny/odd.
>
> Yea I know. It turns out that the new iblinkinfo issues queries like this
> when it recurses back from the last DR portion of the combined route
> path. It only showed up as an error when using the -S <guid> option of
> iblinkinfo with this new switch I have. Works fine with the old switches.
>
> >
> > > I do however seem to have 2
> > > different implementations on 2 different switches. For example:
> > >
> > > I have Switch A (Lid 1) and Switch B (Lid 7). I attempt to query
> > > PortInfo of Port 1 of each switch using the LID followed by an empty
> > > DR path.
> > >
> > > 17:55:22 > ./smpquery -c portinfo 1 0 1
> > > ibwarn: [21005] mad_rpc: _do_madrpc failed; dport (Lid 1)
> > > ./smpquery: iberror: failed: operation portinfo: port info query failed
> >
> >
> > Is this a timeout ?
>
> yes
>
> 16:26:25 > ./smpquery -e -c portinfo 1 0 1
> ibwarn: [27150] _do_madrpc: retry 1 (timeout 1000 ms)
> ibwarn: [27150] _do_madrpc: retry 2 (timeout 1000 ms)
> ibwarn: [27150] _do_madrpc: timeout after 3 retries, 3000 ms
> ibwarn: [27150] mad_rpc: _do_madrpc failed; dport (Lid 1)
> ./smpquery: iberror: failed: operation portinfo: port info query failed
>
>
> >
> >
> > > 17:55:31 > ./smpquery -c portinfo 7 0 1
> > > # Port info: Lid 7 port 1
> > > Mkey:............................0x0000000000000000
> > > GidPrefix:.......................0x0000000000000000
> > > ...
> > > <normal output snipped>
> > >
> > > Detecting this special case in libibmad and turning the packet into a
> > > LID routed one
> >
> >
> > Ugh... Is this special case really needed ? I don't think the underlying
> > issue is understood sufficiently yet.
>
> Well I just did it to prove that what I was doing would work with a
> "simple" lid routed packet. Like I said it might be that this portid which
> is being specified to libibmad by libibnetdisc is not valid. If that is
> true then libibnetdisc should detect when the DR path is empty and go back
> to LID routed requests. That is a valid fix in my mind.
Sure; there's no real need for combined route when the DR path is empty but
it should work (at least with switches).
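A minimal sketch of that libibnetdisc-side fallback, written in Python for
brevity rather than the C of the real libraries. PortId and choose_routing
are hypothetical names for illustration only, not the libibmad or
libibnetdisc API, and dr_path here holds only the hops after the leading 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PortId:
    """Hypothetical destination: a LID plus an optional directed-route path."""
    lid: int = 0                                      # 0 means "no LID given"
    dr_path: List[int] = field(default_factory=list)  # one egress port per hop


def choose_routing(dest: PortId) -> str:
    """Pick a routing mode, falling back to LID routing when the DR part is empty."""
    if dest.lid and dest.dr_path:
        return "combined"   # LID routed part followed by a DR part
    if dest.dr_path:
        return "directed"   # pure directed route
    if dest.lid:
        return "lid"        # empty DR part: plain LID routed request
    return "directed"       # no LID, hop count 0: the local loopback case


print(choose_routing(PortId(lid=1)))               # empty DR part
print(choose_routing(PortId(lid=1, dr_path=[3])))  # LID part plus one hop
```

With this shape the "1 0 1" style query would go out LID routed, while a
LID followed by a non-empty DR path would still use combined routing.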
>
> > > succeeds but I wonder if this is an error in the SMI?
> >
> >
> > Switch SMI ? Is this a proprietary implementation ?
> >
>
> Yes I see the bug with 2 different vendors' switches. One is managed and
> the other is not. My "old" switches (3 different vendors) do not show this
> behavior. (Just to be clear I now have 5 switches in my 5 node cluster!
> ;-)
>
> >
> >
> > > I also notice this is an error on the HCA I am running from (lid 2).
> >
> >
> > Is this HCA node OpenIB based ?
>
> yes
If I recall correctly, there is something in the spec that makes combined
routing not be allowed on HCA (and router) ports so this seems correct. I
can dig this out if really needed.
>
> > > 17:57:42 > ./smpquery -c portinfo 2 0 1
> > > ibwarn: [21008] mad_rpc: _do_madrpc failed; dport (Lid 2)
> > > ./smpquery: iberror: failed: operation portinfo: port info query failed
> >
> >
> > Is this also a timeout ?
>
> yes
>
> >
> > Also, does the result differ based on where you source these from
> > (locally v. remotely)?
>
> Same result local and remote.
>
> >
> >
> >
> > > Running with a simple DR path works,
> >
> >
> > You're referring to the same DR path here that fails in the combined
> > route examples above, right ?
> >
>
> No. The example below is a DR path with Hop Count == 0 but without the
> initial LID routing.
>
> >
> > > I guess because this is the loopback case mentioned on page 805.
> >
> >
> > Yes but that's the high level requirement rather than the SMI rules which
> > make that work.
> >
> >
> >
> > > 17:58:16 > ./smpquery -D portinfo 0 1
> > > # Port info: DR path slid 65535; dlid 65535; 0 port 1
> > > Mkey:............................0x0000000000000000
> > > GidPrefix:.......................0x2007000000000000
> > > ...
> > > <snip>
> > >
> > > I guess that the comment "Since each part may be empty, there are
> > > eight combinations, although only four are really useful:" on line 36
> > > Page 805 can be interpreted to mean that only those 4 combinations need
> > > to be supported. Is this true?
> >
> >
> > Not all 4 combinations are supported/known to work. When this was added
> > for ibportstate, the only combined routing form that was important was
> > LID routed part followed by a DR part.
> >
>
> When you say "known to work" you mean implemented with the diags? Or known
> to work in all hardware?
The former with most hardware up to some time ago. Note there is no
compliance testing of combined routing and heavy reliance on this makes some
a little nervous.
>
> > > On the other hand I think strictly this should be supported.
> >
> >
> > In an ideal world yes but are they all required or is it just the one
> > form most heavily used ?
>
> That is what I am unclear on. Does the spec require that all 8
> combinations work? I don't see a specific compliance statement which says
> that and I am not sure if C14-9 and C14-13 cover all 8 combinations.
I don't think there's any compliance on this. It all appears to be
informative text. Perhaps a shortcoming of the spec. So there's nothing
definitive. It just says there are 8 combinations (2**3 as there are 3 parts
with 2 possibilities in each part) and that only 4 are really useful.
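To make the 2**3 count concrete, here is a small Python sketch that simply
enumerates the combinations. The part names are my paraphrase (an initial
LID routed part, the DR part, and a final LID routed part), and the sketch
makes no claim about which four the spec considers useful.

```python
from itertools import product

# Paraphrased names for the three parts of a combined route (an assumption,
# not spec wording).
PARTS = ("initial LID routed part", "DR part", "final LID routed part")


def combinations():
    """Return every present/empty assignment for the three parts."""
    return [dict(zip(PARTS, present))
            for present in product((False, True), repeat=len(PARTS))]


combos = combinations()
print(len(combos))
for combo in combos:
    names = [name for name, present in combo.items() if present]
    print(" + ".join(names) if names else "(all parts empty)")
```

The case in this thread is the one where only the initial LID routed part
is present and the DR part is empty.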
>
> > > Item 4 of C14-9
> > > (line 24 page 810) requires the SMI to handle the packet if the
> > > HopPointer equals HopCount +1, which it is in my case (HopCount == 0,
> > > HopPointer == 1)
> >
> >
> > By handle, this means "The SMI *shall* output the packet on the port
> > whose number is in the entry indexed by Hop Pointer in the Initial Path.
> > If that port number is invalid, the SMI *shall* discard the SMP."
> >
> > Are you sure the Hop Pointer is 1 ? Where do you see this ?
>
> No I was wrong. I think I read the wrong madeye packet as I see the packet
> right before this one did have a hop pointer of 1. I added some debug
> prints to mad_encode to get the following output:
>
> 17:26:10 > ./smpquery -e -c portinfo 1 0 1
> trid 2a0f0cb5; HopCount 0; HopPointer 0; slid 0; dlid 0; 0, drpath->cnt 0
> trid 2a0f0cb6; HopCount 0; HopPointer 0; slid 0; dlid 0; 0, drpath->cnt 0
> trid 2a0f0cb7; HopCount 0; HopPointer 0; slid 2; dlid 65535; 0, drpath->cnt 0
> ibwarn: [27322] _do_madrpc: recv failed: Connection timed out
> ibwarn: [27322] mad_rpc: _do_madrpc failed; dport (Lid 1)
> ./smpquery: iberror: failed: operation portinfo: port info query failed
>
> madeye for these packets:
>
> Aug 25 17:28:03 woprjr0 Madeye:recv SMP
> Aug 25 17:28:03 woprjr0 MAD version....0x1
> Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:28:03 woprjr0 Class version..0x1
> Aug 25 17:28:03 woprjr0 Method.........0x81 (Get response)
> Aug 25 17:28:03 woprjr0 Status.........0x8000
> Aug 25 17:28:03 woprjr0 Hop pointer....0x1
> Aug 25 17:28:03 woprjr0 Hop counter....0x0
> Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb5
> Aug 25 17:28:03 woprjr0 Attr ID........0x11 (node info)
> Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
> Aug 25 17:28:03 woprjr0 Mkey...........0x0
> Aug 25 17:28:03 woprjr0 DR SLID........0xffff
> Aug 25 17:28:03 woprjr0 DR DLID........0xffff
> Aug 25 17:28:03 woprjr0 Madeye:sent SMP
> Aug 25 17:28:03 woprjr0 MAD version....0x1
> Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:28:03 woprjr0 Class version..0x1
> Aug 25 17:28:03 woprjr0 Method.........0x1 (Get)
> Aug 25 17:28:03 woprjr0 Status.........0x00
> Aug 25 17:28:03 woprjr0 Hop pointer....0x1
> Aug 25 17:28:03 woprjr0 Hop counter....0x0
> Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb5
> Aug 25 17:28:03 woprjr0 Attr ID........0x11 (node info)
> Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
> Aug 25 17:28:03 woprjr0 Mkey...........0x0
> Aug 25 17:28:03 woprjr0 DR SLID........0xffff
> Aug 25 17:28:03 woprjr0 DR DLID........0xffff
> Aug 25 17:28:03 woprjr0 Madeye:recv SMP
> Aug 25 17:28:03 woprjr0 MAD version....0x1
> Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:28:03 woprjr0 Class version..0x1
> Aug 25 17:28:03 woprjr0 Method.........0x81 (Get response)
> Aug 25 17:28:03 woprjr0 Status.........0x8000
> Aug 25 17:28:03 woprjr0 Hop pointer....0x1
> Aug 25 17:28:03 woprjr0 Hop counter....0x0
> Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb6
> Aug 25 17:28:03 woprjr0 Attr ID........0x15 (port info)
> Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
> Aug 25 17:28:03 woprjr0 Mkey...........0x0
> Aug 25 17:28:03 woprjr0 DR SLID........0xffff
> Aug 25 17:28:03 woprjr0 DR DLID........0xffff
> Aug 25 17:28:03 woprjr0 Madeye:sent SMP
> Aug 25 17:28:03 woprjr0 MAD version....0x1
> Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:28:03 woprjr0 Class version..0x1
> Aug 25 17:28:03 woprjr0 Method.........0x1 (Get)
> Aug 25 17:28:03 woprjr0 Status.........0x00
> Aug 25 17:28:03 woprjr0 Hop pointer....0x1
> Aug 25 17:28:03 woprjr0 Hop counter....0x0
> Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb6
> Aug 25 17:28:03 woprjr0 Attr ID........0x15 (port info)
> Aug 25 17:28:03 woprjr0 Attr modifier..0x0000
> Aug 25 17:28:03 woprjr0 Mkey...........0x0
> Aug 25 17:28:03 woprjr0 DR SLID........0xffff
> Aug 25 17:28:03 woprjr0 DR DLID........0xffff
> Aug 25 17:28:03 woprjr0 Madeye:sent SMP
> Aug 25 17:28:03 woprjr0 MAD version....0x1
> Aug 25 17:28:03 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:28:03 woprjr0 Class version..0x1
> Aug 25 17:28:03 woprjr0 Method.........0x1 (Get)
> Aug 25 17:28:03 woprjr0 Status.........0x00
> Aug 25 17:28:03 woprjr0 Hop pointer....0x0
> Aug 25 17:28:03 woprjr0 Hop counter....0x0
> Aug 25 17:28:03 woprjr0 Trans ID.......0x1b9d2a0f0cb7
> Aug 25 17:28:03 woprjr0 Attr ID........0x15 (port info)
> Aug 25 17:28:03 woprjr0 Attr modifier..0x0001
> Aug 25 17:28:03 woprjr0 Mkey...........0x0
> Aug 25 17:28:03 woprjr0 DR SLID........0x02
> Aug 25 17:28:03 woprjr0 DR DLID........0xffff
>
> No response is shown for trid 0x1b9d2a0f0cb7...
>
> As an aside I see the hop pointer is set to 1 at a lower level since
> mad_encode does not do it.
Right; the SMI would do that.
> So I guess the proper case for C14-9 would be "3) If Hop Pointer is equal to
> Hop Count". (They are both 0.)
I'm not sure; maybe C14-9 4)
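For reference, the two C14-9 conditions being weighed here can be written
down as a short Python sketch. Only the two comparisons come from the text
quoted in this thread; the function name and return strings are made up for
illustration, and the remaining C14-9 items are deliberately not modeled.

```python
def c14_9_item(hop_pointer: int, hop_count: int) -> str:
    """Say which of the two C14-9 conditions quoted in this thread matches."""
    if hop_pointer == hop_count:
        return "item 3"   # "If Hop Pointer is equal to Hop Count"
    if hop_pointer == hop_count + 1:
        return "item 4"   # HopPointer equals HopCount + 1
    return "other"        # items not discussed in this thread


print(c14_9_item(0, 0))  # values as written by mad_encode
print(c14_9_item(1, 0))  # values seen in madeye after the hop pointer is bumped
```

Which item applies therefore depends on whether the comparison is made
before or after the hop pointer has been set to 1.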
>
> > If so, what's the initial path at this point (or more specifically index
> > 1 of the initial path) ? I think that needs to be port 0 (if a switch)
> > but this is a little weird as I would think it should be handed to the
> > SMA which is different cases in the spec.
>
> Yes I think I was wrong on the case. But still wouldn't the SMI detect
> that this is the end of the DRPath and simply hand it to the SMA?
Yes, that's what should happen.
>
> >
> > > Then after processing
> >
> >
> > by the SMA and doing the required returning initialization
> >
> > > the SMI should return the packet as specified in C14-13
> > > item 3 on line 9 page 812.
> >
> >
> > I'm not sure it would use this case in the case of an empty DR path on
> > return.
>
> Actually I think it will use this. C14-9 item 3) states "the Hop Pointer
> shall be incremented by 1". Therefore when the response is handed back to
> the SMI the Hop pointer will be 1 and the hop count 0. And the SMI uses
> the DRSLID to send the packet back to the requester.
It goes up to the SMA and then when the response is to be made it goes
through returning SMI initialization and handling.
-- Hal
>
> > > Am I wrong? In the end it does not matter as I have to make the
> > > software work for all the hardware I have; so I will change the
> > > software.
> >
> >
> > IMO it does matter as to where the problem lies (SMI or otherwise) and
> > how the layers are comprised in the implementation.
>
> Agreed. I am mainly confused because I have 2 different implementations of
> this. My "old" switches seem to handle this case just fine. My "new"
> switches do not. So I am really wondering what is going on.
>
> Here is the above output for the same query which works with an "old"
> switch.
>
> 17:28:04 > ./smpquery -e -c portinfo 7 0 1
> ...
> trid 1a4329de; HopCount 0; HopPointer 0; slid 2; dlid 65535; 0, drpath->cnt 0
> ...
>
> Aug 25 17:46:40 woprjr0 Madeye:sent SMP
> Aug 25 17:46:40 woprjr0 MAD version....0x1
> Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:46:40 woprjr0 Class version..0x1
> Aug 25 17:46:40 woprjr0 Method.........0x1 (Get)
> Aug 25 17:46:40 woprjr0 Status.........0x00
> Aug 25 17:46:40 woprjr0 Hop pointer....0x0
> Aug 25 17:46:40 woprjr0 Hop counter....0x0
> Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
> Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
> Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
> Aug 25 17:46:40 woprjr0 Mkey...........0x0
> Aug 25 17:46:40 woprjr0 DR SLID........0x02
> Aug 25 17:46:40 woprjr0 DR DLID........0xffff
> Aug 25 17:46:40 woprjr0 Madeye:recv SMP
> Aug 25 17:46:40 woprjr0 MAD version....0x1
> Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
> Aug 25 17:46:40 woprjr0 Class version..0x1
> Aug 25 17:46:40 woprjr0 Method.........0x81 (Get response)
> Aug 25 17:46:40 woprjr0 Status.........0x8000
> Aug 25 17:46:40 woprjr0 Hop pointer....0x0
> Aug 25 17:46:40 woprjr0 Hop counter....0x0
> Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
> Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
> Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
> Aug 25 17:46:40 woprjr0 Mkey...........0x0
> Aug 25 17:46:40 woprjr0 DR SLID........0x02
> Aug 25 17:46:40 woprjr0 DR DLID........0xffff
>
> Hop Pointer and Count are both 0 and things work just fine...
>
> >
> > > However, I wonder
> > > where exactly the spec falls on this, because I think it will influence
> > > where the fix resides. If the spec does not allow this then I think it
> > > is fine to have libibmad return an error since the user specified an
> > > invalid combined DR path. However, if this should be legal I think
> > > libibmad should work around the bad hardware out there.
> >
> >
> > Is it hardware or firmware that needs fixing ? I think it may depend on
> > the specific workaround for this as to whether it is acceptable as it
> > might harm something else or might violate the spec.
>
> I agree, however, if the switch hardware needs fixing I fear it is too late
> for the ones I have. Firmware might be upgradable although I have had
> issues with un-managed switches in the past.
>
> So where do we put the fix in software?
>
> Ira
>
> > -- Hal
> >
> >
> > > Thoughts?
> > > Ira
> > >
> > > --
> > > Ira Weiny
> > > Math Programmer/Computer Scientist
> > > Lawrence Livermore National Lab
> > > 925-423-8008
> > > weiny2 at llnl.gov
> > >
> >
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> weiny2 at llnl.gov
>