[ofa-general] Combined DR path with empty DR path, what is the expected behavior?

Ira Weiny weiny2 at llnl.gov
Wed Aug 26 10:31:20 PDT 2009


On Wed, 26 Aug 2009 10:55:41 -0400
Hal Rosenstock <hal.rosenstock at gmail.com> wrote:

> On 8/25/09, Ira Weiny <weiny2 at llnl.gov> wrote:
> >
> > On Tue, 25 Aug 2009 19:15:19 -0400
> > Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
> >
> > > On 8/24/09, Ira Weiny <weiny2 at llnl.gov> wrote:
> > >

[snip]

> > >
> > >
> > > Not all 4 combinations are supported/known to work. When this was added
> > > for ibportstate, the only combined routing form that was important was LID
> > > routed part followed by a DR part.
> > >
> >
> > When you say "known to work" you mean implemented with the diags?  Or known
> > to work in all hardware?
> 
> 
> The former with most hardware up to some time ago. Note there is no
> compliance testing of combined routing and heavy reliance on this makes some
> a little nervous.

OK, good to know.  With this, and the rest of your response, in mind, I went
ahead and created a patch to libibnetdisc which will go back to LID routing
when the Hop Count returns to 0.  Patch to follow.
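
A minimal sketch of that kind of fallback, for readers following along. This is
not the patch itself: the struct, enum, and function names below are
hypothetical and are not the libibnetdisc or libibmad API, and the 64-entry
path array is an assumed size. It only illustrates the decision "a LID plus an
empty DR path is sent as a plain LID routed query", which is the shape of the
`smpquery -c ... 7 0` case quoted further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the libibmad definitions. */
enum route_kind { ROUTE_LID, ROUTE_DR, ROUTE_COMBINED };

struct dr_path_sketch {
    int cnt;        /* number of directed route hops; 0 means an empty DR path */
    uint8_t p[64];  /* outgoing port per hop (array size is an assumption) */
};

/*
 * Pick how to address a node given a LID and a DR path.
 *  - LID set, DR path empty     -> plain LID routing (the fallback discussed)
 *  - LID set, DR path non-empty -> combined routing (LID part, then DR part)
 *  - no LID                     -> pure directed routing
 */
enum route_kind choose_route(uint16_t lid, const struct dr_path_sketch *path)
{
    assert(path != NULL);
    assert(path->cnt >= 0);

    if (lid == 0)
        return ROUTE_DR;
    if (path->cnt == 0)
        return ROUTE_LID;
    return ROUTE_COMBINED;
}
```

With this shape, a discovery loop that walks back up a DR path would stop
issuing combined-route queries once the hop count reaches 0, so switches that
mishandle an empty DR part never see one.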

> 
> >
> > > > On the other hand I think strictly this should be supported.
> > >
> > >
> > > In an ideal world yes but are they all required or is it just the one
> > > form most heavily used ?
> >
> > That is what I am unclear on.  Does the spec require all 8 combinations
> > to work?  I don't see a specific compliance statement which says that, and
> > I am not sure if C14-9 and C14-13 cover all 8 combinations.
> 
> 
> I don't think there's any compliance on this. It all appears to be
> informative text. Perhaps a shortcoming of the spec. So there's nothing
> definitive. It just says there are 8 combinations (2**3 as there are 3 parts
> with 2 possibilities in each part) and that only 4 are really useful.

Well, I agree that only 4 are "useful".  It is just that the algorithm
libibnetdisc used resulted in this "weird" case.
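
To make the "2**3" count above concrete, here is a small sketch that
enumerates the combinations. The labels for the three parts (an initial LID
routed part, a DR part, a final LID routed part) are an assumption based on
how combined routing is described in this thread, and all names are
hypothetical. The sketch deliberately does not mark which 4 are the "useful"
ones.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed parts: initial LID routed part, DR part, final LID routed part. */
#define ROUTE_PARTS 3

/* Each part is either empty or non-empty, so there are 2^3 combinations. */
unsigned combination_count(void)
{
    return 1u << ROUTE_PARTS;
}

/* Returns 1 if the given part is non-empty in combination `combo`. */
int part_present(unsigned combo, unsigned part)
{
    assert(combo < combination_count());
    assert(part < ROUTE_PARTS);
    return (int)((combo >> part) & 1u);
}

/* Print one line per combination; 1 = part present, 0 = part empty. */
void print_combinations(FILE *out)
{
    for (unsigned combo = 0; combo < combination_count(); combo++)
        fprintf(out, "%u: first-LID=%d DR=%d last-LID=%d\n", combo,
                part_present(combo, 0), part_present(combo, 1),
                part_present(combo, 2));
}
```

In this encoding the case that tripped up libibnetdisc is the one with the
first LID part present and the DR part empty.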

[snip]

> 
> >
> > > If so, what's the initial path at this point (or more specifically index
> > > 1 of the initial path) ? I think that needs to be port 0 (if a switch) but
> > > this is a little weird as I would think it should be handed to the SMA
> > > which is different cases in the spec.
> >
> > Yes, I think I was wrong on the case.  But still, wouldn't the SMI detect
> > that this is the end of the DRPath and simply hand it to the SMA?
> 
> 
> Yes, that's what should happen.

I am going to take this up with the switch vendors and see what their
interpretation is.  For the time being I think my patch will fix libibnetdisc
(iblinkinfo).

Thanks again!
Ira

> 
> >
> > >
> > > > Then after processing
> > >
> > >
> > > by the SMA and doing the required returning initialization
> > >
> > > > the SMI should return the packet as specified in C14-13
> > > > item 3 on line 9 page 812.
> > >
> > >
> > > I'm not sure it would use this case in the case of an empty DR path on
> > > return.
> >
> > Actually I think it will use this.  C14-9 item 3) states "the Hop Pointer
> > shall be incremented by 1".  Therefore when the response is handed back to
> > the SMI the Hop Pointer will be 1 and the Hop Count 0, and the SMI uses the
> > DRSLID to send the packet back to the requester.
> 
> 
> It goes up to the SMA and then when the response is to be made it goes
> through returning SMI initialization and handling.
> 
> -- Hal
> 
> >
> > > > Am I wrong?  In the end it does not matter as I have to make the
> > > > software work for all the hardware I have; so I will change the software.
> > >
> > >
> > > IMO it does matter as to where the problem lies (SMI or otherwise) and
> > > how the layers are comprised in the implementation.
> >
> > Agreed.  I am mainly confused because I have 2 different implementations of
> > this.  My "old" switches seem to handle this case just fine.  My "new"
> > switches do not.  So I am really wondering what is going on.
> >
> > Here is the above output for the same query which works with an "old" switch.
> >
> > 17:28:04 > ./smpquery -e -c portinfo 7 0 1
> > ...
> > trid 1a4329de; HopCount 0; HopPointer 0; slid 2; dlid 65535; 0, drpath->cnt 0
> > ...
> >
> > Aug 25 17:46:40 woprjr0 Madeye:sent SMP
> > Aug 25 17:46:40 woprjr0 MAD version....0x1
> > Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
> > Aug 25 17:46:40 woprjr0 Class version..0x1
> > Aug 25 17:46:40 woprjr0 Method.........0x1 (Get)
> > Aug 25 17:46:40 woprjr0 Status.........0x00
> > Aug 25 17:46:40 woprjr0 Hop pointer....0x0
> > Aug 25 17:46:40 woprjr0 Hop counter....0x0
> > Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
> > Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
> > Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
> > Aug 25 17:46:40 woprjr0 Mkey...........0x0
> > Aug 25 17:46:40 woprjr0 DR SLID........0x02
> > Aug 25 17:46:40 woprjr0 DR DLID........0xffff
> > Aug 25 17:46:40 woprjr0 Madeye:recv SMP
> > Aug 25 17:46:40 woprjr0 MAD version....0x1
> > Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
> > Aug 25 17:46:40 woprjr0 Class version..0x1
> > Aug 25 17:46:40 woprjr0 Method.........0x81 (Get response)
> > Aug 25 17:46:40 woprjr0 Status.........0x8000
> > Aug 25 17:46:40 woprjr0 Hop pointer....0x0
> > Aug 25 17:46:40 woprjr0 Hop counter....0x0
> > Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
> > Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
> > Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
> > Aug 25 17:46:40 woprjr0 Mkey...........0x0
> > Aug 25 17:46:40 woprjr0 DR SLID........0x02
> > Aug 25 17:46:40 woprjr0 DR DLID........0xffff
> >
> > Hop Pointer and Count are both 0 and things work just fine...
> >
> > >
> > > > However, I wonder where exactly the spec falls on this, because I think
> > > > it will influence where the fix resides.  If the spec does not allow this
> > > > then I think it is fine to have libibmad return an error since the user
> > > > specified an invalid combined DR path.  However, if this should be legal
> > > > I think libibmad should work around the bad hardware out there.
> > >
> > >
> > > Is it hardware or firmware that needs fixing ? I think it may depend on
> > > the specific workaround for this as to whether it is acceptable as it might
> > > harm something else or might violate the spec.
> >
> > I agree, however, if the switch hardware needs fixing I fear it is too late
> > for the ones I have.  Firmware might be upgradable although I have had
> > issues with un-managed switches in the past.
> >
> > So where do we put the fix in software?
> >
> > Ira
> >
> > > -- Hal
> > >
> > >
> > > > Thoughts?
> > > > Ira
> > > >
> > > > --
> > > > Ira Weiny
> > > > Math Programmer/Computer Scientist
> > > > Lawrence Livermore National Lab
> > > > 925-423-8008
> > > > weiny2 at llnl.gov
> > >
> >
> >
> >
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2 at llnl.gov


