[ofa-general] Combined DR path with empty DR path, what is the expected behavior?
Hal Rosenstock
hal.rosenstock at gmail.com
Tue Aug 25 16:15:19 PDT 2009
On 8/24/09, Ira Weiny <weiny2 at llnl.gov> wrote:
> If I send a combined DR path with a start lid but an empty (0 length) DR
> path.
Hop Count 0 ?
> What is the expected behavior?
Not sure what you mean by expected here. Are you referring to expectation
based on the spec ?
> I know this could be specified with LID routing, but I don't see anywhere
> in the specification which says this is an error.
I don't think it should be an error (certainly not for the form you are
using: a LID routed part followed by a DR part), but a null DR part is a
little funny/odd.
> I do however seem to have 2
> different implementations on 2 different switches. For example:
>
> I have Switch A (Lid 1) and Switch B (Lid 7). I attempt to query PortInfo
> of Port 1 of each switch using the LID followed by an empty DR path.
>
> 17:55:22 > ./smpquery -c portinfo 1 0 1
> ibwarn: [21005] mad_rpc: _do_madrpc failed; dport (Lid 1)
> ./smpquery: iberror: failed: operation portinfo: port info query failed
Is this a timeout ?
> 17:55:31 > ./smpquery -c portinfo 7 0 1
> # Port info: Lid 7 port 1
> Mkey:............................0x0000000000000000
> GidPrefix:.......................0x0000000000000000
> ...
> <normal output snipped>
>
> Detecting this special case in libibmad and turning the packet into a LID
> routed one
Ugh... Is this special case really needed ? I don't think the underlying
issue is understood sufficiently yet.
> succeeds but I wonder if this is an error in the SMI?
Switch SMI ? Is this a proprietary implementation ?
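For reference, the special case Ira describes (a start LID with a zero-length DR part being sent as a plain LID routed packet) could be sketched roughly as below. This is only an illustration of the decision, not libibmad's actual code: the `Destination` type, its fields, and `choose_routing` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Destination:
    """Hypothetical stand-in for a destination address (not libibmad's type)."""
    lid: int = 0                                       # 0 means "no start LID given"
    dr_path: List[int] = field(default_factory=list)   # one output port per hop


def choose_routing(dest: Destination) -> str:
    """Pick a routing form from which parts of the address are non-empty."""
    has_lid = dest.lid != 0
    has_dr = len(dest.dr_path) > 0
    if has_lid and has_dr:
        return "combined"   # LID routed part followed by a DR part
    if has_lid:
        return "lid"        # the special case: start LID, empty DR part
    return "dr"             # pure DR, including the zero-hop loopback case


print(choose_routing(Destination(lid=7)))                  # start LID only
print(choose_routing(Destination(lid=7, dr_path=[1, 3])))  # LID + DR part
```

Whether such a rewrite belongs in the library at all is exactly what is being questioned here; the sketch only shows where the decision would sit.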
> I also notice this is an error on the HCA I am running from (lid 2).
Is this HCA node OpenIB based ?
> 17:57:42 > ./smpquery -c portinfo 2 0 1
> ibwarn: [21008] mad_rpc: _do_madrpc failed; dport (Lid 2)
> ./smpquery: iberror: failed: operation portinfo: port info query failed
Is this also a timeout ?
Also, does the result differ based on where you source these from
(locally vs. remotely)?
> Running with a simple DR path works,
You're referring to the same DR path here that fails in the combined route
examples above, right ?
> I guess because this is the loopback case mentioned on page 805.
Yes but that's the high level requirement rather than the SMI rules which
make that work.
> 17:58:16 > ./smpquery -D portinfo 0 1
> # Port info: DR path slid 65535; dlid 65535; 0 port 1
> Mkey:............................0x0000000000000000
> GidPrefix:.......................0x2007000000000000
> ...
> <snip>
>
> I guess that the comment "Since each part may be empty, there are eight
> combinations, although only four are really useful:" on line 36 Page 805
> can be interpreted to mean that only those 4 combinations need to be
> supported. Is this true?
Not all 4 combinations are supported/known to work. When this was added for
ibportstate, the only combined routing form that was important was LID
routed part followed by a DR part.
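To make the "eight combinations" count concrete: it follows from a route having three parts, each of which may be empty or not (2^3 = 8). The small enumeration below is a sketch only; the part names are assumptions chosen for illustration, and it deliberately does not say which four forms the spec calls useful.

```python
from itertools import product

# Assumed, illustrative names for the three parts of a combined route.
PARTS = ("lid_to_dr_start", "dr_part", "lid_from_dr_end")


def route_forms():
    """Return every combination of present (non-empty) parts as a tuple of names."""
    forms = []
    for present in product((False, True), repeat=len(PARTS)):
        forms.append(tuple(name for name, used in zip(PARTS, present) if used))
    return forms


for form in route_forms():
    print(form if form else "(all parts empty)")
```

In these terms, the form that mattered for ibportstate is `("lid_to_dr_start", "dr_part")`, while the case in this thread (start LID, empty DR part) is `("lid_to_dr_start",)`.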
> On the other hand I think strictly this should be supported.
In an ideal world yes but are they all required or is it just the one form
most heavily used ?
> Item 4 of C14-9
> (line 24 page 810) requires the SMI to handle the packet if the HopPointer
> equals HopCount +1, which it is in my case (HopCount == 0, HopPointer == 1)
By handle, this means "The SMI *shall* output the packet on the port whose
number is in the entry indexed by Hop Pointer in the Initial Path. If that
port number is invalid, the SMI *shall* discard the SMP."
Are you sure the Hop Pointer is 1 ? Where do you see this ?
If so, what's the initial path at this point (or more specifically index 1
of the initial path) ? I think that needs to be port 0 (if a switch) but
this is a little weird as I would think it should be handed to the SMA, which
is covered by different cases in the spec.
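Since the question turns on whether Hop Pointer is 0 or 1 when Hop Count is 0, a small sketch of the comparison may help. It only labels where an outgoing DR SMP sits relative to its DR part; it does not claim which C14-9 item applies or what action the SMI takes, which should be checked against the spec text. The function name, the labels, and the 63-hop limit are assumptions made for this illustration.

```python
MAX_HOP_COUNT = 63  # assumption: 64-entry path arrays with entry 0 unused


def dr_position(hop_pointer: int, hop_count: int) -> str:
    """Label an outgoing DR SMP by how Hop Pointer compares to Hop Count."""
    if not 0 <= hop_count <= MAX_HOP_COUNT:
        return "invalid"
    if hop_count > 0 and hop_pointer == 0:
        return "at_source"             # not yet started along the DR part
    if 0 < hop_pointer < hop_count:
        return "intermediate_hop"      # still travelling along the DR part
    if hop_pointer == hop_count:
        return "end_of_dr_part"        # arrived at the last hop
    if hop_pointer == hop_count + 1:
        return "past_end_of_dr_part"   # the HopPointer == HopCount + 1 case
    return "invalid"


# Empty DR part (Hop Count 0): the label depends entirely on Hop Pointer.
print(dr_position(0, 0))
print(dr_position(1, 0))
```

With Hop Count 0, a Hop Pointer of 0 and a Hop Pointer of 1 land in different branches, so knowing which value is actually on the wire matters for deciding which rule a given SMI is applying.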
> Then after processing
by the SMA and doing the required returning initialization
> the SMI should return the packet as specified in C14-13
> item 3 on line 9 page 812.
I'm not sure it would use this case in the case of an empty DR path on
return.
> Am I wrong? In the end it does not matter as I have to make the software
> work for all the hardware I have; so I will change the software.
IMO it does matter as to where the problem lies (SMI or otherwise) and how
the layers are composed in the implementation.
> However, I wonder where exactly the spec falls on this, because I think it
> will influence where the fix resides. If the spec does not allow this then I
> think it is fine to have libibmad return an error since the user specified an
> invalid combined DR path. However, if this should be legal I think libibmad
> should work around the bad hardware out there.
Is it hardware or firmware that needs fixing ? I think it may depend on the
specific workaround for this as to whether it is acceptable as it might harm
something else or might violate the spec.
-- Hal
> Thoughts?
> Ira
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> weiny2 at llnl.gov