[openib-general] IPoIB broadcast MC group membership
Hal Rosenstock
halr at voltaire.com
Tue Feb 21 06:42:10 PST 2006
Hi Fab,
On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote:
> On 2/20/06, Roland Dreier <rdreier at cisco.com> wrote:
> > Fabian> What is the behavior of SMs that pre-create the group in
> > Fabian> response to a GET query for the MC group parameters? Does
> > Fabian> the query return a record, or does it fail with no
> > Fabian> records?
> >
> > I guess it depends on the SM.
I'm not sure about this. If the group is precreated, it exists and I
think a record needs to be returned.
> I guess the follow up question is how an SM handles a MC join that
> specifies the QKey and other settings that may conflict with the
> preset ones, but didn't return a response to the GET query. Does it
> fail the join, does it succeed it with the provided values, or does it
> succeed it and return the preset parameters?
If the join is to the same MC group and the settings conflict, it should
fail.
> OpenSM seems to respond to the GET query, even if there are no members
> to the group
In IBA, there are no groups without members so a precreated group has a
member (albeit an admin member).
> - and the query returns a group that specifies a rate of
> 10Gbps (4X SDR - same as the system running OpenSM, incidentally)
Note also that this will be changing with the partition manager work
going on.
> > Do you know of an SM that has problems with the existing Linux IPoIB driver?
>
> No, actually my query wasn't driven by an actual issue with the Linux
> IPoIB driver. I was trying to figure out how to do better error
> handling and diagnostic logging in the Windows IPoIB and wanted to see
> how the Linux IPoIB driver handled a similar situation.
>
> The problem I was having was that the sequence of events I was using
> in the Windows IPoIB resulted in ambiguous error conditions, where it
> wasn't possible to differentiate between unexpected errors and errors
> that could be worked around.
>
> The Windows IPoIB follows a sequence like:
>
> if( GET broadcast group == NO_ERROR ) {
>     if( SET join broadcast group != NO_ERROR ) repeat GET;
> } else {
>     if( SET create broadcast group != NO_ERROR ) repeat GET;
> }
>
> Specifically, the problem relates to handling a 1X node trying to join
> the broadcast group, and what the retry policy should be. If the
> group already exists at 4X, the join should fail if the SM follows the
> compliance statements in the IB spec.
I think you would need to look at the returned error status. A
mismatched-rate join should fail with ERR_REQ_UNREALIZABLE rather than
ERR_REQ_INVALID.
Does that help ?
> Because the code allowed for
> the broadcast group not pre-existing (that is, a join could fail
> because the group wasn't created), it was unclear whether a failure of
> the join indicated that there was a setting incompatibility (1X vs.
> 4X), or just whether the group needed to be created. Then, because
> the code handled the race where some other node beat it to creation
> and thus resulted in invalid settings, a failure in creation resulted
> in a retry of the whole process, starting with a new GET query.
>
> A 1X node in such a case ends up perpetually retrying the sequence of
> events, even though it really should just stop and wait for the next
> port up event (since link width changes require the port to go through
> the down state as far as I understand).
>
> The lack of detailed error reporting in SA queries could stand to be
> improved, and something as simple as the SA returning a component mask
> indicating which components caused conflicts would be extremely useful
> in determining the next course of action. ERR_REQ_INVALID is just too
> broad in this case to allow the code to do anything intelligent.
I think ERR_REQ_UNREALIZABLE helps here.
> As a note, OpenSM seems to allow a 1X node to join a 4X multicast
> group which it should not, unless the join specifies the rate in which
> case the join fails as expected.
I believe this is a bug which should be fixed.
> Do we just not care that a 1X node
> could be dropping 3/4 of the packets sent on the broadcast group,
> aside from OpenSM violating o15-0.1.13? Note that the failure if the
> rate is specified occurs even if the 1X node is the first to attempt
> to join (that is, no other nodes on the fabric have IPoIB running).
It is a preconfiguration issue in terms of OpenSM. It will allow
different rates in the near future.
> Anyhow, I'm still not sure how to cleanly handle these errors so that
> the system log is pretty clear that things are not working likely due
> to a bad cable.
Does the above help ?
-- Hal
> - Fab