[openib-general] IPoIB broadcast MC group membership

Hal Rosenstock halr at voltaire.com
Tue Feb 21 06:42:10 PST 2006


Hi Fab,

On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote:
> On 2/20/06, Roland Dreier <rdreier at cisco.com> wrote:
> >    Fabian> What is the behavior of SMs that pre-create the group in
> >    Fabian> response to a GET query for the MC group parameters?  Does
> >    Fabian> the query return a record, or does it fail with no
> >    Fabian> records?
> >
> > I guess it depends on the SM.

I'm not sure about this. If the group is precreated, it exists and I
think a record needs to be returned.

> I guess the follow up question is how an SM handles a MC join that
> specifies the QKey and other settings that may conflict with the
> preset ones, but didn't return a response to the GET query.  Does it
> fail the join, does it succeed it with the provided values, or does it
> succeed it and return the preset parameters?

If the join is to the same MC group and the settings conflict, it should
fail.

> OpenSM seems to respond to the GET query, even if there are no members
> to the group 

In IBA, there are no groups without members so a precreated group has a
member (albeit an admin member).

> - and the query returns a group that specifies a rate of
> 10Gbps (4X SDR - same as the system running OpenSM, incidentally)

Note also that this will be changing with the partition manager work
going on.

> > Do you know of an SM that has problems with the existing Linux IPoIB driver?
> 
> No, actually my query wasn't driven by an actual issue with the Linux
> IPoIB driver.  I was trying to figure out how to do better error
> handling and diagnostic logging in the Windows IPoIB and wanted to see
> how the Linux IPoIB driver handled a similar situation.
> 
> The problem I was having was that the sequence of events I was using
> in the Windows IPoIB resulted in ambiguous error conditions, where it
> wasn't possible to differentiate between unexpected errors and errors
> that could be worked around.
> 
> The Windows IPoIB follows a sequence like:
> 
> if( GET broadcast group == NO_ERROR ) {
>     if( SET join broadcast group != NO_ERROR ) repeat GET;
> } else {
>     if( SET create broadcast group != NO_ERROR ) repeat GET;
> }
> 
> Specifically, the problem relates to handling a 1X node trying to join
> the broadcast group, and what the retry policy should be.  If the
> group already exists at 4X, the join should fail if the SM follows the
> compliance statements in the IB spec.

I think you would need to look at the returned error status. A
mismatched rate join should fail with ERR_REQ_UNREALIZABLE rather than
ERR_REQ_INVALID. Does that help?

>   Because the code allowed for
> the broadcast group not pre-existing (that is, a join could fail
> because the group wasn't created), it was unclear whether a failure of
> the join indicated that there was a setting incompatibility (1X vs.
> 4X), or just whether the group needed to be created.  Then, because
> the code handled the race where some other node beat it to creation
> and thus resulted in invalid settings, a failure in creation resulted
> in a retry of the whole process, starting with a new GET query.
> 
> A 1X node in such a case ends up perpetually retrying the sequence of
> events, even though it really should just stop and wait for the next
> port up event (since link width changes require the port to go through
> the down state as far as I understand).
> 
> The lack of detailed error reporting in SA queries could stand to be
> improved, and something as simple as the SA returning a component mask
> indicating which components caused conflicts would be extremely useful
> in determining the next course of action.  ERR_REQ_INVALID is just too
> broad in this case to allow the code to do anything intelligent.

I think ERR_REQ_UNREALIZABLE helps here.
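
A minimal sketch of a status-driven retry policy along these lines, in C.
Every name here (sa_status, next_step, after_join, step_text) is made up
for illustration and is not taken from the Windows or Linux IPoIB code.
The enum values are arbitrary placeholders, not the status encodings
from the IB spec. It assumes the SM really does return
ERR_REQ_UNREALIZABLE for a rate mismatch.

```c
#include <assert.h>

/* Hypothetical status names. The numeric values are arbitrary
 * placeholders, not the encodings defined by the IB spec. */
enum sa_status {
    SA_OK,
    SA_ERR_REQ_INVALID,
    SA_ERR_REQ_UNREALIZABLE,
    SA_ERR_OTHER
};

enum next_step {
    STEP_DONE,          /* joined; nothing more to do */
    STEP_RETRY_GET,     /* ambiguous failure: restart from the GET query */
    STEP_WAIT_PORT_UP,  /* cannot be satisfied until the link changes */
    STEP_COUNT
};

/* Decide what to do once a join (SET) on the broadcast group completes. */
static enum next_step after_join(enum sa_status status)
{
    switch (status) {
    case SA_OK:
        return STEP_DONE;
    case SA_ERR_REQ_UNREALIZABLE:
        /* e.g. a 1X port asking to join a 4X group: retrying cannot help */
        return STEP_WAIT_PORT_UP;
    default:
        /* ERR_REQ_INVALID and anything else stay on the retry path */
        return STEP_RETRY_GET;
    }
}

/* Text for the system log, so a stalled port is easier to diagnose. */
static const char *step_text(enum next_step step)
{
    static const char *const text[STEP_COUNT] = {
        "joined broadcast group",
        "join failed, retrying from GET",
        "join unrealizable (rate mismatch?), waiting for port up"
    };
    assert((int)step >= 0 && (int)step < STEP_COUNT);
    return text[step];
}
```

With this split, only the unrealizable case leaves the retry loop, so a
1X node would stop and wait for the next port up event instead of
retrying forever.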

> As a note, OpenSM seems to allow a 1X node to join a 4X multicast
> group, which it should not, unless the join specifies the rate, in
> which case the join fails as expected.

I believe this is a bug which should be fixed.

>   Do we just not care that a 1X node
> could be dropping 3/4 of the packets sent on the broadcast group,
> aside from OpenSM violating o15-0.1.13?  Note that the failure if the
> rate is specified occurs even if the 1X node is the first to attempt
> to join (that is, no other nodes on the fabric have IPoIB running).

On the OpenSM side, this is a preconfiguration issue. OpenSM will allow
different rates in the near future.
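
To make the 1X vs. 4X numbers concrete, here is a small C sketch of the
check a join would need to pass. The function names are hypothetical,
and the rule "port rate must be at least the group rate" is a
simplification of what is being discussed in this thread, not a quote
from the spec. The loss figure is a worst case that assumes senders
transmit at the full group rate.

```c
#include <stdbool.h>

/* SDR signalling rates in Mb/s: 1X = 2.5 Gbps, 4X = 10 Gbps. */
enum { RATE_1X_SDR_MBPS = 2500, RATE_4X_SDR_MBPS = 10000 };

/* Simplified rule (an assumption): a port may join only if it can
 * carry traffic at the rate the group was created with. */
static bool port_can_sustain_group(unsigned port_mbps, unsigned group_mbps)
{
    return port_mbps >= group_mbps;
}

/* Worst-case percentage of group traffic a slower port cannot carry,
 * assuming senders transmit at the full group rate. */
static unsigned worst_case_loss_percent(unsigned port_mbps, unsigned group_mbps)
{
    if (group_mbps == 0 || port_mbps >= group_mbps)
        return 0;
    return (group_mbps - port_mbps) * 100u / group_mbps;
}
```

For a 1X SDR port and a 4X SDR group, the check fails and the
worst-case loss works out to 75 percent, which is the "3/4 of the
packets" mentioned above.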

> Anyhow, I'm still not sure how to cleanly handle these errors so that
> the system log makes it clear that things are not working, likely due
> to a bad cable.

Does the above help?

-- Hal

> - Fab