[openib-general] IPoIB broadcast MC group membership

Fabian Tillier ftillier at silverstorm.com
Mon Feb 20 22:10:30 PST 2006


On 2/20/06, Roland Dreier <rdreier at cisco.com> wrote:
>    Fabian> What is the behavior of SMs that pre-create the group in
>    Fabian> response to a GET query for the MC group parameters?  Does
>    Fabian> the query return a record, or does it fail with no
>    Fabian> records?
>
> I guess it depends on the SM.

I guess the follow-up question is how an SM that didn't return a record
for the GET query handles an MC join that specifies a QKey and other
settings that may conflict with the preset ones.  Does it fail the
join, succeed with the provided values, or succeed and return the
preset parameters?

OpenSM seems to respond to the GET query even if the group has no
members, and the query returns a group that specifies a rate of
10Gbps (4X SDR, the same as the system running OpenSM, incidentally).

> Do you know of an SM that has problems with the existing Linux IPoIB driver?

No, actually my query wasn't driven by an actual issue with the Linux
IPoIB driver.  I was trying to figure out how to do better error
handling and diagnostic logging in the Windows IPoIB and wanted to see
how the Linux IPoIB driver handled a similar situation.

The problem I was having was that the sequence of events I was using
in the Windows IPoIB resulted in ambiguous error conditions, where it
wasn't possible to differentiate between unexpected errors and errors
that could be worked around.

The Windows IPoIB follows a sequence like:

if( GET broadcast group == NO_ERROR )
{
    if( SET join broadcast group != NO_ERROR ) repeat GET;
}
else
{
    if( SET create broadcast group != NO_ERROR ) repeat GET;
}

Specifically, the problem relates to handling a 1X node trying to join
the broadcast group, and what the retry policy should be.  If the
group already exists at 4X, the join should fail if the SM follows the
compliance statements in the IB spec.  Because the code allowed for
the broadcast group not pre-existing (that is, a join could fail
because the group wasn't created), it was unclear whether a failure of
the join indicated that there was a setting incompatibility (1X vs.
4X), or just whether the group needed to be created.  Then, because
the code handled the race where some other node beat it to creation
and thus resulted in invalid settings, a failure in creation resulted
in a retry of the whole process, starting with a new GET query.

A 1X node in such a case ends up perpetually retrying the sequence of
events, even though it really should just stop and wait for the next
port up event (since link width changes require the port to go through
the down state as far as I understand).
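
One way to break the loop is to decide the next step from what the GET
returned rather than from the SET status alone.  The following is only a
sketch under stated assumptions, not the Windows or Linux driver code:
every name in it (bcast_attempt_t, bcast_next_action, the action codes,
rates expressed in Mbps, the max_attempts bound) is hypothetical, and it
assumes the GET record carries a group rate that can be compared
directly with the local port rate.

```c
#include <stdbool.h>

/* Hypothetical next-step codes after one pass of the GET/join/create sequence. */
typedef enum {
    BCAST_JOINED,             /* the join or create succeeded */
    BCAST_RETRY,              /* possibly transient (e.g. lost a creation race) */
    BCAST_STOP_UNTIL_PORT_UP  /* retrying cannot help; wait for next port up */
} bcast_action_t;

/* Hypothetical summary of one attempt.  Rates are in Mbps (1X SDR = 2500). */
typedef struct {
    bool     group_found;      /* the GET returned a record */
    unsigned group_rate_mbps;  /* rate from that record, if any */
    unsigned port_rate_mbps;   /* rate of the local port */
    bool     set_ok;           /* the SET (join or create) succeeded */
} bcast_attempt_t;

bcast_action_t bcast_next_action(const bcast_attempt_t *a,
                                 unsigned attempts_so_far,
                                 unsigned max_attempts)
{
    if (a->set_ok)
        return BCAST_JOINED;

    /* The group exists at a rate this port cannot sustain (1X vs. 4X):
     * the failure is a settings conflict, not a missing group. */
    if (a->group_found && a->port_rate_mbps < a->group_rate_mbps)
        return BCAST_STOP_UNTIL_PORT_UP;

    /* Assumption: bound the retries so an unexplained failure cannot
     * loop forever. */
    if (attempts_so_far + 1 >= max_attempts)
        return BCAST_STOP_UNTIL_PORT_UP;

    return BCAST_RETRY;
}
```

With a policy like this, a 1X port that sees a 4X group in the GET
response would stop after the first failed join and could log a rate
mismatch, while a failure with no record found would still be retried a
bounded number of times.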

The lack of detailed error reporting in SA queries could stand to be
improved, and something as simple as the SA returning a component mask
indicating which components caused conflicts would be extremely useful
in determining the next course of action.  ERR_REQ_INVALID is just too
broad in this case to allow the code to do anything intelligent.
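
Until the SA reports something like that, a client can approximate it
whenever the GET does return a record, by comparing the group's
parameters with its own.  This is a rough sketch only: the bit names,
the mc_params_t structure and mc_conflict_mask() are hypothetical, the
bits are not the SA's component mask encoding, and the MTU and rate
comparisons assume the port must support at least the group's values.

```c
#include <stdint.h>

/* Hypothetical conflict bits; NOT the SA component mask encoding. */
enum {
    CONFLICT_QKEY = 1 << 0,
    CONFLICT_PKEY = 1 << 1,
    CONFLICT_MTU  = 1 << 2,
    CONFLICT_RATE = 1 << 3
};

/* Hypothetical subset of multicast group parameters. */
typedef struct {
    uint32_t qkey;
    uint16_t pkey;
    unsigned mtu_bytes;
    unsigned rate_mbps;   /* 1X SDR = 2500, 4X SDR = 10000 */
} mc_params_t;

/* group: values returned by the GET; local: what this port uses/supports.
 * Returns a mask of the components that would make a join fail. */
uint32_t mc_conflict_mask(const mc_params_t *group, const mc_params_t *local)
{
    uint32_t mask = 0;

    if (group->qkey != local->qkey)          mask |= CONFLICT_QKEY;
    if (group->pkey != local->pkey)          mask |= CONFLICT_PKEY;
    if (local->mtu_bytes < group->mtu_bytes) mask |= CONFLICT_MTU;
    if (local->rate_mbps < group->rate_mbps) mask |= CONFLICT_RATE;

    return mask;
}
```

A non-zero mask with only CONFLICT_RATE set would let the driver log a
link width or rate mismatch instead of a generic join failure.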

As a note, OpenSM seems to allow a 1X node to join a 4X multicast
group, which it should not, unless the join specifies the rate, in
which case the join fails as expected.  Do we just not care that a 1X
node could be dropping 3/4 of the packets sent on the broadcast group,
aside from OpenSM violating o15-0.1.13?  Note that the failure when the
rate is specified occurs even if the 1X node is the first to attempt
to join (that is, no other nodes on the fabric have IPoIB running).

Anyhow, I'm still not sure how to cleanly handle these errors so that
the system log makes it clear that things are not working, likely due
to a bad cable.

- Fab


