[openib-general] Re: OpenSM 1.8.0 Merge Status and Operational Issue

Hal Rosenstock halr at voltaire.com
Mon Sep 5 08:39:19 PDT 2005


Hi Eitan,

On Mon, 2005-09-05 at 11:16, Eitan Zahavi wrote:
> We were looking at the section below and were not sure we understand.
> Is your setup such that a 4x HCA is connected with a 4x -> 4* 1x "split" cable
> to a 1x analyzer and then on the other side using a similar cable back to a 4x HCA ?

Yes.

> In that case which 1x plug out of the 4 are you using to connect to the Analyzer?

The correct one. It works with the pre 1.8.0 OpenSM.

> Also we were looking at the differences between the pre 1.8.0 and the latest code and
> we conclude that there is a "missing feature" in the 1.8.0 code:

OK.

> OpenSM (from version 1.7.0 or so) does not do multiple sweeps in case of error - unless
> it has the "errors during initialization" flag set. This prevents repetitive sweeps if a
> port is not responding...

The port responded, just not with Active: it responded with Init
instead.

> The missing feature - not to say bug - is that when the target SMA answers with a SetResp
> (yes, we are able to differentiate it from GetResp) whose status != 0, the current code
> does not declare the fabric initialization as erroneous.

The status is 0x1c which says that the HCA is responding with status 7
to the Set PortInfo. Not sure why this is. (That comes from the
firmware). That's the first level problem.
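
For reference, the arithmetic behind reading 0x1c as status 7 can be
sketched as follows. This is only an illustration in Python, not OpenSM
code. It assumes the logged value is the 16-bit MAD common status in host
byte order, where bit 0 is Busy, bit 1 is Redirect and bits 2-4 carry the
invalid-field code. The name decode_mad_status and the lookup table are
hypothetical.

```python
# Sketch only: decode the low bits of a MAD common status value.
# Assumes host byte order; names here are illustrative, not OpenSM's.
BUSY_BIT = 0x0001
REDIRECT_BIT = 0x0002
INVALID_FIELD_SHIFT = 2
INVALID_FIELD_MASK = 0x7

# Codes 4-6 are reserved, so they are left out of the table.
INVALID_FIELD_MEANING = {
    0: "no invalid fields",
    1: "bad base or class version",
    2: "method not supported",
    3: "method/attribute combination not supported",
    7: "invalid value in attribute or attribute modifier",
}


def decode_mad_status(status):
    """Split a MAD status value into busy, redirect and invalid-field code."""
    code = (status >> INVALID_FIELD_SHIFT) & INVALID_FIELD_MASK
    return {
        "busy": bool(status & BUSY_BIT),
        "redirect": bool(status & REDIRECT_BIT),
        "invalid_field_code": code,
        "meaning": INVALID_FIELD_MEANING.get(code, "reserved"),
    }


if __name__ == "__main__":
    print(decode_mad_status(0x1C))
```

With 0x1c (binary 11100) the Busy and Redirect bits are clear and bits 2-4
hold 7, which is how the Set PortInfo response above is read as status 7.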

> So in our example - if for some reason the 1x configuration requires a link reset due to the
> 4x -> 1x and the link responds with a Set error, the new OpenSM misses the Set failure and thus
> fails to bring the link up.

> However, it is not clear to me why a second sweep is required in the first place.

Right, but trying only once doesn't seem right to me: it never recovers
automatically, although perhaps it could. This seems like the second
level problem to me, which should be solved first.
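
To make the two points concrete (a failed SetResp should count as an
initialization error, and an erroneous sweep should be retried), here is a
rough sketch in Python. It is not the OpenSM implementation. SweepState,
handle_response, run_sweeps and the max_sweeps limit are all hypothetical
names and choices made for illustration; the retry bound in particular is
an assumption, not something OpenSM is known to do.

```python
# Sketch only: flag a failed Set as an "errors during initialization"
# condition and re-sweep a bounded number of times. All names hypothetical.
GET_RESP = "GetResp"
SET_RESP = "SetResp"


class SweepState:
    """Per-sweep bookkeeping."""

    def __init__(self):
        self.errors_during_init = False


def handle_response(state, method, status):
    """Mark the sweep as erroneous when a SetResp carries a nonzero status."""
    if method == SET_RESP and status != 0:
        state.errors_during_init = True


def run_sweeps(do_sweep, max_sweeps=3):
    """Run do_sweep(state) until one sweep is clean or the limit is reached.

    Returns (number_of_sweeps_run, finished_clean).
    """
    for attempt in range(1, max_sweeps + 1):
        state = SweepState()
        do_sweep(state)
        if not state.errors_during_init:
            return attempt, True
    return max_sweeps, False


if __name__ == "__main__":
    seen = []

    def fake_sweep(state):
        # First sweep: the Set fails with status 0x1c; later sweeps succeed.
        seen.append(1)
        handle_response(state, SET_RESP, 0x1C if len(seen) == 1 else 0)

    print(run_sweeps(fake_sweep))
```

The bound keeps a port that never accepts the Set from causing endless
sweeps, which is the concern behind the single-sweep behaviour, while
still giving the link a chance to recover on its own.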

> Yael will be working on it. It might take us some time as we need special setup for
> reproducing the problem.
> 
> Meanwhile I think Yael will provide a simple patch for considering the SetResp error
> as a valid cause for "errors during initialization".

OK. Thanks.

-- Hal



