[openib-general] Re: OpenSM 1.8.0 Merge Status and Operational Issue

Eitan Zahavi eitan at mellanox.co.il
Mon Sep 5 08:16:39 PDT 2005


Hi Hal,

We were looking at the section below and were not sure we understand it.
Is your setup such that a 4x HCA is connected with a 4x -> 4* 1x "split" cable
to a 1x analyzer and then on the other side using a similar cable back to a 4x HCA?

In that case, which of the four 1x plugs are you using to connect to the analyzer?

Also, we were looking at the differences between the pre-1.8.0 code and the latest code, and
we concluded that there is a "missing feature" in the 1.8.0 code:
OpenSM (from version 1.7.0 or so) does not do multiple sweeps in case of error - unless
it has the "errors during initialization" flag set. This prevents repetitive sweeps if a
port is not responding...
The missing feature - not to say bug - is in the case of a SetResp (yes, we are able to
differentiate it from a GetResp) that the target SMA returns with status != 0: the
current code does not declare the fabric initialization as erroneous.

So in our example - if for some reason the 1x configuration requires a link reset due to the
4x -> 1x and the link responds with a Set error, the new OpenSM misses the Set failure and thus
fails to bring the link up.

However, it is not clear to me why a second sweep is required in the first place.

Yael will be working on it. It might take us some time as we need special setup for
reproducing the problem.

Meanwhile, I think Yael will provide a simple patch for considering the SetResp error
as a valid cause for "errors during initialization".
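
To make the idea concrete, here is a rough sketch in C of the check such a patch
would add. It is only an illustration: the names (sweep_state, note_smp_response,
needs_resweep, the smp_method enum) are hypothetical and are not the actual OpenSM
identifiers, and the real code path may look quite different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the OpenSM identifiers. */
enum smp_method { SMP_GET_RESP, SMP_SET_RESP };

struct sweep_state {
    /* Stands in for the "errors during initialization" flag. */
    bool errors_during_init;
};

/* Record the outcome of one SMA response in the sweep state. */
static void note_smp_response(struct sweep_state *st,
                              enum smp_method method, uint16_t status)
{
    /* A Set that the SMA answered with a non-zero status means the port
     * was not configured as requested, so mark the initialization as
     * erroneous. Get responses are left untouched in this sketch. */
    if (method == SMP_SET_RESP && status != 0)
        st->errors_during_init = true;
}

/* Another sweep is scheduled only when the flag is set. */
static bool needs_resweep(const struct sweep_state *st)
{
    return st->errors_during_init;
}
```

With this, a failed Set on the 1x link would leave the flag set, so the SM would
sweep again instead of giving up after the first attempt.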

Eitan

Hal Rosenstock wrote:
> 
>>>I have a 4x HCA port (1x/4x LinkWidthEnable and Supported) connected via
>>>a 1x analyzer connected to a switch (so is 1x LinkWidthActive).
>>>OpenSM does not seem to want to bring this port up. It tries once and
>>>gives up until the physical link is cycled (cable pull and reinsertion).
>>>It does work running over a 4x link with 4x neighbor ports.