[openib-general] opensm fails to bring up subnet..

Hal Rosenstock halr at voltaire.com
Fri Jun 3 10:06:36 PDT 2005


On Fri, 2005-06-03 at 12:47, Eitan Zahavi wrote:
> Hi, 
> Sorry for catching up with this late in the thread. (Thanks Hal for
> waking me up...)
> > 
> > It appears that a node is not responding to a discovery packet (SM Get
> > NodeInfo (attrID 0x11)). Its direct route initial path (an array of
> > port numbers at the start of the next hop) is:
> > Initial path = [1][81][1] which means that, starting at the node
> > running OpenSM, port 1 then port 129 then port 1. Is there a large
> > switch in the middle? Can you send the output of ibnetdiscover? If
> > that is valid, which HCA (port) is not responding (what is the GUID)?
> [EZ] Normally all directed route dumps should start with:
> Initial path = [0][....
> The first hop is reserved to 0, so I wonder if the above text is a
> direct quote from the osm.log?
> The fact that you got a [81] there means that the packet should leave
> from port 81??

That 81 is hex, not decimal (0x81 = 129), but either way it is still > 24.
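
To make the hex reading concrete, here is a minimal sketch that decodes a
path dump of the form quoted above. It assumes every bracketed entry is a
hexadecimal port number, as in this thread. The function names and the
24-port limit (the largest switch port count mentioned in this thread) are
illustrative assumptions, not part of OpenSM.

```python
import re

def parse_initial_path(line):
    """Return the hop port numbers from an 'Initial path = [..][..]' dump.

    Assumes every bracketed entry is a hexadecimal port number.
    """
    return [int(h, 16) for h in re.findall(r"\[([0-9A-Fa-f]+)\]", line)]

def suspicious_hops(ports, max_ports=24):
    """Return (index, port) pairs whose port number exceeds max_ports."""
    return [(i, p) for i, p in enumerate(ports) if p > max_ports]

path = parse_initial_path("Initial path = [1][81][1]")
print(path)                   # [1, 129, 1]
print(suspicious_hops(path))  # [(1, 129)]
```

Only the middle hop is flagged, which matches the port 129 reading above.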

> I have never seen a switch with more than 24 ports...

I thought that looked suspect. I didn't think there were any switch
chassis that were hiding their multiple internal switch chips.

> > Unfortunately, on such an error osm does not appear to give up (it
> > retries forever and is locked on such a node). This is obviously not
> > good.
> Also, Troy, if you are able to capture the entire log, it might shed
> some light on the issue of "OpenSM never gives up" in such cases, which
> we want to resolve.

OpenIB has retries built into the MAD layer, and the OpenIB vendor layer
also does some retries when a send that is supposed to be matched with a
response times out. [There is a potential issue here relative to the VL15
counting on error, which came up on the list a short while ago, so I am
looking at a possible change to this area of the vendor layer, but I have
not concluded my analysis yet.]
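
For illustration only, a bounded retry loop of the kind that would let a
caller give up on a silent node might look like the sketch below. The names
send_with_retries and send_once, and the retry count and timeout values, are
hypothetical. This is not the OpenIB MAD layer or vendor layer code.

```python
def send_with_retries(send_once, max_retries=3, timeout_ms=100):
    """Call send_once(timeout_ms) until it returns a response or retries run out.

    send_once is a hypothetical callable that returns the response,
    or None if the request timed out.
    Returns (response, attempts); response is None if every attempt timed out.
    """
    attempts = 0
    for _ in range(1 + max_retries):  # one initial send plus max_retries retries
        attempts += 1
        response = send_once(timeout_ms)
        if response is not None:
            return response, attempts
    return None, attempts  # give up instead of retrying forever

# A node that never answers: the caller gets None back after 4 attempts.
print(send_with_retries(lambda timeout_ms: None))  # (None, 4)
```

The point of the sketch is only that the loop is finite, so a node that
never responds ends in a reported failure rather than a lockup.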

-- Hal



