[openib-general] opensm fails to bring up subnet..

Hal Rosenstock halr at voltaire.com
Fri Jun 3 05:37:04 PDT 2005


On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote: 
> I'm having intermittent problems with opensm.. It seems after a while
> IPoIB stops working

I wonder whether the two are related: the intermittent IPoIB failures
and the lack of response to the SM query.

>  and if I restart opensm,

How did you get around the ABI version mismatch issue ?

>  it starts spitting out
> errors. Do I have a misbehaving switch somewhere?

It appears that a node is not responding to a discovery packet (SM Get
NodeInfo (attrID 0x11)). Its direct route initial path (an array of
exit port numbers, one per hop) is:
Initial path = [1][81][1]
The values are hex, so this means: starting at the node running OpenSM,
out port 1, then port 129 (0x81), then port 1. Is there a large switch
in the middle ? Can you send the output of ibnetdiscover ? If that is
valid, which HCA (port) is not responding (what is the GUID) ?
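
For reference, the path reading above can be checked with a few lines of
Python. This is only a sketch: it assumes each bracketed entry in the
log line is a hex port number, and decode_initial_path is a made-up
helper name, not part of OpenSM.

```python
import re

def decode_initial_path(text):
    """Return decimal port numbers from a bracketed hex path string.

    Assumption: every [..] entry is one hop's port number in hex.
    """
    return [int(h, 16) for h in re.findall(r"\[([0-9A-Fa-f]+)\]", text)]

ports = decode_initial_path("[1][81][1]")
print(ports)  # [1, 129, 1]
```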

Unfortunately, on such an error osm does not appear to give up (it
retries forever and stays stuck on that node). This is obviously not
good.

> ibnetdiscover seems to work fine.

Are you sure it displays all HCAs and switches and their ports ? I
wouldn't expect a node to respond to ibnetdiscover if it didn't respond
to osm.

-- Hal

> (this is from running 'opensm -v -o -r')




