[openib-general] opensm fails to bring up subnet..
Hal Rosenstock
halr at voltaire.com
Fri Jun 3 05:37:04 PDT 2005
On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote:
> I'm having intermittent problems with opensm.. It seems after a while
> IPoIB stops working
I wonder whether the two are related: the intermittent IPoIB failures
and the lack of response to the SM query.
> and if I restart opensm,
How did you get around the ABI version mismatch issue ?
> it starts spitting out
> errors. Do I have a misbehaving switch somewhere?
It appears that a node is not responding to a discovery packet (SM Get
NodeInfo (attrID 0x11)). Its direct route initial path (an array of
port numbers, one at the start of each hop) is:
Initial path = [1][81][1], which means: starting at the node running
OpenSM, port 1, then port 129 (0x81), then port 1. Is there a large
switch in the middle ? Can you send the output of ibnetdiscover ? If
that is valid, which HCA (port) is not responding (what is the GUID) ?
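To make the hex reading of the path concrete, here is a minimal Python
sketch. The function name decode_initial_path is hypothetical, and it
assumes every bracketed entry in the log line is a hexadecimal port
number (which is what reading 81 as 129 implies). It is not part of
OpenSM.

```python
import re


def decode_initial_path(text):
    """Turn a log fragment such as '[1][81][1]' into decimal port numbers.

    Assumption: each bracketed entry is a hexadecimal port number.
    """
    return [int(tok, 16) for tok in re.findall(r"\[([0-9A-Fa-f]+)\]", text)]


hops = decode_initial_path("[1][81][1]")
for hop_index, port in enumerate(hops, start=1):
    print(f"hop {hop_index}: port {port}")
```

A port number as high as 129 at the second hop is what suggests a large
switch in the middle of the path.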
Unfortunately, on such an error osm does not appear to give up: it
retries forever and stays locked on that node. This is obviously not
good.
> ibnetdiscover seems to work fine.
Are you sure it displays all HCAs and switches and their ports ? I
wouldn't expect the node to respond to ibnetdiscover if it didn't
respond to osm.
-- Hal
> (this is from running 'opensm -v -o -r')