[openib-general] opensm fails to bring up subnet..
Troy Benjegerdes
hozer at hozed.org
Fri Jun 3 10:17:21 PDT 2005
On Fri, Jun 03, 2005 at 08:37:04AM -0400, Hal Rosenstock wrote:
> On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote:
> > I'm having intermittent problems with opensm.. It seems after a while
> > IPoIB stops working
>
> Wonder if there is some relation to the two: intermittent IPoIB and lack
> of response to SM query.
>
> > and if I restart opensm,
>
> How did you get around the ABI version mismatch issue ?
>
> > it starts spitting out
> > errors. Do I have a misbehaving switch somewhere?
>
> It appears that a node is not responding to a discovery packet (SM Get
> NodeInfo (attrID 0x11)). Its direct route initial path (an array of
> port numbers at the start of the next hop) is:
> Initial path = [1][81][1] which means that starting at the node running
> OpenSM, port 1 then port 129 then port 1. Is there a large switch in the
> middle ? Can you send the output of ibnetdiscover ? If that is valid,
> which HCA (port) is not responding (what is the GUID) ?
>
> Unfortunately on such an error osm does not appear to give up (it
> retries forever and is locked on such a node). This is obviously not
> good.
>
> > ibnetdiscover seems to work fine.
>
> Are you sure it displays all HCA and switches and their ports ? I
> wouldn't think it would respond to ibnetdiscover if it didn't respond to
> osm.
I'm running a Subversion checkout as of yesterday, so that's how I
got around the ABI version mismatch.

The [81] port indicator is definitely bogus: all I have are 8-port
switches. I've also seen [0][0][0] path indicators. Are those allowed
as well?
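
For anyone checking a logged path by hand, here is a minimal sketch of the
arithmetic. It assumes the bracketed entries are hexadecimal port numbers
(the quoted reply reads [81] as port 129) and that no switch in the fabric
has more than 8 ports, as on my subnet. The helper names parse_dr_path and
out_of_range_hops are made up for illustration and are not part of OpenSM.

```python
import re


def parse_dr_path(text):
    """Turn a logged path such as "[1][81][1]" into a list of port numbers.

    Assumption: each bracketed entry is a hexadecimal port number.
    """
    return [int(tok, 16) for tok in re.findall(r"\[([0-9A-Fa-f]+)\]", text)]


def out_of_range_hops(ports, max_port):
    """Return (hop index, port) pairs whose port exceeds max_port."""
    return [(i, p) for i, p in enumerate(ports) if p > max_port]


if __name__ == "__main__":
    path = parse_dr_path("[1][81][1]")
    print(path)                                 # [1, 129, 1]
    # max_port=8 is an assumption matching a fabric of 8-port switches.
    print(out_of_range_hops(path, max_port=8))  # [(1, 129)]
```

This only flags entries larger than the assumed port count. It says nothing
about whether a [0][0][0] path is legitimate.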