[openib-general] opensm fails to bring up subnet..

Troy Benjegerdes hozer at hozed.org
Fri Jun 3 10:17:21 PDT 2005


On Fri, Jun 03, 2005 at 08:37:04AM -0400, Hal Rosenstock wrote:
> On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote: 
> > I'm having intermittent problems with opensm.. It seems after a while
> > IPoIB stops working
> 
> Wonder if there is some relation to the two: intermittent IPoIB and lack
> of response to SM query.
> 
> >  and if I restart opensm,
> 
> How did you get around the ABI version mismatch issue ?
> 
> >  it starts spitting out
> > errors. Do I have a misbehaving switch somewhere?
> 
> It appears that a node is not responding to a discovery packet (SM Get
> NodeInfo (attrID 0x11)). Its direct route initial path (an array of
> port numbers at the start of the next hop) is:
> Initial path = [1][81][1] which means that starting at the node running
> OpenSM, port 1 then port 129 then port 1. Is there a large switch in the
> middle ? Can you send the output of ibnetdiscover ? If that is valid,
> which HCA (port) is not responding (what is the GUID) ? 
> 
> Unfortunately on such an error osm does not appear to give up  (it
> retries forever and is locked on such a node). This is obviously not
> good.
> 
> > ibnetdiscover seems to work fine.
> 
> Are you sure it displays all HCA and switches and their ports ? I
> wouldn't think it would respond to ibnetdiscover if it didn't respond to
> osm. 

I'm running a subversion checkout as of yesterday, so that's how I
got around the ABI version stuff.

The [81] port indicator is definitely bogus: all I have are 8-port
switches. I've also seen [0][0][0] path indicators. Are those allowed
as well?
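
For reference, here is a minimal sketch of how a path string such as
[1][81][1] can be decoded and checked against the switch port count.
It assumes the bracketed values are hexadecimal (which matches Hal
reading 81 as port 129) and that no hop should use a port number above
the largest port on the switches in the fabric (8 here). The function
names are made up for illustration and are not part of OpenSM.

```python
import re

def decode_initial_path(text):
    """Turn a string like '[1][81][1]' into a list of port numbers.

    Assumption: each bracketed value is printed in hexadecimal.
    """
    return [int(tok, 16) for tok in re.findall(r"\[([0-9a-fA-F]+)\]", text)]

def out_of_range_hops(ports, max_port):
    """Return (hop_index, port) pairs whose port number exceeds max_port."""
    return [(i, p) for i, p in enumerate(ports) if p > max_port]

path = decode_initial_path("[1][81][1]")
print(path)                        # [1, 129, 1]
print(out_of_range_hops(path, 8))  # [(1, 129)]
```

With 8-port switches the second hop (port 129) is flagged, which is
consistent with that path entry being bogus. The sketch does not try to
decide whether a [0][0][0] path is legal.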


