[openib-general] opensm fails to bring up subnet..

Eitan Zahavi eitan at mellanox.co.il
Fri Jun 3 10:19:57 PDT 2005


So Troy - will you be able to capture an osm.log and send us a tar.gz ?

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Troy Benjegerdes [mailto:hozer at hozed.org]
> Sent: Friday, June 03, 2005 8:17 PM
> To: Hal Rosenstock
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] opensm fails to bring up subnet..
> 
> On Fri, Jun 03, 2005 at 08:37:04AM -0400, Hal Rosenstock wrote:
> > On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote:
> > > I'm having intermittent problems with opensm.. It seems after a while
> > > IPoIB stops working
> >
> > I wonder if the two are related: the intermittent IPoIB and the lack
> > of response to the SM query.
> >
> > >  and if I restart opensm,
> >
> > How did you get around the ABI version mismatch issue ?
> >
> > >  it starts spitting out
> > > errors. Do I have a misbehaving switch somewhere?
> >
> > It appears that a node is not responding to a discovery packet (SM Get
> > NodeInfo (attrID 0x11)). Its direct route initial path (an array of
> > port numbers, each giving the exit port toward the next hop) is:
> > Initial path = [1][81][1], which means that, starting at the node
> > running OpenSM, the route takes port 1, then port 129 (0x81), then
> > port 1. Is there a large switch in the middle ? Can you send the output
> > of ibnetdiscover ? If that is valid, which HCA (port) is not responding
> > (what is the GUID) ?
> >
> > Unfortunately, on such an error osm does not appear to give up (it
> > retries forever and stays locked on such a node). This is obviously not
> > good.
> >
> > > ibnetdiscover seems to work fine.
> >
> > Are you sure it displays all HCA and switches and their ports ? I
> > wouldn't think it would respond to ibnetdiscover if it didn't respond to
> > osm.
> 
> I'm running a subversion checkout as of yesterday, so that's how I
> got around the ABI version stuff.
> 
> The [81] port indicator is definitely bogus. All I have are 8-port
> switches. I've also seen [0][0][0] path indicators. Are those allowed
> as well?
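
For reference, the reading of "Initial path = [1][81][1]" discussed in this thread can be checked with a small sketch. It assumes each bracketed entry is a hexadecimal port number (which is how [81] was read as port 129) and that the largest valid port number on any switch in the fabric is known (8 here, per the 8-port switches mentioned). The function names are hypothetical helpers for illustration and are not part of OpenSM.

```python
import re

def parse_initial_path(text):
    """Return hop port numbers from a string like '[1][81][1]'.

    Assumption: each bracketed entry is printed in hexadecimal.
    """
    return [int(h, 16) for h in re.findall(r"\[([0-9A-Fa-f]+)\]", text)]

def suspicious_hops(ports, max_port):
    """Return (hop index, port) pairs whose port number exceeds max_port."""
    return [(i, p) for i, p in enumerate(ports) if p > max_port]

path = parse_initial_path("[1][81][1]")
print(path)                               # [1, 129, 1]
print(suspicious_hops(path, max_port=8))  # [(1, 129)]
```

This only flags port numbers that are too large for the known hardware. It makes no judgment about entries such as [0][0][0].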