[openib-general] opensm fails to bring up subnet..
Eitan Zahavi
eitan at mellanox.co.il
Fri Jun 3 09:47:38 PDT 2005
Hi,
Sorry for catching up with this late in the thread. (Thanks Hal for waking
me up...)
>
> It appears that a node is not responding to a discovery packet (SM Get
> NodeInfo (attrID 0x11)). Its direct route initial path (an array of
> port numbers at the start of the next hop) is:
> Initial path = [1][81][1] which means that starting at the node running
> OpenSM, port 1 then port 129 then port 1. Is there a large switch in the
> middle ? Can you send the output of ibnetdiscover ? If that is valid,
> which HCA (port) is not responding (what is the GUID) ?
[EZ] Normally all directed route dumps should start with:
Initial path = [0][....
The first hop is reserved and set to 0, so I wonder whether the above text is a
direct quote from the osm.log?
The fact that you got [81] there means that the packet should leave from port
81?
I have never seen a switch with more than 24 ports...
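A quick way to sanity-check such a dump is sketched below. It is only an illustration, not OpenSM code: it assumes each bracketed entry in the "Initial path" line is a hexadecimal port number (which matches [81] being read as port 129 in the quoted text), and the helper names parse_initial_path and first_hop_is_reserved are made up for this example.

```python
import re


def parse_initial_path(line):
    """Parse a line such as 'Initial path = [0][1][81]' into port numbers.

    Assumption: every bracketed entry is a hexadecimal port number.
    """
    _, sep, rest = line.partition("=")
    if not sep:
        raise ValueError("expected a line of the form 'Initial path = [..][..]'")
    return [int(tok, 16) for tok in re.findall(r"\[([0-9A-Fa-f]+)\]", rest)]


def first_hop_is_reserved(path):
    """Return True if the path starts with the reserved 0 entry."""
    return bool(path) and path[0] == 0


path = parse_initial_path("Initial path = [1][81][1]")
print(path)                         # [1, 129, 1]
print(first_hop_is_reserved(path))  # False
```

With the path quoted in this thread, the check fails on the first entry, which is why it looks like the text was not copied verbatim from the log.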
> Unfortunately on such an error osm does not appear to give up (it
> retries forever and is locked on such a node). This is obviously not
> good.
Also, Troy, if you are able to capture the entire log, it might shed some light
on the issue of "OpenSM never gives up" in such cases, which we want to
resolve.