[ofa-general] OpenSM "Dead end on path to LID" -- problem with updn, fixed with minhop

Nathan Dauchy Nathan.Dauchy at noaa.gov
Wed Jul 23 09:57:17 PDT 2008


Yevgeny Kliteynik wrote:
> Nathan Dauchy wrote:
>> We went ahead and tried both MAXSMPS=4 and MAXSMPS=1.  The symptoms did
>> not improve with all the nodes booted. :(
>>
>> For the record, here is exactly how opensm is running now:
>>
>> # ps uaxw | grep open
>> root     23112  1.0  0.1 288432 16732 ?        Sl   18:10   0:07
>> /opt/ofed/1.3.1/sbin/opensm -maxsmps 1 -t 600 -f /var/run/osm/osm.log -R
>> updn -g 0 --honor_guid2lid
> 
> Please try running opensm with a default routing (w/o '-R updn').
> Just trying to understand if this is a routing or discovery issue.


Yevgeny, thanks for your continued help!

I modified the /etc/ofa/opensm.conf file to include:
	UPDN="off"
and then restarted opensm.  Here is how it is running now (note that
'-R updn' is no longer on the command line):

[root@wms4 ~]# ps uaxw | grep open
root     28360  0.9  0.1 280756 16724 pts/3    Sl   16:20   0:06
/opt/ofed/1.3.1/sbin/opensm -maxsmps 1 -t 600 -f /var/run/osm/osm.log -g 0 --honor_guid2lid

So far, the results look good!!!  We now have all hosts booted,
communicating over IB, and "ibdiagnet" finishes cleanly.
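
For anyone repeating this comparison, here is a minimal sketch of a check
that reports which routing engine an opensm command line asks for.  The
function name routing_engine is made up for illustration; it only looks
for the '-R' option as it appears in the ps output in this thread and
prints "default" when '-R' is absent (the case this thread refers to as
minhop).

```shell
# Hypothetical helper, not part of OFED or opensm.
# Prints the argument of -R if present, otherwise "default".
routing_engine() {
    # $1: a full command line, e.g. the args column from ps.
    # Unquoted on purpose: split it on whitespace into positional parameters.
    set -- $1
    while [ $# -gt 0 ]; do
        if [ "$1" = "-R" ] && [ $# -ge 2 ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    echo "default"
}
```

On a Linux SM host it could be fed the live command line with something
like: routing_engine "$(ps -o args= -C opensm)"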

Given the network topology I described previously, how is "minhop"
expected to behave differently than "updn"?

Unless you recommend otherwise, I now hope to add back in the missing
features and make sure everything is still OK:
	LMC=2
	MAXSMPS=4
	TIMEOUT=600
	12X links on subtree C.


>
> Also, where does the opensm run?
> 

For the record, the SM host is connected to one of the "Subtree B" edge
switches.  Hardware is an 8-core Intel Xeon 5450 @ 3.00GHz with 16GB
RAM, and a single-port DDR "MT25204" HCA.  Software is CentOS-5.1,
linux-2.6.18-53.1.21.el5, and OFED-1.3.1.


Please let me know if there is additional diagnostic information I can
gather in order to create a test case for future improvements to the
"updn" routing engine.


Thanks again,
Nathan
