[ofa-general] OpenSM "Dead end on path to LID" -- problem with updn, fixed with minhop
Nathan Dauchy
Nathan.Dauchy at noaa.gov
Wed Jul 23 09:57:17 PDT 2008
Yevgeny Kliteynik wrote:
> Nathan Dauchy wrote:
>> We went ahead and tried both MAXSMPS=4 and MAXSMPS=1. The symptoms did
>> not improve with all the nodes booted. :(
>>
>> For the record, here is exactly how opensm is running now:
>>
>> # ps uaxw | grep open
>> root 23112 1.0 0.1 288432 16732 ? Sl 18:10 0:07
>> /opt/ofed/1.3.1/sbin/opensm -maxsmps 1 -t 600 -f /var/run/osm/osm.log -R
>> updn -g 0 --honor_guid2lid
>
> Please try running opensm with a default routing (w/o '-R updn').
> Just trying to understand if this is a routing or discovery issue.
Yevgeny, thanks for your continued help!
I modified the /etc/ofa/opensm.conf file to include:
UPDN="off"
And then restarted opensm. Here is what it looks like now:
[root@wms4 ~]# ps uaxw | grep open
root 28360 0.9 0.1 280756 16724 pts/3 Sl 16:20 0:06
/opt/ofed/1.3.1/sbin/opensm -maxsmps 1 -t 600 -f /var/run/osm/osm.log -g 0 --honor_guid2lid
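A quick way to double-check which routing engine a given command line selects is to look for the -R option. Per Yevgeny's note above, leaving it out puts opensm on its default routing, which is minhop. The sketch below is illustrative only: routing_engine is a hypothetical helper, not part of OFED, and it assumes the engine name follows -R as a separate token.

```python
import shlex

def routing_engine(cmdline, default="minhop"):
    """Return the engine named after -R, or the default if -R is absent."""
    tokens = shlex.split(cmdline)
    for i, tok in enumerate(tokens):
        if tok == "-R" and i + 1 < len(tokens):
            return tokens[i + 1]
    return default

# The two command lines shown in this thread.
before = ("/opt/ofed/1.3.1/sbin/opensm -maxsmps 1 -t 600 "
          "-f /var/run/osm/osm.log -R updn -g 0 --honor_guid2lid")
after = ("/opt/ofed/1.3.1/sbin/opensm -maxsmps 1 -t 600 "
         "-f /var/run/osm/osm.log -g 0 --honor_guid2lid")

print(routing_engine(before))
print(routing_engine(after))
```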
So far, the results look good!!! We now have all hosts booted,
communicating over IB, and "ibdiagnet" finishes cleanly.
Given the network topology I described previously, how is "minhop"
expected to behave differently than "updn"?
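For reference, the general Up*/Down* idea is that switches are ranked by distance from the root switches, and a path may not turn "up" again once it has gone "down". Min-hop routing has no such restriction and simply takes the shortest paths. The toy check below sketches only that rule. It is not the OpenSM implementation, the switch names and ranks are made up, and hops between equal-rank switches are ignored for simplicity.

```python
def is_updn_legal(path, rank):
    """True if the path never goes up again after it has gone down.

    rank maps a switch name to its distance from the root (0 = root).
    A hop to a lower rank is "up"; a hop to a higher rank is "down".
    """
    gone_down = False
    for a, b in zip(path, path[1:]):
        if rank[b] > rank[a]:
            gone_down = True
        elif rank[b] < rank[a] and gone_down:
            return False  # down followed by up is not allowed
    return True

# Hypothetical ranks: two root switches and two edge switches.
rank = {"root1": 0, "root2": 0, "edgeA": 1, "edgeB": 1}

print(is_updn_legal(["edgeA", "root1", "edgeB"], rank))           # up, then down
print(is_updn_legal(["root1", "edgeA", "root2", "edgeB"], rank))  # down, up, down
```

A path like the second one can still be a valid shortest path for minhop, which may be why the two engines can differ on the same fabric.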
Unless you recommend otherwise, I now hope to add back in the missing
features and make sure everything is still OK:
LMC=2
MAXSMPS=4
TIMEOUT=600
12X links on subtree C.
>
> Also, where does the opensm run?
>
For the record, the SM host is connected to one of the "Subtree B" edge
switches. Hardware is an 8-core Intel Xeon 5450 @ 3.00GHz with 16GB
RAM, and a single-port DDR "MT25204" HCA. Software is CentOS-5.1,
linux-2.6.18-53.1.21.el5, and OFED-1.3.1.
Please let me know if there is additional diagnostic information I can
gather in order to create a test case for future improvements to the
"updn" routing engine.
Thanks again,
Nathan