[ofa-general] OpenSM "Dead end on path to LID"

Nathan Dauchy Nathan.Dauchy at noaa.gov
Fri Jul 18 09:34:34 PDT 2008


Hi Yevgeny, thanks for your response,

Yevgeny Kliteynik wrote:
> Hi Nathan,
> 
> Nathan Dauchy wrote:
>>
>> Looking through osm.log a bit more, I also found a handful of errors
>> like these:
>>
>> Jul 17 01:31:29 345329 [46E0A940] 0x01 ->
>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for
>> node 0x000002c900000048(MT47396 Infiniscale-III Mellanox Technologies)
>> port 14. Adding to light sweep sampling list
>> Jul 17 01:31:29 345340 [46E0A940] 0x01 -> Directed Path Dump of 4 hop
>> path:
>>                                 Path = 0,1,20,7,15
>> Jul 17 01:31:29 345381 [46E0A940] 0x01 ->
>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for
>> node 0x000002c900000049(MT47396 Infiniscale-III Mellanox Technologies)
>> port 15. Adding to light sweep sampling list
>> Jul 17 01:31:29 345390 [46E0A940] 0x01 -> Directed Path Dump of 3 hop
>> path:
>>                                 Path = 0,1,22,11
>>
>> Does that indicate a problem as well?
> 
> This explains why ibdiagnet couldn't query port counters.
> OpenSM couldn't discover what's behind these ports, so it
> didn't configure routing tables for the undiscovered nodes.
> Ibdiagnet could still discover them (discovery uses directed
> routes), but it queries port counters by LID, and the switches
> don't have those LIDs in their forwarding tables.

Thanks, that makes sense.
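
For what it's worth, here is the quick hack I'm using to pull all of the
ERR 0108 ports out of osm.log, so I can see how many links are affected.
It assumes each log entry sits on a single line, formatted like the
excerpts above:

#!/usr/bin/env python
# List every (node GUID, port) that OpenSM flagged with
# "ERR 0108: Unknown remote side" in osm.log.
# Assumes each entry is on one line, formatted like the excerpts above.

import re
import sys
from collections import defaultdict

PATTERN = re.compile(r"ERR 0108: Unknown remote side for node "
                     r"(0x[0-9a-fA-F]+)\(([^)]*)\) port (\d+)")

def main(logfile):
    ports = defaultdict(set)   # (guid, description) -> set of port numbers
    for line in open(logfile):
        m = PATTERN.search(line)
        if m:
            guid, desc, port = m.groups()
            ports[(guid, desc)].add(int(port))
    for (guid, desc), nums in sorted(ports.items()):
        print("%s (%s): ports %s" % (guid, desc, sorted(nums)))

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "osm.log")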

>> Unknown remote side for node 0x000002c900000049(MT47396
>> Infiniscale-III Mellanox Technologies) port 15
> 
> What is the remote side of this port? HCA? Switch?
> If it's HCA, does its host run some heavy application?

The remote side of that port is a "spine" switch.  The remote side of
the other example error message is a "clos"/"edge" switch.

I guess I should provide some info on our IB network topology, since it
is somewhat unusual and may be contributing to the problem...

The InfiniBand network consists of 3 layers of switches.  All switches
are 24-port Flextronics DDR switches (FX-X4300??).  We can refer to the
layers as "Edge" (clos), "Spine", and "Root" (aggregation).  The network
is divided into 3 "subtrees", joined by the two Root (aggregation)
switches.  We can refer to the subtrees as A, B, and C.

Subtree A:
	22 Edge switches
	17 SDR Hosts per Edge switch
	6 Spine switches
	Each Edge switch has an uplink to each Spine
	Each Spine switch has an uplink to each Root

Subtree B:
	22 Edge switches
	12 DDR Hosts per Edge switch
	9 Spine switches
	Each Edge switch has an uplink to each Spine
	Each Spine switch has an uplink to each Root

Subtree C:
	4 Edge switches
	Edge switches are configured with 9 ports as 3 logical 12x links
	Up to 15 SDR/DDR Hosts per Edge switch
	3 Spine switches
	Spines are configured with all 24 ports as 8 logical 12x links
	Each Edge switch has an uplink (3 cables) to each Spine
	Each Spine switch has an uplink (3 cables) to each Root

Aggregation:
	2 Root switches
	Configured with 9 physical ports as 3 logical 12x ports
	Each Root has 6 links to Subtree A
	Each Root has 9 links to Subtree B
	Each Root has 3 links (9 cables) to Subtree C
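
For a sense of scale, here is a rough tally of what the SM has to sweep,
with the numbers taken straight from the description above (host counts
are the per-edge figures, treating subtree C's "up to 15" as the max):

# Rough fabric-size tally from the topology described above.
subtrees = {
    # name: (edge switches, hosts per edge, spine switches)
    "A": (22, 17, 6),
    "B": (22, 12, 9),
    "C": (4, 15, 3),
}
roots = 2

total_switches = roots
total_hosts = 0
for name, (edges, hosts_per_edge, spines) in sorted(subtrees.items()):
    total_switches += edges + spines
    total_hosts += edges * hosts_per_edge
    print("Subtree %s: %d switches, up to %d hosts" %
          (name, edges + spines, edges * hosts_per_edge))
print("Total: %d switches, up to %d hosts" % (total_switches, total_hosts))

If I've added that up correctly, it comes to 68 switches and up to
roughly 700 hosts per sweep, which seems relevant to the SMP timeout
and outstanding-SMP discussion below.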

These Flextronics switches use Mellanox Part MTS2400 (Silicon MT47396).
They are burned with firmware version "fw-47396-1.0.0", using the
"M24D0601A.INI" file, with changes only to the "[LinkWidthSupp]"
section.  We downloaded the firmware from:
http://www.mellanox.com/support/switch_firmware_table.php

So, the example "Unknown remote side" messages from above are:
	System B Edge -> System B Spine
	System A Spine -> System A Edge


> I understand you already increased transaction time.
> Please try limiting SMPs on the wire - in opensm.conf
> file, set max_wire_smps to 1 (you probably have 4).
> You can also run opensm with '-maxsmps 1' command line
> argument.

Interesting!

I believe MAXSMPS was originally set to 0 (unlimited), based on
duplicating the config file from an older SM setup.  I reduced it to 32
when we saw some IB errors on standalone System B.  I'm afraid I don't
have documentation on what those problems were, but I don't recall
seeing the exact same symptoms.  I think it was MAD timeout error
messages that prompted me to change the MAXSMPS value.


We will be able to test any fixes during a scheduled system downtime on
7/24.  At that point, do you recommend trying MAXSMPS=4?  (I assume the
tradeoff with the number of outstanding SMPs is discovery speed vs.
stability.  Yes?)
If that doesn't work, what else should we be prepared to try, or what
other debugging information would be helpful to gather?
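
In the meantime I'll stage the change both ways you described, so we can
flip it quickly during the maintenance window:

	# in opensm.conf
	max_wire_smps 1

or, equivalently, start opensm with the '-maxsmps 1' command line
argument.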

For the record, other steps we are considering:
* Latest OpenSM code
* Upgrading firmware on all IB switches
* Changing topology to remove the 12X links (ugh!)


Thanks much,
Nathan


