[ewg] OpenSM problem on today's OFED-1.5.1 daily build
Hal Rosenstock
hal.rosenstock at gmail.com
Fri Feb 19 16:08:54 PST 2010
On Fri, Feb 19, 2010 at 6:47 PM, Woodruff, Robert J
<robert.j.woodruff at intel.com> wrote:
> Hal wrote,
>
>>Has there been any change between those two in the management space ?
>
> I am not sure on that, but there must be some changes because it
> works with RC1 but fails with today's daily build.
Could it be changes to the mlx driver ?
>>What state is the peer port in ? Any interesting OpenSM log messages ?
>
> The peer port on the other node is in the Initializing state.
And that's an SDR switch port ?
> Here is the tail of the opensm log file.
>
>
> Feb 19 15:44:23 734840 [1C05CA90] 0x80 -> Entering DISCOVERING state
> Feb 19 15:44:23 746070 [1C05CA90] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300044fa9
> Feb 19 15:44:23 773455 [1C05CA90] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300044fa9
> Feb 19 15:44:23 773501 [1C05CA90] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0002c90300044fa9
> Feb 19 15:44:24 574767 [41A72940] 0x01 -> umad_receiver: ERR 5411: DR SMP Send completed with error -- dropping
> Method 0x1, Attr 0x11, TID 0x140000123b, Hop Ptr: 0x0
> Feb 19 15:44:24 574798 [41A72940] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,0, Return path = 0,0
> Feb 19 15:44:24 574811 [41A72940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(NodeInfo), attr_mod 0x0, TID 0x123b
> Using default GUID 0x2c90300044fa9
> Entering MASTER state
>
> Feb 19 15:44:24 574879 [595F1940] 0x80 -> Entering MASTER state
> SUBNET UP
>
> Feb 19 15:44:24 576233 [595F1940] 0x80 -> SUBNET UP
> Feb 19 15:44:34 538093 [41A72940] 0x01 -> umad_receiver: ERR 5411: DR SMP Send completed with error -- dropping
> Method 0x1, Attr 0x11, TID 0x1400001240, Hop Ptr: 0x0
> Feb 19 15:44:34 538114 [41A72940] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,0, Return path = 0,0
> Feb 19 15:44:34 538123 [41A72940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(NodeInfo), attr_mod 0x0, TID 0x1240
> Feb 19 15:44:34 538853 [595F1940] 0x02 -> SUBNET UP
> Feb 19 15:44:44 541415 [41A72940] 0x01 -> umad_receiver: ERR 5411: DR SMP Send completed with error -- dropping
> Method 0x1, Attr 0x11, TID 0x1400001244, Hop Ptr: 0x0
> Feb 19 15:44:44 541434 [41A72940] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,0, Return path = 0,0
> Feb 19 15:44:44 541442 [41A72940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(NodeInfo), attr_mod 0x0, TID 0x1244
Looks like the switch SMA is not responding ? Can you try some smpquery commands against it ?
Is this reproducible in this environment ?
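
To make the pattern in the log tail easier to see, here is a minimal Python sketch that counts the ERR codes and measures the spacing between the IB_TIMEOUT failures. It is not part of OpenSM or infiniband-diags; the function name `summarize` and the regular expressions are my own assumptions about the log line layout, based only on the lines quoted above. The sample text is copied from that log with the quote markers removed.

```python
import re
from collections import Counter

# Error lines copied from the log quoted above (quote markers removed).
LOG = """\
Feb 19 15:44:24 574767 [41A72940] 0x01 -> umad_receiver: ERR 5411: DR SMP Send completed with error -- dropping
Method 0x1, Attr 0x11, TID 0x140000123b, Hop Ptr: 0x0
Feb 19 15:44:24 574811 [41A72940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(NodeInfo), attr_mod 0x0, TID 0x123b
Feb 19 15:44:34 538093 [41A72940] 0x01 -> umad_receiver: ERR 5411: DR SMP Send completed with error -- dropping
Method 0x1, Attr 0x11, TID 0x1400001240, Hop Ptr: 0x0
Feb 19 15:44:34 538123 [41A72940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(NodeInfo), attr_mod 0x0, TID 0x1240
Feb 19 15:44:44 541415 [41A72940] 0x01 -> umad_receiver: ERR 5411: DR SMP Send completed with error -- dropping
Method 0x1, Attr 0x11, TID 0x1400001244, Hop Ptr: 0x0
Feb 19 15:44:44 541442 [41A72940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(NodeInfo), attr_mod 0x0, TID 0x1244
"""

# Assumed layout: "<Mon> <day> HH:MM:SS <usec> [<thread>] <level> -> <message>"
LINE_RE = re.compile(
    r"^\w{3} +\d+ (\d\d):(\d\d):(\d\d) (\d{6}) "
    r"\[[0-9A-Fa-f]+\] 0x[0-9A-Fa-f]+ -> (.*)$"
)
ERR_RE = re.compile(r"ERR ([0-9A-Fa-f]{4}):")


def summarize(text):
    """Return (Counter of ERR codes, gaps in microseconds between ERR 3113 lines)."""
    counts = Counter()
    timeout_times = []
    for raw in text.splitlines():
        # Tolerate lines pasted straight from a quoted email ("> " prefix).
        line = raw.lstrip("> ").strip()
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines such as "Method 0x1, ..." are skipped
        err = ERR_RE.search(m.group(5))
        if not err:
            continue
        code = err.group(1)
        counts[code] += 1
        if code == "3113":
            h, mi, s, usec = (int(m.group(i)) for i in range(1, 5))
            # Time of day in microseconds; the date is ignored in this sketch.
            timeout_times.append(((h * 60 + mi) * 60 + s) * 1_000_000 + usec)
    gaps = [b - a for a, b in zip(timeout_times, timeout_times[1:])]
    return counts, gaps


if __name__ == "__main__":
    counts, gaps = summarize(LOG)
    for code, n in sorted(counts.items()):
        print(f"ERR {code}: {n} occurrence(s)")
    for g in gaps:
        print(f"gap between IB_TIMEOUT errors: {g / 1e6:.3f} s")
```

On the excerpt above, the failures come in ERR 5411 / ERR 3113 pairs roughly 10 seconds apart, always for SubnGet(NodeInfo). That suggests the same query is retried on each periodic sweep and never answered.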