[ewg] OpenSM from ofed-1.2 and ofed-1.3 clients

Hal Rosenstock hrosenstock at xsigo.com
Tue Jun 3 10:13:07 PDT 2008


Steve,

One more thought below...

On Tue, 2008-06-03 at 09:49 -0700, Hal Rosenstock wrote:
> Steve,
> 
> On Tue, 2008-06-03 at 11:19 -0500, Steve Wise wrote:
> > Hello opensm gurus:
> > 
> > Sandia is seeing problems after migrating up to ofed-1.3.  They are 
> > still using an ofed-1.2 opensm but with ofed-1.3 clients, updated from 
> > ofed-1.2.5. 
> 
> Was the OpenSM node changed in some way, or only the end nodes?
> 
> > They are getting the errors below.  
> > 
> > Q: Should this work? Or are there backwards compat issues?
> 
> I haven't explicitly tried it, but I would think it should work.
> 
> The errors below are timeouts on switch MFT sets, which are only
> indirectly related to the end nodes (in that the MC SA joins cause the
> MC routing and those tables to be set), so I don't see the relationship,
> but I might be missing something.
> 
> -- Hal
> 
> > Thanks,
> > 
> > Steve.
> > 
> > 
> > 
> > 
> > log:
> > > May 23 08:29:22 408613 [45007960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
> > > May 23 08:29:22 408622 [45007960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3119: Set method failed
> > > May 23 08:29:22 408652 [45007960] -> SMP dump:
> > >                                 base_ver................0x1
> > >                                 mgmt_class..............0x81
> > >                                 class_ver...............0x1
> > >                                 method..................0x2 (SubnSet)
> > >                                 D bit...................0x0
> > >                                 status..................0x0
> > >                                 hop_ptr.................0x0
> > >                                 hop_count...............0x3
> > >                                 trans_id................0x1694a4
> > >                                 attr_id.................0x1B (MulticastForwardingTable)
> > >                                 resv....................0x0
> > >                                 attr_mod................0x10000000
> > >                                 m_key...................0x0000000000000000
> > >                                 dr_slid.................0xFFFF
> > >                                 dr_dlid.................0xFFFF
> > >
> > >                                 Initial path: 0,1,14,9

Could this switch's SMA be "stuck"?

Could you try smpquery -D nodeinfo 0,1,14,9
and
smpquery -D nodeinfo 0,1,14
from the SM node?

-- Hal
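A minimal sketch of the hop-by-hop probing suggested above, generalized to every prefix of the failing directed-route path so the last hop that still answers NodeInfo stands out. It only builds the command lines, in the same `smpquery -D nodeinfo <path>` form used above; the helper names `dr_prefixes` and `probe_commands` are hypothetical, and the leading-0 check simply mirrors how paths are written in the log, not a documented smpquery requirement.

```python
def dr_prefixes(path: str) -> list[str]:
    """Return each directed-route prefix of `path`, shortest first.

    In a path such as "0,1,14,9" the first element is 0 (the SM node
    itself) and each following number is the output port taken at the
    next hop, so "0,1" reaches the first switch, "0,1,14" the second, etc.
    """
    hops = [int(p) for p in path.split(",")]
    # Assumption: paths are written with a leading 0, as in the OpenSM log.
    if not hops or hops[0] != 0:
        raise ValueError("directed-route path is expected to start with 0")
    return [",".join(str(h) for h in hops[: i + 1]) for i in range(1, len(hops))]


def probe_commands(path: str) -> list[list[str]]:
    """Build one `smpquery -D nodeinfo <prefix>` argv per hop of `path`."""
    return [["smpquery", "-D", "nodeinfo", prefix] for prefix in dr_prefixes(path)]


if __name__ == "__main__":
    for argv in probe_commands("0,1,14,9"):
        print(" ".join(argv))
```

Run against the path from the dump, it prints three commands (for 0,1 then 0,1,14 then 0,1,14,9); the last two are the ones listed above, and the first adds a check of the switch one hop from the SM.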

> > >                                 Return path:  0,0,0,0
> > >                                 Reserved:     [0][0][0][0][0][0][0]
> > >
> > >                                 00 40 00 40 00 00 00 40   00 00 00 00 00 00 00 00
> > >
> > >                                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
> > >
> > >                                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
> > >
> > >                                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
> > >
> > > May 23 08:29:22 408689 [45007960] -> umad_receiver: ERR 5409: send completed with error (method=0x2 attr=0x1B trans_id=0x14001694a5) -- dropping
> > > May 23 08:29:22 408699 [45007960] -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
> > > May 23 08:29:22 408711 [45007960] -> Received SMP on a 3 hop path:
> > >                                 Initial path = 0,0,0,0
> > >                                 Return path  = 0,0,0,0
> > > May 23 08:29:22 408721 [45007960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
> > > May 23 08:29:22 408729 [45007960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3119: Set method failed
> > > May 23 08:29:22 408759 [45007960] -> SMP dump:
> > >                                 base_ver................0x1
> > >                                 mgmt_class..............0x81
> > >                                 class_ver...............0x1
> > >                                 method..................0x2 (SubnSet)
> > >                                 D bit...................0x0
> > >                                 status..................0x0
> > >                                 hop_ptr.................0x0
> > >                                 hop_count...............0x3
> > >                                 trans_id................0x1694a5
> > >                                 attr_id.................0x1B (MulticastForwardingTable)
> > >                                 resv....................0x0
> > >                                 attr_mod................0x1
> > >                                 m_key...................0x0000000000000000
> > >                                 dr_slid.................0xFFFF
> > >                                 dr_dlid.................0xFFFF
> > >
> > >                                 Initial path: 0,1,14,9
> > >                                 Return path:  0,0,0,0
> > >                                 Reserved:     [0][0][0][0][0][0][0]
> > >
> > >                                 00 00 00 00 00 00 00 20   00 00 00 00 00 00 00 00
> > >
> > >                                 00 00 00 00 00 00 00 00   00 00 00 00 04 00 00 00
> > >
> > >                                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
> > >
> > >                                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 10
> > >
> > > May 23 08:29:22 412432 [42803960] -> Errors during initialization
> > > May 23 08:29:22 412508 [42803960] -> __osm_state_mgr_init_errors_msg:
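To see what the two timed-out SubnSet(MulticastForwardingTable) SMPs were trying to program, the attribute modifier and the 64-byte payload can be decoded. This is a sketch under an assumed layout, based on my reading of the IBA MulticastForwardingTable attribute rather than anything stated in the thread: attr_mod bits 31-28 select which 16-port PortMask slice is written, bits 8-0 select a block of 32 MLIDs starting at 0xC000, and each entry is a big-endian 16-bit port mask. The function and constant names are hypothetical; the payload bytes are copied from the two dumps above.

```python
# Assumed MFT layout (verify against the IBA spec before relying on it).
MLID_BASE = 0xC000        # first multicast LID
ENTRIES_PER_BLOCK = 32    # PortMask entries per MFT block
PORTS_PER_POSITION = 16   # each PortMask covers 16 ports


def decode_mft_attr_mod(attr_mod: int) -> tuple[int, int]:
    """Split an MFT attribute modifier into (position, block)."""
    position = (attr_mod >> 28) & 0xF   # bits 31-28: PortMask position
    block = attr_mod & 0x1FF            # bits 8-0: block of 32 MLIDs
    return position, block


def decode_mft_block(attr_mod: int, data: bytes) -> dict[int, list[int]]:
    """Map each MLID in the block to the ports whose mask bit is set.

    MLIDs whose mask is all zero are omitted from the result.
    """
    if len(data) != 2 * ENTRIES_PER_BLOCK:
        raise ValueError("expected a 64-byte MFT block")
    position, block = decode_mft_attr_mod(attr_mod)
    result: dict[int, list[int]] = {}
    for i in range(ENTRIES_PER_BLOCK):
        mask = int.from_bytes(data[2 * i:2 * i + 2], "big")  # wire order
        ports = [position * PORTS_PER_POSITION + bit
                 for bit in range(PORTS_PER_POSITION) if mask & (1 << bit)]
        if ports:
            result[MLID_BASE + block * ENTRIES_PER_BLOCK + i] = ports
    return result


# Payloads copied from the two SMP dumps in the log (4 rows of 16 bytes).
FIRST_DUMP = bytes.fromhex(
    "00 40 00 40 00 00 00 40 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)
SECOND_DUMP = bytes.fromhex(
    "00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10"
)

if __name__ == "__main__":
    for name, attr_mod, data in (("first", 0x10000000, FIRST_DUMP),
                                 ("second", 0x1, SECOND_DUMP)):
        pos, blk = decode_mft_attr_mod(attr_mod)
        print(f"{name}: position {pos}, block {blk}")
        for mlid, ports in sorted(decode_mft_block(attr_mod, data).items()):
            print(f"  MLID 0x{mlid:04X} -> ports {ports}")
```

If that layout is right, the first SMP (attr_mod 0x10000000) sets port 22 for MLIDs 0xC000, 0xC001 and 0xC003, and the second (attr_mod 0x1) sets port 5 for 0xC023, port 10 for 0xC02E and port 4 for 0xC03F. Both are ordinary MFT contents, which fits the view that the payload itself is not the problem and the switch at the end of 0,1,14,9 simply did not answer.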
> > 
> > _______________________________________________
> > ewg mailing list
> > ewg at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
