[ofa-general] RE: [PATCHv2] OpenSM/osm_trap_rcv.c: Better Trap 131 Handling

Hal Rosenstock halr at voltaire.com
Tue Jul 10 10:04:57 PDT 2007


Hi Amit,

On Tue, 2007-07-10 at 12:39, Amit Krig wrote:
> Hi Hal,
> 
> The watchdog mechanism may make it hard to communicate with the
> end node, which is why I suggest bringing down its peer port and
> thereby stopping the physical link from retraining all the time.

The patch uses the port indicated in the trap. Are you saying that
sometimes that port will not be responsive to SMA requests (and that in
those cases the peer should be used, or at least tried)?
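
For reference, the handling described in the quoted patch summary below
can be sketched roughly as follows. This is a simplified, self-contained
model and not OpenSM code: the enum, the struct, and decide_trap131() are
hypothetical names made up for illustration, and the real patch builds a
PortInfo payload and sends it with osm_req_set() rather than returning a
decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the logical link states used in the patch. */
enum link_state {
    LINK_NO_CHANGE = 0,
    LINK_DOWN      = 1,
    LINK_INIT      = 2,
    LINK_ARMED     = 3,
    LINK_ACTIVE    = 4
};

/* Hypothetical summary of what the handler would ask the port to do. */
struct trap131_action {
    bool            send_set;      /* issue a PortInfo Set at all? */
    bool            update_op_vls; /* OperationalVLs get rewritten */
    enum link_state new_state;     /* requested logical link state */
    bool            disable_port;  /* babbling threshold reached */
};

/* Same threshold as "num_received >= 250" in the patch. */
#define BABBLING_THRESHOLD 250u

static struct trap131_action
decide_trap131(enum link_state cur_state,
               uint8_t cur_op_vls, uint8_t calc_op_vls,
               bool babbling_policy, unsigned num_received)
{
    struct trap131_action a = { false, false, LINK_NO_CHANGE, false };

    /* Only touch the port if it is not already DOWN. */
    if (cur_state != LINK_DOWN) {
        a.send_set = true;
        /* First, validate OperationalVLs against the recalculated value. */
        a.update_op_vls = (calc_op_vls != cur_op_vls);
        /* Then bounce the link, unless it is already in INIT. */
        a.new_state = (cur_state != LINK_INIT) ? LINK_DOWN : LINK_NO_CHANGE;
    }

    /* Still flooded with traps: take the port down and disable it. */
    if (babbling_policy && num_received >= BABBLING_THRESHOLD) {
        a.send_set = true;
        a.new_state = LINK_DOWN;
        a.disable_port = true;
    }
    return a;
}
```

Note that nothing in this model looks at the peer port or at whether the
reporting port answers SMA requests; it acts only on the port named in
the trap, which is the behaviour being questioned above.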

-- Hal

> Amit 
> 
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com] 
> Sent: Tuesday, July 10, 2007 7:24 PM
> To: Amit Krig
> Cc: general at lists.openfabrics.org; Suresh Shelvapille; Yevgeny
> Kliteynik; Eitan Zahavi
> Subject: RE: [PATCHv2] OpenSM/osm_trap_rcv.c: Better Trap 131 Handling
> 
> Hi Amit,
> 
> On Tue, 2007-07-10 at 11:30, Amit Krig wrote:
> > Hi Hal,
> > 
> > One comment,
> > If one of the ports is not responsive for some reason, we need to move
> > its peer port to DOWN and then check the OPVL.
> 
> Guess I'm still not following you exactly yet. 
> 
> The code here is not determining the port responsiveness. It is merely
> triggering off the trap 131, recalculating and resetting OperationalVLs
> if needed, and taking the port down at the link level which should start
> it back to active, hopefully now with the proper OperationalVLs. If it
> is still flooded with trap 131s, it disables the port.
> 
> -- Hal
> 
> > 
> > Amit
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, July 10, 2007 5:39 PM
> > To: general at lists.openfabrics.org
> > Cc: Suresh Shelvapille; Amit Krig; Yevgeny Kliteynik; Eitan Zahavi
> > Subject: [PATCHv2] OpenSM/osm_trap_rcv.c: Better Trap 131 Handling
> > 
> > OpenSM/osm_trap_rcv.c: Better trap 131 handling
> > 
> > When trap 131 occurs, check operational VLs and set port state to DOWN
> > if needed.
> > 
> > I think this is what Amit was saying should be done in his emails 
> > yesterday on the list (modified by Suri's comment).
> > 
> > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > 
> > diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c
> > index f912dcd..3f60f3d 100644
> > --- a/opensm/opensm/osm_trap_rcv.c
> > +++ b/opensm/opensm/osm_trap_rcv.c
> > @@ -550,16 +550,76 @@ __osm_trap_rcv_process_request(
> >          }
> >          else
> >          {
> > -          /* When babbling port policy option is enabled and
> > -             Threshold for disabling a "babbling" port is exceeded */
> > +          uint8_t               payload[IB_SMP_DATA_SIZE];
> > +          ib_port_info_t*       p_pi = (ib_port_info_t*)payload;
> > +          const ib_port_info_t* p_old_pi;
> > +          osm_madw_context_t    context;
> > +
> > +          p_old_pi = &p_physp->port_info;
> > +          memcpy( payload, p_old_pi, sizeof(ib_port_info_t) );
> > +
> > +          if (p_ntci->g_or_v.generic.trap_num == CL_HTON16(131))
> > +          {
> > +            uint8_t port_state, cur_opvls, opvls;
> > +
> > +            port_state = ib_port_info_get_port_state(p_old_pi);
> > +            if (port_state != IB_LINK_DOWN)
> > +            {
> > +              /* First, validate OperationalVLs */
> > +              cur_opvls = ib_port_info_get_op_vls(p_old_pi);
> > +              opvls = osm_physp_calc_link_op_vls(p_rcv->p_log, p_rcv->p_subn, p_physp);
> > +              if (opvls != cur_opvls)
> > +              {
> > +                osm_log(p_rcv->p_log, OSM_LOG_ERROR,
> > +                        "__osm_trap_rcv_process_request: ERR 3809: "
> > +                        "Current OP_VLs %d New OP_VLs %d\n",
> > +                        cur_opvls, opvls);
> > +                ib_port_info_set_op_vls(p_pi, opvls);
> > +              }
> > +
> > +              /* Now, set port to DOWN if not already in INIT */
> > +              if (port_state != IB_LINK_INIT)
> > +              {
> > +                ib_port_info_set_port_state( p_pi, IB_LINK_DOWN );
> > +                ib_port_info_set_port_phys_state( IB_PORT_PHYS_STATE_NO_CHANGE, p_pi );
> > +              }
> > +              else
> > +              {
> > +                ib_port_info_set_port_state( p_pi, IB_LINK_NO_CHANGE );
> > +                ib_port_info_set_port_phys_state( IB_PORT_PHYS_STATE_NO_CHANGE, p_pi );
> > +              }
> > +
> > +              /* Now, issue set of PortInfo */
> > +              context.pi_context.node_guid = osm_node_get_node_guid( osm_physp_get_node_ptr( p_physp ) );
> > +              context.pi_context.port_guid = osm_physp_get_port_guid( p_physp );
> > +              context.pi_context.set_method = TRUE;
> > +              context.pi_context.update_master_sm_base_lid = FALSE;
> > +              context.pi_context.light_sweep = FALSE;
> > +              context.pi_context.active_transition = FALSE;
> > +
> > +              status = osm_req_set( &p_rcv->p_subn->p_osm->sm.req,
> > +                                     osm_physp_get_dr_path_ptr( p_physp ),
> > +                                     payload,
> > +                                     sizeof(payload),
> > +                                     IB_MAD_ATTR_PORT_INFO,
> > +                                     cl_hton32(osm_physp_get_port_num( p_physp )),
> > +                                     CL_DISP_MSGID_NONE,
> > +                                    &context );
> > +
> > +              if( status != IB_SUCCESS )
> > +              {
> > +                 osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> > +                          "__osm_trap_rcv_process_request: ERR 3812: "
> > +                          "Request to set PortInfo failed\n" );
> > +              }
> > +            }
> > +         }
> > + 
> > +         /* When babbling port policy option is enabled and
> > +            Threshold for disabling a "babbling" port is exceeded */
> >            if ( p_rcv->p_subn->opt.babbling_port_policy &&
> >                 num_received >= 250 )
> >            {
> > -            uint8_t               payload[IB_SMP_DATA_SIZE];
> > -            ib_port_info_t*       p_pi = (ib_port_info_t*)payload;
> > -            const ib_port_info_t* p_old_pi;
> > -            osm_madw_context_t    context;
> > -
> >              /* If trap 131, might want to disable peer port if available */
> >              /* but peer port has been observed not to respond to SM requests */
> >  
> > @@ -570,9 +630,6 @@ __osm_trap_rcv_process_request(
> >                       p_ntci->data_details.ntc_129_131.port_num
> >                       );
> >  
> > -            p_old_pi = &p_physp->port_info;
> > -            memcpy( payload, p_old_pi, sizeof(ib_port_info_t) );
> > -
> >              /* Set port to disabled/down */
> >              ib_port_info_set_port_state( p_pi, IB_LINK_DOWN );
> >              ib_port_info_set_port_phys_state( IB_PORT_PHYS_STATE_DISABLED, p_pi );
> > 
> > 
> > 
> 



