[ofa-general] RE: [PATCHv2] OpenSM/osm_trap_rcv.c: Better Trap 131 Handling

Amit Krig amitk at mellanox.co.il
Tue Jul 10 09:39:56 PDT 2007


Hi Hal,

The watchdog mechanism may make it hard to communicate with the end
node. That is why I suggest bringing down its peer port, which stops
the physical link from retraining all the time.

Amit 

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com] 
Sent: Tuesday, July 10, 2007 7:24 PM
To: Amit Krig
Cc: general at lists.openfabrics.org; Suresh Shelvapille; Yevgeny
Kliteynik; Eitan Zahavi
Subject: RE: [PATCHv2] OpenSM/osm_trap_rcv.c: Better Trap 131 Handling

Hi Amit,

On Tue, 2007-07-10 at 11:30, Amit Krig wrote:
> Hi Hal,
> 
> One comment,
> If one of the ports is not responsive for some reason, we need to move
> its peer port to DOWN and then check the OPVL,

Guess I'm still not following you exactly yet. 

The code here is not determining the port responsiveness. It is merely
triggering off the trap 131, recalculating and resetting OperationalVLs
if needed, and taking the port down at the link level, which should start
it back toward ACTIVE, hopefully now with the proper OperationalVLs. If it
is still flooded with trap 131s, it disables the port.
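
A rough standalone sketch of that flow, for illustration only. This is not
the OpenSM code: the enum, struct, and function names are hypothetical
stand-ins, and only the state checks and the threshold of 250 are taken
from the patch quoted below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the logical link states; not OpenSM's IB_LINK_* macros. */
enum link_state { LINK_NO_CHANGE, LINK_DOWN, LINK_INIT, LINK_ARMED, LINK_ACTIVE };

/* What the PortInfo Set issued on a trap 131 would carry (hypothetical type). */
struct reset_plan {
    bool send_set;          /* false: port is already DOWN, nothing is sent */
    uint8_t op_vls;         /* OperationalVLs value to write */
    enum link_state state;  /* requested logical link state */
};

/* Assumed to mirror the hard-coded threshold in the patch. */
#define BABBLING_THRESHOLD 250u

/* Decide what to request for a port that reported trap 131. */
struct reset_plan plan_trap131_reset(enum link_state cur_state,
                                     uint8_t cur_op_vls,
                                     uint8_t calc_op_vls)
{
    struct reset_plan plan = { false, cur_op_vls, LINK_NO_CHANGE };

    /* A port that is already DOWN is left alone. */
    if (cur_state == LINK_DOWN)
        return plan;

    plan.send_set = true;
    /* Write the recalculated OperationalVLs (a no-op if unchanged). */
    plan.op_vls = calc_op_vls;
    /* Take the link DOWN unless it is already in INIT. */
    plan.state = (cur_state == LINK_INIT) ? LINK_NO_CHANGE : LINK_DOWN;
    return plan;
}

/* Separate check: disable the port if it keeps flooding traps. */
bool should_disable_port(bool babbling_policy, unsigned num_received)
{
    return babbling_policy && num_received >= BABBLING_THRESHOLD;
}
```

The two decisions are kept apart on purpose: the OperationalVLs reset runs
on each trap 131, while disabling the port depends only on the babbling
policy and the trap count.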

-- Hal

> 
> Amit
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, July 10, 2007 5:39 PM
> To: general at lists.openfabrics.org
> Cc: Suresh Shelvapille; Amit Krig; Yevgeny Kliteynik; Eitan Zahavi
> Subject: [PATCHv2] OpenSM/osm_trap_rcv.c: Better Trap 131 Handling
> 
> OpenSM/osm_trap_rcv.c: Better trap 131 handling
> 
> When trap 131 occurs, check operational VLs and set port state to DOWN
> if needed.
> 
> I think this is what Amit was saying should be done in his emails 
> yesterday on the list (modified by Suri's comment).
> 
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> 
> diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c
> index f912dcd..3f60f3d 100644
> --- a/opensm/opensm/osm_trap_rcv.c
> +++ b/opensm/opensm/osm_trap_rcv.c
> @@ -550,16 +550,76 @@ __osm_trap_rcv_process_request(
>          }
>          else
>          {
> -          /* When babbling port policy option is enabled and
> -             Threshold for disabling a "babbling" port is exceeded */
> +          uint8_t               payload[IB_SMP_DATA_SIZE];
> +          ib_port_info_t*       p_pi = (ib_port_info_t*)payload;
> +          const ib_port_info_t* p_old_pi;
> +          osm_madw_context_t    context;
> +
> +          p_old_pi = &p_physp->port_info;
> +          memcpy( payload, p_old_pi, sizeof(ib_port_info_t) );
> +
> +          if (p_ntci->g_or_v.generic.trap_num == CL_HTON16(131))
> +          {
> +            uint8_t port_state, cur_opvls, opvls;
> +
> +            port_state = ib_port_info_get_port_state(p_old_pi);
> +            if (port_state != IB_LINK_DOWN)
> +            {
> +              /* First, validate OperationalVLs */
> +              cur_opvls = ib_port_info_get_op_vls(p_old_pi);
> +              opvls = osm_physp_calc_link_op_vls(p_rcv->p_log, p_rcv->p_subn, p_physp);
> +              if (opvls != cur_opvls)
> +              {
> +                osm_log(p_rcv->p_log, OSM_LOG_ERROR,
> +                        "__osm_trap_rcv_process_request: ERR 3809: "
> +                        "Current OP_VLs %d New OP_VLs %d\n",
> +                        cur_opvls, opvls);
> +                ib_port_info_set_op_vls(p_pi, opvls);
> +              }
> +
> +              /* Now, set port to DOWN if not already in INIT */
> +              if (port_state != IB_LINK_INIT)
> +              {
> +                ib_port_info_set_port_state( p_pi, IB_LINK_DOWN );
> +                ib_port_info_set_port_phys_state( IB_PORT_PHYS_STATE_NO_CHANGE, p_pi );
> +              }
> +              else
> +              {
> +                ib_port_info_set_port_state( p_pi, IB_LINK_NO_CHANGE );
> +                ib_port_info_set_port_phys_state( IB_PORT_PHYS_STATE_NO_CHANGE, p_pi );
> +              }
> +
> +              /* Now, issue set of PortInfo */
> +              context.pi_context.node_guid = osm_node_get_node_guid( osm_physp_get_node_ptr( p_physp ) );
> +              context.pi_context.port_guid = osm_physp_get_port_guid( p_physp );
> +              context.pi_context.set_method = TRUE;
> +              context.pi_context.update_master_sm_base_lid = FALSE;
> +              context.pi_context.light_sweep = FALSE;
> +              context.pi_context.active_transition = FALSE;
> +
> +              status = osm_req_set( &p_rcv->p_subn->p_osm->sm.req,
> +                                     osm_physp_get_dr_path_ptr( p_physp ),
> +                                     payload,
> +                                     sizeof(payload),
> +                                     IB_MAD_ATTR_PORT_INFO,
> +                                     cl_hton32(osm_physp_get_port_num( p_physp )),
> +                                     CL_DISP_MSGID_NONE,
> +                                    &context );
> +
> +              if( status != IB_SUCCESS )
> +              {
> +                 osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> +                          "__osm_trap_rcv_process_request: ERR 3812: "
> +                          "Request to set PortInfo failed\n" );
> +              }
> +            }
> +         }
> + 
> +         /* When babbling port policy option is enabled and
> +            Threshold for disabling a "babbling" port is exceeded */
>            if ( p_rcv->p_subn->opt.babbling_port_policy &&
>                 num_received >= 250 )
>            {
> -            uint8_t               payload[IB_SMP_DATA_SIZE];
> -            ib_port_info_t*       p_pi = (ib_port_info_t*)payload;
> -            const ib_port_info_t* p_old_pi;
> -            osm_madw_context_t    context;
> -
>              /* If trap 131, might want to disable peer port if available */
>              /* but peer port has been observed not to respond to SM requests */
>  
> @@ -570,9 +630,6 @@ __osm_trap_rcv_process_request(
>                       p_ntci->data_details.ntc_129_131.port_num
>                       );
>  
> -            p_old_pi = &p_physp->port_info;
> -            memcpy( payload, p_old_pi, sizeof(ib_port_info_t) );
> -
>              /* Set port to disabled/down */
>              ib_port_info_set_port_state( p_pi, IB_LINK_DOWN );
>              ib_port_info_set_port_phys_state( IB_PORT_PHYS_STATE_DISABLED, p_pi );
> 
> 
> 



