[openib-general] opensm errors with ehca

Troy Benjegerdes hozer at hozed.org
Tue Nov 1 20:49:18 PST 2005


> Can you try the following opensm patch and see if this eliminates those
> timeout messages ?
> 
> This patch clears the high part of the attribute modifier when not a
> switch (when obtaining the PKeyTable).
> 
> -- Hal
> 
> Index: osm_port_info_rcv.c
> ===================================================================
> --- osm_port_info_rcv.c	(revision 3906)
> +++ osm_port_info_rcv.c	(working copy)
> @@ -430,6 +430,7 @@ void osm_pkey_get_tables(
>    osm_dr_path_t path;
>    uint8_t  port_num;
>    uint16_t block_num, max_blocks;
> +  uint32_t attr_mod_ho;
>    osm_switch_t* p_switch;
>  
>    OSM_LOG_ENTER( p_log, osm_physp_has_pkey );
> @@ -455,7 +456,7 @@ void osm_pkey_get_tables(
>    else
>    {
>      /* This is a switch, and not a management port. The maximum blocks is defined
> -       on the switch info partition enforcement cap. */
> +       in the switch info partition enforcement cap. */
>      p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid);
>  
>      if (! p_switch)
> @@ -472,10 +473,14 @@ void osm_pkey_get_tables(
>  
>    for (block_num = 0 ; block_num < max_blocks  ; block_num++)
>    {
> +    if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH)
> +      attr_mod_ho = block_num;
> +    else
> +      attr_mod_ho = block_num | (port_num << 16);
>      status = osm_req_get( p_req,
>                            &path,
>                            IB_MAD_ATTR_P_KEY_TABLE,
> -                          cl_hton32(block_num | (port_num << 16) ),
> +                          cl_hton32(attr_mod_ho),
>                            CL_DISP_MSGID_NONE,
>                            &context );
>  
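
For reference, the attribute modifier selection in the patch above can be sketched as a standalone helper. This is only an illustration: `pkey_attr_mod` and the `NODE_TYPE_*` constants are hypothetical names, not opensm identifiers, and the conversion to network byte order (`cl_hton32` in the patch) is left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the node types; not the opensm definitions. */
enum node_type { NODE_TYPE_CA, NODE_TYPE_SWITCH, NODE_TYPE_ROUTER };

/*
 * Build the P_KeyTable attribute modifier in host byte order.
 * Non-switch nodes: only the block number (upper 16 bits stay clear).
 * Switches: block number in the low 16 bits, port number from bit 16 up.
 */
static uint32_t pkey_attr_mod(enum node_type type, uint8_t port_num,
                              uint16_t block_num)
{
    if (type != NODE_TYPE_SWITCH)
        return block_num;
    return (uint32_t)block_num | ((uint32_t)port_num << 16);
}
```

With these assumptions, port 1 / block 0 on a switch yields 0x10000, while the same request to a non-switch node yields 0x0.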

With this patch the errors for the IBM logical HCA seem to be gone, but
I get the same thing on the IBM Logical switch. Is there a way to ignore
this as well?

switchguids=0x2550000038580
Switch  63 "S-0002550000038580"         # IBM Logical Switch 1 port 0
lid 21
[2]     "H-0002550000038500"[1]
[1]     "S-0002c90200402917"[22]


I still get:

Nov 01 22:34:08 660205 [43005960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x16 trans_id=0x13c9) -- dropping.
Nov 01 22:34:08 660213 [43005960] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 2 DR SLID 0x0 DR DLID 0x0
Nov 01 22:34:08 660221 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Nov 01 22:34:08 660243 [43005960] -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x2
                                trans_id................0x13c9
                                attr_id.................0x16 (P_KeyTable)
                                resv....................0x0
                                attr_mod................0x10000
                                m_key...................0x0000000000000000
                                dr_slid.................0xFFFF
                                dr_dlid.................0xFFFF

                                Initial path: [0][1][16]
                                Return path:  [0][0][0]
                                Reserved:     [0][0][0][0][0][0][0]
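
The attr_mod value in the dump can be split back into its parts. A minimal sketch, assuming the layout the patch uses (block number in the low 16 bits, port number starting at bit 16) and a value already in host byte order; the helper names are made up for illustration:

```c
#include <stdint.h>

/* Low 16 bits: P_Key table block number. */
static uint16_t attr_mod_block(uint32_t attr_mod_ho)
{
    return (uint16_t)(attr_mod_ho & 0xFFFF);
}

/* Bits 16..23: port number (the patch shifts a uint8_t port_num by 16). */
static uint8_t attr_mod_port(uint32_t attr_mod_ho)
{
    return (uint8_t)((attr_mod_ho >> 16) & 0xFF);
}
```

For attr_mod 0x10000 this gives port 1, block 0, which is what the switch branch of the patch produces, so the request that times out still carries a port number in the upper half.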





