[openib-general] opensm errors with ehca

Hal Rosenstock halr at voltaire.com
Wed Nov 2 03:44:55 PST 2005


On Tue, 2005-11-01 at 23:49, Troy Benjegerdes wrote:
> > Can you try the following opensm patch and see if this eliminates those
> > timeout messages?
> >
> > This patch clears the high part of the attribute modifier when not a
> > switch (when obtaining the PKeyTable).
> >
> > -- Hal
> >
> > Index: osm_port_info_rcv.c
> > ===================================================================
> > --- osm_port_info_rcv.c     (revision 3906)
> > +++ osm_port_info_rcv.c     (working copy)
> > @@ -430,6 +430,7 @@ void osm_pkey_get_tables(
> >    osm_dr_path_t path;
> >    uint8_t  port_num;
> >    uint16_t block_num, max_blocks;
> > +  uint32_t attr_mod_ho;
> >    osm_switch_t* p_switch;
> > 
> >    OSM_LOG_ENTER( p_log, osm_physp_has_pkey );
> > @@ -455,7 +456,7 @@ void osm_pkey_get_tables(
> >    else
> >    {
> >      /* This is a switch, and not a management port. The maximum blocks is defined
> > -       on the switch info partition enforcement cap. */
> > +       in the switch info partition enforcement cap. */
> >      p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid);
> > 
> >      if (! p_switch)
> > @@ -472,10 +473,14 @@ void osm_pkey_get_tables(
> > 
> >    for (block_num = 0 ; block_num < max_blocks  ; block_num++)
> >    {
> > +    if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH)
> > +      attr_mod_ho = block_num;
> > +    else
> > +      attr_mod_ho = block_num | (port_num << 16);
> >      status = osm_req_get( p_req,
> >                            &path,
> >                            IB_MAD_ATTR_P_KEY_TABLE,
> > -                          cl_hton32(block_num | (port_num << 16) ),
> > +                          cl_hton32(attr_mod_ho),
> >                            CL_DISP_MSGID_NONE,
> >                            &context );
> > 
>
> This seems to fix the error for the IBM logical HCA, but I still get the
> same error on the IBM logical switch. Is there a way to ignore this as well?

The request is correct for the logical switch, and the switch needs to
handle it per the spec: for a switch, the high 16 bits of the PKeyTable
attribute modifier are required to be the port number, whereas for HCAs
and routers they are ignored. This _will_ require a firmware change. I'm
unaware of a workaround for this other than a temporary one applied only
to the IBM OUI. Will they all have OUI 000255?

BTW, getting this error does not appear to cause any ill effects. Does this
agree with what you are seeing?

-- Hal

> switchguids=0x2550000038580
> Switch  63 "S-0002550000038580"         # IBM Logical Switch 1 port 0
> lid 21
> [2]     "H-0002550000038500"[1]
> [1]     "S-0002c90200402917"[22]
>
>
> I still get:
>
> Nov 01 22:34:08 660205 [43005960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x16 trans_id=0x13c9) -- dropping.
> Nov 01 22:34:08 660213 [43005960] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 2 DR SLID 0x0 DR DLID 0x0
> Nov 01 22:34:08 660221 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
> Nov 01 22:34:08 660243 [43005960] -> SMP dump:
>                                 base_ver................0x1
>                                 mgmt_class..............0x81
>                                 class_ver...............0x1
>                                 method..................0x1 (SubnGet)
>                                 D bit...................0x0
>                                 status..................0x0
>                                 hop_ptr.................0x0
>                                 hop_count...............0x2
>                                 trans_id................0x13c9
>                                 attr_id.................0x16 (P_KeyTable)
>                                 resv....................0x0
>                                 attr_mod................0x10000
>                                 m_key...................0x0000000000000000
>                                 dr_slid.................0xFFFF
>                                 dr_dlid.................0xFFFF
>
>                                 Initial path: [0][1][16]
>                                 Return path:  [0][0][0]
>                                 Reserved:     [0][0][0][0][0][0][0]
>
>
>




