[openib-general] Question

Hal Rosenstock halr at voltaire.com
Mon Feb 28 15:32:05 PST 2005


On Mon, 2005-02-28 at 18:10, Ronald G. Minnich wrote: 
> I do get a bunch of these:
>  Bad LinearFDBTop value = 0xC000 on switch 0x2c90108d19820.

That comes from the following in osm_sw_info_rcv.c::osm_si_rcv_process:

    /*
      Hack for bad value in Mellanox switch
    */
    if( cl_ntoh16( p_si->lin_top ) > IB_LID_UCAST_END_HO )
    {
      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
               "osm_si_rcv_process: ERR 3610: "
               "\n\t\t\t\tBad LinearFDBTop value = 0x%X "
               "on switch 0x%" PRIx64 "."
               "\n\t\t\t\tForcing correction to 0x%X.\n",
               cl_ntoh16( p_si->lin_top ),
               cl_ntoh64( osm_node_get_node_guid( p_node ) ),
               0 );

      p_si->lin_top = 0;
    }

where include/iba/ib_types.h has:
#define IB_LID_UCAST_END_HO                                     0xBFFF
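
To make the comparison concrete, here is a minimal standalone sketch of the
same check. It is not the OpenSM code: sanitize_lin_top is a hypothetical
helper name, and ntohs/htons from <arpa/inet.h> stand in for cl_ntoh16 on the
assumption that the field is a 16-bit value in network byte order. The
reported value 0xC000 is one past IB_LID_UCAST_END_HO (it is where the
multicast LID range starts), so it trips the check and is forced to 0.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* ntohs/htons: stand-ins for OpenSM's cl_ntoh16 */

/* Same value as in include/iba/ib_types.h: last unicast LID, host order. */
#define IB_LID_UCAST_END_HO 0xBFFF

/*
 * Hypothetical helper (not part of OpenSM): takes LinearFDBTop in network
 * byte order and returns the value to keep, also in network byte order.
 * Anything above the unicast LID range is replaced with 0, mirroring the
 * "Forcing correction to 0x0" behaviour in the excerpt above.
 */
static uint16_t sanitize_lin_top(uint16_t lin_top_net)
{
    if (ntohs(lin_top_net) > IB_LID_UCAST_END_HO)
        return 0;           /* 0 is the same in either byte order */
    return lin_top_net;     /* in range: leave untouched */
}
```

With this sketch, sanitize_lin_top(htons(0xC000)) yields 0, while 0xBFFF and
anything below it pass through unchanged.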

So this looks like a workaround for a bug. I'm not sure what the other
symptoms are, but I'm really curious now. Can someone comment further on
this?

At a minimum, the SMA is reporting an invalid value for
SwitchInfo::LinearFDBTop. I wonder whether it is also incapable of
forwarding DR MADs. That would explain this.

OpenSM needs a better way of dealing with failures. This need has already
been documented, and it sounds like it should be a high-priority item. I
will start to work on this.

Thanks.

-- Hal

> 
> ron



