[ofa-general] Standby OpenSM reporting "Bad LinearFDBTop" value

Lan Tran transter at gmail.com
Wed Mar 12 23:31:11 PDT 2008


Hello,

I was wondering whether anyone has seen a similar issue or has any
ideas on why it may have shown up.  Thanks in advance for any
potential pointers!

I have a Master SM and a Standby SM running on two separate nodes, both
connected to an 8-port Flextronic IB switch. Initially, both SMs came
up fine (and negotiated mastership OK). But then the Standby SM node
was rebooted; when the Standby SM came up again, it was outputting the
following error in the OpenSM logs:

Mar 12 17:05:57 837338 [AAAB71A0] -> OpenSM Rev:openib-3.0.13
Mar 12 17:05:57 837371 [AAAB71A0] -> OpenSM Rev:openib-3.0.13
Mar 12 17:05:57 838467 [AAAB71A0] -> osm_vendor_bind: Binding to port 0x50450134010002
Mar 12 17:05:57 841495 [AAAB71A0] -> osm_vendor_bind: Binding to port 0x50450134010002
Mar 12 17:05:57 843032 [43204940] -> osm_si_rcv_process: ERR 3610:
                                Bad LinearFDBTop value = 0xC000 on switch 0xb8cffff004879
                                Forcing correction to 0x0
Mar 12 17:05:58 049623 [41401940] -> Entering STANDBY state

Restarting the Standby SM didn't help, and neither did resetting the
IB switch. In this state, no LIDs were being assigned. It seems
functional SM behaviour (and a valid linear forwarding table) was only
recovered by killing the Master SM, which caused SM migration.
Unfortunately, it's not easily reproducible.

The SM code seems to indicate that this is a bug workaround, but I'm
not sure whether this means it's a HW switch issue, or how it would
have occurred in the first place:

    /*
      Hack for bad value in Mellanox switch
    */
    if( cl_ntoh16( p_si->lin_top ) > IB_LID_UCAST_END_HO )
    {
      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
               "osm_si_rcv_process: ERR 3610: "
               "\n\t\t\t\tBad LinearFDBTop value = 0x%X "
               "on switch 0x%" PRIx64
               "\n\t\t\t\tForcing correction to 0x%X\n",
               cl_ntoh16( p_si->lin_top ),
               cl_ntoh64( osm_node_get_node_guid( p_node ) ),
               0 );

      p_si->lin_top = 0;
    }
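
For reference, here is a minimal standalone sketch of what that check
appears to do. It is not the OpenSM code itself: sanitize_lin_top() is
a hypothetical helper, ntohs()/htons() stand in for cl_ntoh16() and
cl_hton16(), and the constants assume IB_LID_UCAST_END_HO is 0xBFFF
with multicast LIDs starting at 0xC000. Under those assumptions, the
reported value 0xC000 is exactly one past the last unicast LID, which
is why the check fires.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* ntohs/htons: stand-ins for cl_ntoh16/cl_hton16 */

/* Assumed values: unicast LIDs end at 0xBFFF, multicast LIDs start at 0xC000. */
#define LID_UCAST_END_HO   0xBFFF
#define LID_MCAST_START_HO 0xC000

/*
 * Hypothetical helper mirroring the quoted workaround.
 * Takes LinearFDBTop in network byte order (as it arrives in SwitchInfo)
 * and returns it unchanged if it is within the unicast LID range,
 * or 0 if it is above the range.
 */
static uint16_t sanitize_lin_top(uint16_t lin_top_net)
{
    if (ntohs(lin_top_net) > LID_UCAST_END_HO)
        return 0;           /* same "forcing correction to 0x0" as in the log */
    return lin_top_net;     /* valid value is passed through untouched */
}

/* Convenience wrapper (also hypothetical): 1 if the value would trigger ERR 3610. */
static int lin_top_is_bad(uint16_t lin_top_net)
{
    return ntohs(lin_top_net) > LID_UCAST_END_HO;
}
```

Calling sanitize_lin_top(htons(0xC000)) returns 0, matching the
"Forcing correction to 0x0" line in the log, while any value up to
0xBFFF is returned as-is.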


Thanks again!
Lan