[openib-general] [PATCH] osm: fix segfault due to unprotected access to InformInfo DB

Eitan Zahavi eitan at mellanox.co.il
Mon Jun 19 12:12:07 PDT 2006


Hi Hal

I have added InformInfo requests to the osmStress simulator flow.
Running it overnight exposed a bug: OpenSM segfaulted in
osm_report_notice. Debugging showed that the following two flows call
osm_report_notice without holding the lock, so under stress the
InformInfo DB was altered while osm_report_notice was accessing it.

I have verified the other flows calling osm_report_notice are under a
lock. 

The fixed code has been running for a while with no crash so far.

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_state_mgr.c
===================================================================
--- opensm/osm_state_mgr.c	(revision 8113)
+++ opensm/osm_state_mgr.c	(working copy)
@@ -1709,6 +1709,7 @@ __osm_state_mgr_report_new_ports(
 
    OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report_new_ports );
 
+   CL_PLOCK_ACQUIRE( p_mgr->p_lock );
    p_port =
       ( osm_port_t
         * ) ( cl_list_remove_head( &p_mgr->p_subn->new_ports_list ) );
@@ -1759,6 +1760,7 @@ __osm_state_mgr_report_new_ports(
          ( osm_port_t
            * ) ( cl_list_remove_head( &p_mgr->p_subn->new_ports_list ) );
    }
+   CL_PLOCK_RELEASE( p_mgr->p_lock );
 
    OSM_LOG_EXIT( p_mgr->p_log );
 }
Index: opensm/osm_trap_rcv.c
===================================================================
--- opensm/osm_trap_rcv.c	(revision 8113)
+++ opensm/osm_trap_rcv.c	(working copy)
@@ -652,7 +652,10 @@ __osm_trap_rcv_process_request(
     p_ntci->issuer_gid.unicast.interface_id = p_port->guid;
   }
 
+  /* we need a lock here as the InformInfo DB must be stable */
+  CL_PLOCK_ACQUIRE( p_rcv->p_lock );
   status = osm_report_notice(p_rcv->p_log, p_rcv->p_subn, p_ntci);
+  CL_PLOCK_RELEASE( p_rcv->p_lock );
   if( status != IB_SUCCESS )
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,




