[openib-general] [PATCH] [DAPL] [RFC] - remove duplicate disconnect event.

Steve Wise swise at opengridcomputing.com
Wed Apr 5 15:39:02 PDT 2006


James,

Running a 4 thread, 8 ep/thread dapltest (the last test in regress.sh),
I was intermittently seeing a seg fault in dapltest.  This is running
over the chelsio rnic using the iwarp branch.  After debugging I found
out that dapltest was freeing an already freed endpoint due to it
receiving duplicate disconnect events during test shutdown.  The code
assumes it will get exactly one disconnect event for every endpoint
(rightly so I guess).  

I tracked this down to the code in dapl_ep_disconnect() that generates
its own disconnect event in certain circumstances.  I removed this and
ran regress.sh over both mthca and cxgb3 with no problems.  

So my question to the dapl experts is: why is this code here?  For our
iwarp devices, it ends up sometimes generating duplicate disconnect
events.  I don't see why its needed.  If anyone can explain the logic,
that would be great.

With this patch and the previous patch the fixes dat_ep_free() to always
free the endpoint, I'm able to run dapltest 1-6 over the chelsio rnic.
As part of pulling in the iwarp support, I'd like the group to consider
pulling in these patches that fix issues with udapl (once we agree on
the final patches).  For now, I'll maintain these patches in the iwarp
branch...

Thanks,

Steve.


Signed-off-by: Steve Wise <swise at opengridcomputing.com>

Index: dapl_ep_disconnect.c
===================================================================
--- dapl_ep_disconnect.c	(revision 6253)
+++ dapl_ep_disconnect.c	(working copy)
@@ -70,8 +70,6 @@
 {
     DAPL_EP		*ep_ptr;
     DAPL_EVD		*evd_ptr;
-    DAPL_CR		*cr_ptr;
-    ib_cm_events_t	ib_cm_event;
     DAT_RETURN		dat_status;
 
     dapl_dbg_log (DAPL_DBG_TYPE_API | DAPL_DBG_TYPE_CM,
@@ -175,51 +173,6 @@
     dapl_os_unlock ( &ep_ptr->header.lock );
     dat_status =  dapls_ib_disconnect ( ep_ptr, disconnect_flags );
 
-    /*
-     * Reacquire the lock and make sure we didn't get a callback
-     * that cleaned up.
-     */
-    dapl_os_lock ( &ep_ptr->header.lock );
-    if (disconnect_flags == DAT_CLOSE_ABRUPT_FLAG &&
-	ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECT_PENDING )
-    {
-	/*
-	 * If this is an ABRUPT close, the provider will not generate
-	 * a disconnect message so we do it manually here. Just invoke
-	 * the CM callback as it will clean up the appropriate
-	 * data structures, reset the state, and generate the event
-	 * on the way out. Obtain the provider dependent cm_event to 
-	 * pass into the callback for a disconnect.
-	 */
-	ib_cm_event = dapls_ib_get_cm_event (DAT_CONNECTION_EVENT_DISCONNECTED);
-
-	cr_ptr = ep_ptr->cr_ptr;
-	dapl_os_unlock ( &ep_ptr->header.lock );
-
-	if (cr_ptr != NULL)
-	{
-	    dapl_dbg_log (DAPL_DBG_TYPE_API | DAPL_DBG_TYPE_CM,
-		"    dapl_ep_disconnect force callback on EP %p CM handle %x\n",
-		 ep_ptr, cr_ptr->ib_cm_handle);
-
-	    dapls_cr_callback (cr_ptr->ib_cm_handle,
-			       ib_cm_event,
-			       NULL,
-			       cr_ptr->sp_ptr);
-	}
-	else
-	{
-	    dapl_evd_connection_callback (ep_ptr->cm_handle,
-					  ib_cm_event,
-					  NULL,
-					  (void *) ep_ptr);
-	}
-    }
-    else
-    {
-	dapl_os_unlock ( &ep_ptr->header.lock );
-    }
-
 bail:
     dapl_dbg_log (DAPL_DBG_TYPE_RTN | DAPL_DBG_TYPE_CM,
 		  "dapl_ep_disconnect () returns 0x%x\n",




More information about the general mailing list