[Openib-windows] RE: Assert on ib_cm_rtu in the file Al_cm_qp.c

Fab Tillier ftillier at silverstorm.com
Thu Dec 8 09:21:38 PST 2005


> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Thursday, December 08, 2005 8:25 AM
> 
> Hi Fab,
> 
> While running some stress tests on my computer I hit
> the following assert:
> CL_ASSERT( cid == h_cm_rep.cid );
> 
> (near line 1514).
> 
> The problem was that cid was 0xffffffff and not the expected value (h_cm_rep.cid).
> 
> I ran some tests with the debugger and came to the conclusion that there is
> a timeout mechanism that changes this value to -1. I also believe that this
> happens in the function __proc_conn_timeout().
> 
> As a result, I believe that the correct assert should be
> 
> CL_ASSERT( cid == h_cm_rep.cid || cid==AL_INVALID_CID);
> 
> What do you think?

Actually, I think the if statement that checks the status returned by al_cep_rtu
should treat both IB_SUCCESS and IB_INVALID_STATE as non-errors.  I agree that we
should also handle AL_INVALID_CID, but by moving the cleanup code into the body
of a conditional rather than by loosening the assert.

How does this look:

Index: core/al/al_cm_qp.c
===================================================================
--- core/al/al_cm_qp.c	(revision 200)
+++ core/al/al_cm_qp.c	(working copy)
@@ -1496,7 +1496,7 @@
 
 	status = al_cep_rtu( h_cm_rep.h_al, h_cm_rep.cid,
 		p_cm_rtu->p_rtu_pdata, p_cm_rtu->rtu_length );
-	if( status != IB_SUCCESS )
+	if( status != IB_SUCCESS && status != IB_INVALID_STATE )
 	{
 err:
 		/* Reject and abort the connection. */
@@ -1508,13 +1508,16 @@
 		cid = cl_atomic_xchg(
 			&((al_conn_qp_t*)h_cm_rep.h_qp)->cid, AL_INVALID_CID );
 
-		CL_ASSERT( cid == h_cm_rep.cid );
+		if( cid != AL_INVALID_CID )
+		{
+			CL_ASSERT( cid == h_cm_rep.cid );
 
-		ref_al_obj( &h_cm_rep.h_qp->obj );
-		if( al_destroy_cep(
-			h_cm_rep.h_al, h_cm_rep.cid, deref_al_obj ) != IB_SUCCESS )
-		{
-			deref_al_obj( &h_cm_rep.h_qp->obj );
+			ref_al_obj( &h_cm_rep.h_qp->obj );
+			if( al_destroy_cep(
+				h_cm_rep.h_al, h_cm_rep.cid, deref_al_obj ) != IB_SUCCESS )
+			{
+				deref_al_obj( &h_cm_rep.h_qp->obj );
+			}
 		}
 
 		AL_TRACE_EXIT( AL_DBG_ERROR,

If this looks good I'll commit it.
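
For reference, the exchange-then-check pattern in the second hunk can be
sketched in isolation. This is only an illustration under assumptions, not the
AL code: conn_t, destroy_cep, release_cid and INVALID_CID are hypothetical
stand-ins (for the QP's CID field, al_destroy_cep and AL_INVALID_CID), and C11
atomic_exchange stands in for cl_atomic_xchg. The idea is that whichever path
swaps out a valid CID owns the cleanup, and a path that finds the CID already
invalid (for example because a timeout got there first) skips it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for AL_INVALID_CID (the thread reports 0xffffffff). */
#define INVALID_CID UINT32_MAX

/* Hypothetical stand-in for the connection state held by the QP. */
typedef struct {
    _Atomic uint32_t cid;
    int destroy_calls;  /* counts cleanups, for illustration only */
} conn_t;

/* Hypothetical stand-in for al_destroy_cep(); it only records the call. */
static void destroy_cep(conn_t *conn, uint32_t cid)
{
    (void)cid;
    conn->destroy_calls++;
}

/*
 * Atomically replace the stored CID with INVALID_CID. Only the caller that
 * gets a valid CID back performs the cleanup.
 * Returns 1 if this caller cleaned up, 0 if the CID was already invalid.
 */
static int release_cid(conn_t *conn, uint32_t expected_cid)
{
    uint32_t cid = atomic_exchange(&conn->cid, INVALID_CID);

    if (cid == INVALID_CID)
        return 0;  /* another path already claimed the CID */

    /* The strict check stays valid because the invalid case was handled. */
    assert(cid == expected_cid);
    destroy_cep(conn, cid);
    return 1;
}
```

Calling release_cid twice on the same connection runs the cleanup once; the
second call sees INVALID_CID and returns without touching the CEP.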

- Fab



