[ofw] CM ref counting issues...

Hefty, Sean sean.hefty at intel.com
Thu Dec 17 07:55:22 PST 2009


>> For example, in my testing, a REP mad was completed as canceled;
>
>A REP?  If a REP times out, why aren't you ending up sending a REJ and aborting
>the connection?

The RTU for a connection can be lost and the connection can still form: the app sees data arrive on the QP and treats the connection as established.  If the app then issues a DREQ, the state transitions to DREQ_SENT, and that is the state the connection is in when the send callback is invoked for the REP.  In this case the connections are much shorter lived than the CM message timeouts.

I would need to double-check, but I thought the REP completed as canceled, not timed out.

>Not sure I quite follow here... The DREQ_SENT state should have invoked the
>callback.

This is a MAD completion callback, not a CEP state callback.  The MAD was a REP, but the state was DREQ_SENT.  This is the case I observed, but I'm pretty sure that other, similar problems exist.
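
To make the mismatch concrete, here is a minimal sketch of how a send completion callback might recognize that the MAD being completed is no longer the one the CEP is tracking.  The types, the p_send_mad field, and completion_is_stale() are hypothetical simplifications for illustration only, not the actual CM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; NOT the real CM definitions. */
typedef enum { CEP_STATE_REP_SENT, CEP_STATE_ESTABLISHED, CEP_STATE_DREQ_SENT } cep_state_t;
typedef enum { MAD_TYPE_REP, MAD_TYPE_DREQ } mad_type_t;

typedef struct { mad_type_t type; } mad_t;

typedef struct {
    cep_state_t state;
    mad_t *p_send_mad;   /* assumed field: the MAD the CEP is currently waiting on */
} cep_t;

/*
 * A completion is treated as stale when the completed MAD is not the one
 * the CEP currently tracks.  The CEP state alone cannot answer this: a REP
 * can complete while the CEP is already in DREQ_SENT.
 */
bool completion_is_stale(const cep_t *p_cep, const mad_t *p_mad)
{
    assert(p_cep != NULL && p_mad != NULL);
    return p_cep->p_send_mad != p_mad;
}
```

With a CEP in DREQ_SENT that is tracking its DREQ, a late completion for the earlier REP would be reported as stale, while the completion for the DREQ itself would not.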

>> +	else
>> +	{
>> +		KeReleaseInStackQueuedSpinLockFromDpcLevel( &hdl );
>> +		ib_put_mad( p_mad );
>> +	}
>
>Are you going to skip the switch statement on the MAD status then?  If so,
>don't forget to release the reference on the CEP held by the MAD.  Seems like
>you're missing a 'goto done;' here.

Yes - this needs to jump to the end, so that we don't try to release the lock twice and we still release the reference on the CEP.
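
As a rough sketch of the control flow being discussed: the stale path releases the lock and returns the MAD, then jumps past the status switch to a common exit that drops the CEP reference held by the MAD.  Everything here (lock_t, put_mad(), cep_deref(), send_cb(), the fields) is a hypothetical user-mode stand-in for the kernel spin lock, ib_put_mad, and the CEP reference count.  It is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code uses kernel spin locks and ib_put_mad. */
typedef struct { int held; } lock_t;
typedef struct { int status; int freed; } mad_t;
typedef struct { int ref_cnt; mad_t *p_send_mad; int status_handled; } cep_t;

void lock_acquire(lock_t *l) { assert(!l->held); l->held = 1; }
void lock_release(lock_t *l) { assert(l->held); l->held = 0; }   /* asserts on a double release */
void put_mad(mad_t *m)       { assert(!m->freed); m->freed = 1; }
void cep_deref(cep_t *c)     { assert(c->ref_cnt > 0); c->ref_cnt--; }

void send_cb(lock_t *p_lock, cep_t *p_cep, mad_t *p_mad)
{
    lock_acquire(p_lock);

    if (p_cep->p_send_mad != p_mad) {
        /* Stale completion: clean up here and skip the status handling. */
        lock_release(p_lock);
        put_mad(p_mad);
        goto done;   /* without this, the lock would be released a second time below */
    }

    p_cep->p_send_mad = NULL;
    switch (p_mad->status) {
    default:
        /* Placeholder for the per-status handling of the tracked MAD. */
        p_cep->status_handled = 1;
        break;
    }
    lock_release(p_lock);
    put_mad(p_mad);

done:
    /* Both paths drop the reference on the CEP that the MAD was holding. */
    cep_deref(p_cep);
}
```

In this sketch, removing the goto would trip the assertion in lock_release(), which is the user-mode analogue of releasing the spin lock twice.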


