[ofa-general] Re: [PATCH] ib cm conn sent destroy

Thu Jul 3 12:25:56 PDT 2008

swise at opengridcomputing.com wrote on Thu, 03 Jul 2008 13:39 -0500:
> Pete Wyckoff wrote:
>> Admittedly, I'm in this situation due to working on broken device
>> code, but I think the patch below addresses a valid problem.
>> Against 2.6.26-rc8.
>>
>> Consider a very slow link and an impatient client.  It does
>> rdma_connect(), then times out at the application layer while the
>> IW_CM_STATE is still CONN_SENT.  It calls rdma_reject() and finally
>> rdma_destroy_id().  That leads to the BUG() in destroy_cm_id().
>>
>> Note the call to rdma_reject() is necessary to avoid hanging at the
>> top of destroy_cm_id(), waiting for the IWCM_F_CONNECT_WAIT bit to
>> be cleared.
>>
>>
> This isn't the intended usage model.  rdma_accept() and rdma_reject()
> are only valid on connect request cm_ids passed up in the
> CONNECT_REQUEST event on the server side...
>
> (I'm not saying something isn't broken however :)

That makes sense.  I couldn't find any way to get the
IWCM_F_CONNECT_WAIT bit cleared (without massive layering violations).
Is there anything like rdma_give_up_on_the_connect() that works
on the client side?  Am I expected just to wait for the transport
layer to time out the connect?  I could imagine a Ctrl-C situation
where immediate client-side cancellation is desired.
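
To make the sequence concrete, here is a toy userspace model of the
behavior described above.  It is not the kernel code: every toy_* name
is made up for illustration, connect_wait merely stands in for
IWCM_F_CONNECT_WAIT, and toy_connect_reply() assumes that a reply or
transport timeout both clears the bit and leaves CONN_SENT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names echo the thread, the logic is an assumption. */
enum toy_state { TOY_IDLE, TOY_CONN_SENT, TOY_ESTABLISHED };

/* What a destroy attempt would do in this model. */
enum toy_result { TOY_DESTROYED, TOY_WOULD_HANG, TOY_WOULD_BUG };

struct toy_cm_id {
	enum toy_state state;
	bool connect_wait;	/* stands in for IWCM_F_CONNECT_WAIT */
};

/* Client issues the connect: state becomes CONN_SENT, wait bit is set. */
void toy_connect(struct toy_cm_id *id)
{
	assert(id->state == TOY_IDLE);
	id->state = TOY_CONN_SENT;
	id->connect_wait = true;
}

/* The misuse from my first mail: on a client id, reject only clears the bit. */
void toy_reject(struct toy_cm_id *id)
{
	id->connect_wait = false;
}

/* Assumed: the transport reports success, failure, or timeout. */
void toy_connect_reply(struct toy_cm_id *id, bool ok)
{
	id->state = ok ? TOY_ESTABLISHED : TOY_IDLE;
	id->connect_wait = false;
}

/* Mirrors the two hazards in destroy_cm_id() as described in the thread. */
enum toy_result toy_destroy(const struct toy_cm_id *id)
{
	if (id->connect_wait)
		return TOY_WOULD_HANG;	/* the real code sleeps at the top */
	if (id->state == TOY_CONN_SENT)
		return TOY_WOULD_BUG;	/* the real code hits BUG() */
	return TOY_DESTROYED;
}
```

In this model, destroying right after toy_connect() would hang,
destroying after toy_reject() would hit the BUG(), and only waiting
for toy_connect_reply() gives a clean destroy.  That last path is the
one I am asking how to short-circuit from the client side.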

		-- Pete