[ofa-general] Re: [PATCH] ib cm conn sent destroy
pw at osc.edu
Thu Jul 3 12:25:56 PDT 2008
swise at opengridcomputing.com wrote on Thu, 03 Jul 2008 13:39 -0500:
> Pete Wyckoff wrote:
>> Admittedly, I'm in this situation due to working on broken device
>> code, but I think the patch below addresses a valid problem.
>> Against 2.6.26-rc8.
>> Consider a very slow link and an impatient client. It does
>> rdma_connect(), then times out at the application layer while the
>> IW_CM_STATE is still CONN_SENT. It calls rdma_reject() and finally
>> rdma_destroy_id(). That leads to the BUG() in destroy_cm_id().
>> Note the call to rdma_reject() is necessary to avoid hanging at the
>> top of destroy_cm_id(), waiting for the IWCM_F_CONNECT_WAIT bit to
>> be cleared.
> This isn't the intended usage model. rdma_accept() and rdma_reject()
> are only valid on connect request cm_ids passed up in the
> CONNECT_REQUEST event on the server side...
> (I'm not saying something isn't broken however :)
That makes sense. I couldn't find any way to get the
IWCM_F_CONNECT_WAIT bit cleared (without massive layering violations).
Is there anything like rdma_give_up_on_the_connect() that works
on the client side? Am I expected just to wait for the transport
layer to time out the connect? I could imagine a Ctrl-C situation
where immediate client-side cancellation is desired.
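
To make the sequence concrete, here is a toy userspace model of the
client-side states as described in this thread. It is only a sketch and
not the kernel code: the enums, the struct and every model_* function are
hypothetical names invented for illustration. It assumes that the connect
call sets the wait bit, that rdma_reject() on the client does nothing
except clear that bit, and that destroy either waits on the bit or hits
the BUG() when the state is still CONN_SENT.

```c
/* Toy model of the client-side sequence discussed in this thread.
 * NOT kernel code: every model_* name below is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_IDLE, MODEL_CONN_SENT };

/* What a destroy attempt would do, reported instead of performed. */
enum model_result { MODEL_OK, MODEL_WOULD_HANG, MODEL_WOULD_BUG };

struct model_cm_id {
    enum model_state state;
    bool connect_wait;          /* stands in for IWCM_F_CONNECT_WAIT */
};

static void model_init(struct model_cm_id *id)
{
    assert(id != NULL);
    id->state = MODEL_IDLE;
    id->connect_wait = false;
}

/* Models rdma_connect(): request sent, no reply seen yet. */
static void model_connect(struct model_cm_id *id)
{
    assert(id != NULL);
    id->state = MODEL_CONN_SENT;
    id->connect_wait = true;
}

/* Models rdma_reject() called on the client (assumption: it only
 * clears the wait bit and leaves the state untouched). */
static void model_reject(struct model_cm_id *id)
{
    assert(id != NULL);
    id->connect_wait = false;
}

/* Models rdma_destroy_id() reaching destroy_cm_id(). */
static enum model_result model_destroy(const struct model_cm_id *id)
{
    assert(id != NULL);
    if (id->connect_wait)
        return MODEL_WOULD_HANG;    /* waits for the bit to clear */
    if (id->state == MODEL_CONN_SENT)
        return MODEL_WOULD_BUG;     /* the BUG() case from the report */
    return MODEL_OK;
}
```

In this model, calling model_connect(), then model_reject(), then
model_destroy() on one id gives MODEL_WOULD_BUG; leaving out
model_reject() gives MODEL_WOULD_HANG instead. Neither path offers the
client a clean way out, which is the gap I am asking about.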