[openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own

James Lentini jlentini at netapp.com
Tue May 31 10:27:13 PDT 2005


On Fri, 27 May 2005, Tom Duffy wrote:

> On Thu, 2005-05-26 at 22:25 -0700, Sean Hefty wrote:
>>> So, here is the strategy I am taking.  Please let me know if it is
>>> wrong.
>>>
>>> When dapl_ep_connect() is called, I save off the timeout value into the
>>> dapl_ep struct.  Then, when we get ready to call ib_send_cm_req(), I
>>> stuff the timeout value (after munging it into IB's strange format) into
>>> the conn params remote_cm_response_timeout.
>>
>> From a CM perspective, this sounds fine.  Note that the CM timeout will not
>> occur until the number of retries has been met.  So I don't know if the
>> timeout passed to dapl_ep_connect() should convert directly into the
>> remote_cm_response_timeout, or needs to be divided by the number of retries.
>
> So, are you saying that if you have a timeout of 4 seconds (you pass in
> 20) and you have retries set to 2, that it will fail after 8 seconds?
>
> James, what does the timeout value passed into dapl_ep_connect mean, the
> total timeout time?  Or how much for each retry?

It is the total timeout value.
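
To make the arithmetic concrete, here is a minimal sketch of one way to
turn a total timeout into the per-try value the CM expects. It assumes
the timeout passed to dapl_ep_connect() is in microseconds and that the
CM makes a fixed number of attempts. Both helper names are hypothetical;
they are not existing kDAPL or CM functions. The IB encoding is an
exponent n meaning 4.096 usec * 2^n, which is why 20 comes out to
roughly 4 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* IB CM timeouts are an exponent n that encodes 4.096 usec * 2^n. */
#define IB_TIMEOUT_BASE_NS 4096ULL
#define IB_TIMEOUT_MAX_EXP 31	/* the field is 5 bits wide */

/*
 * Hypothetical helper: return the smallest exponent whose encoded
 * duration is at least 'usecs'. Saturates at IB_TIMEOUT_MAX_EXP.
 */
static uint8_t usecs_to_ib_timeout(uint64_t usecs)
{
	uint64_t ns = usecs * 1000ULL;
	uint8_t exp = 0;

	while (exp < IB_TIMEOUT_MAX_EXP && (IB_TIMEOUT_BASE_NS << exp) < ns)
		exp++;
	return exp;
}

/*
 * Hypothetical helper: spread a total timeout evenly over 'tries'
 * attempts and encode the per-try share. How many attempts the CM
 * really makes for a given retry count is an assumption left to
 * the caller.
 */
static uint8_t total_timeout_to_ib(uint64_t total_usecs, unsigned int tries)
{
	assert(tries > 0);
	return usecs_to_ib_timeout(total_usecs / tries);
}
```

With these, total_timeout_to_ib(8000000, 2) gives 20, i.e. two tries of
about 4.3 seconds each. Because the encoding only has power-of-two
steps and this sketch rounds up, the effective total can exceed what
the caller asked for; rounding down instead would be an equally
defensible choice.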

> Also, did you notice that dapl_ib_connect always sets the timeout to 20
> (4 seconds) no matter what?  Should this be the case?

The timeout should not be a constant, as it is now. The caller's
timeout was being emulated, unnecessarily, by the extra "timeout" thread.

>>> If the connection fails to complete within the timeout,
>>> dapl_cm_active_cb_handler() is called with IB_CM_REQ_ERROR which in turn
>>> calls dapl_evd_connection_callback() which does the same thing that
>>> dapl_ep_timeout() used to do -- tear down the connection.
>>
>> I haven't looked at your changes, but note that calling ib_destroy_cm_id
>> from within the CM callback thread will hang.  The callback holds a
>> reference on the cm_id.  The good news is that there should be code in kDAPL
>> to catch this.
>
> I will take a look and see if this could happen.

Tom, I don't believe that you've changed Hal and Sean's implementation 
of this.


