[ofw] [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout

Hefty, Sean sean.hefty at intel.com
Fri Dec 3 16:16:06 PST 2010


> >> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
> >>
> >>  	/* ABRUPT close, wait for callback and DISCONNECTED state */
> >>  	if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
> >> +		DAPL_EVD *evd = NULL;
> >> +		DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
> >> +
> >>  		dapl_os_lock(&ep_ptr->header.lock);
> >> -		while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> >> +		/* limit DREP waiting, other side could be down */
> >> +		while (--drep_time && ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> >>  			dapl_os_unlock(&ep_ptr->header.lock);
> >>  			dapl_os_sleep_usec(10000);
> >
> >gak - can't you wait on an event using some timeout interval?
> >
> 
> if rdma_cm would give me separate timeout interval choices
> for connect requests and disconnect requests, then by all
> means I would use it for this abrupt disconnect timeout/retry
> interval.

That's a separate issue.  I'm suggesting to replace

while (not in the right state)
	retries--;
	sleep(timeout);

with

wait_for_event(disconnected_event, total timeout);


