[openib-general] Re: [PATCH] uDAPL with uCM and uAT support
James Lentini
jlentini at netapp.com
Thu Jul 28 12:42:44 PDT 2005
On Thu, 28 Jul 2005, Arlin Davis wrote:
> James Lentini wrote:
>
>>
>> A couple of questions:
>>
>> On Sun, 24 Jul 2005, Arlin Davis wrote:
>>
>>> /*
>>> * We now own the EVD, even though we don't have the lock anymore,
>>> * because we're in the WAITED state.
>>> @@ -182,37 +171,54 @@
>>> * return right away if the ib_cq_handle associate with these evd
>>> * equal to IB_INVALID_HANDLE
>>> */
>>> - dapls_evd_copy_cq(evd_ptr);
>>> -
>>> - if (dapls_rbuf_count(&evd_ptr->pending_event_queue) >= threshold)
>>> - {
>>> - break;
>>> - }
>>> -
>>> - /*
>>> - * Do not enable the completion notification if this evd is not
>>> - * a DTO_EVD or RMR_BIND_EVD
>>> + /* Logic to prevent missing completion between copy_cq (poll)
>>> + * and completion_notify (re-arm)
>>> */
>>> - if ( (!notify_requested) &&
>>> - ((evd_ptr->evd_flags & DAT_EVD_DTO_FLAG) ||
>>> - (evd_ptr->evd_flags & DAT_EVD_RMR_BIND_FLAG)) )
>>> + notify_needed = DAT_TRUE;
>>> + new_events = 0;
>>> + while (DAT_TRUE)
>>> {
>>> - dat_status = dapls_ib_completion_notify (
>>> - evd_ptr->header.owner_ia->hca_ptr->ib_hca_handle,
>>> - evd_ptr,
>>> - (evd_ptr->completion_type == DAPL_EVD_STATE_SOLICITED_WAIT) ?
>>> - IB_NOTIFY_ON_SOLIC_COMP : IB_NOTIFY_ON_NEXT_COMP );
>>> -
>>> - DAPL_CNTR(DCNT_EVD_WAIT_CMP_NTFY);
>>> - /* FIXME report error */
>>> - dapl_os_assert(dat_status == DAT_SUCCESS);
>>> + dapls_evd_copy_cq(evd_ptr); /* poll for new completions */
>>> + total_events = dapls_rbuf_count (&evd_ptr->pending_event_queue);
>>> + new_events = total_events - new_events;
>>> + if (total_events >= threshold ||
>>> + (!new_events && notify_needed == DAT_FALSE))
>>> + {
>>> + break;
>>> + }
>>> +
>>> + /*
>>> + * Do not enable the completion notification if this evd is not
>>> + * a DTO_EVD or RMR_BIND_EVD
>>> + */
>>> + if ( (evd_ptr->evd_flags & DAT_EVD_DTO_FLAG) ||
>>> + (evd_ptr->evd_flags & DAT_EVD_RMR_BIND_FLAG) )
>>> + {
>>> + dat_status = dapls_ib_completion_notify (
>>> + evd_ptr->header.owner_ia->hca_ptr->ib_hca_handle,
>>> + evd_ptr,
>>> + (evd_ptr->completion_type ==
>>> DAPL_EVD_STATE_SOLICITED_WAIT)
>>> ?
>>> + IB_NOTIFY_ON_SOLIC_COMP : IB_NOTIFY_ON_NEXT_COMP
>>> );
>>> +
>>> + DAPL_CNTR(DCNT_EVD_WAIT_CMP_NTFY);
>>> + notify_needed = DAT_FALSE;
>>> + new_events = total_events;
>>> +
>>> + /* FIXME report error */
>>> + dapl_os_assert(dat_status == DAT_SUCCESS);
>>> + }
>>> + else
>>> + {
>>> + break;
>>> + }
>>>
>>> - notify_requested = DAT_TRUE;
>>> + } /* while completions < threshold, and rearm needed */
>>>
>>> - /* Try again. */
>>> - continue;
>>> + if (total_events >= threshold)
>>> + {
>>> + break;
>>> }
>>> -
>>> +
>>
>>
>> Can you explain the chages above?
>>
>> This is my reading of how things used to work: the thread that called
>> dapl_evd_wait polled the CQ until the number of events reaped was >= the
>> threshold paramter. If a poll did not reap enough events, the thread turned
>> notification on (in certain situations). I don't see why it need to do this
>> though, since it went directly back to polling and the notification upcall
>> essentially did nothing if there EVD was in the WAITED state.
>>
>> Obviously you saw a bug in the old implementation. Is my description of how
>> the code used to work wrong?
>
> You are correct, but there is a small window where you can miss the CQ event
> if you re-arm and wait but the CQ completion/event occured between the last
> poll and the rearm/wait. This change adds another cq_poll after every re-arm
> to cover this senario and make sure notification is armed and new events are
> expected before going into the wait.
Good catch.
>>
>>> Index: dapl/udapl/Makefile
>>> ===================================================================
>>> --- dapl/udapl/Makefile (revision 2899)
>>> +++ dapl/udapl/Makefile (working copy)
>>> @@ -122,7 +122,8 @@
>>> #
>>> ifeq ($(VERBS),openib)
>>> PROVIDER = $(TOPDIR)/../openib
>>> -CFLAGS += -DOPENIB -DCQ_WAIT_OBJECT
>>> +CFLAGS += -DOPENIB
>>> +#CFLAGS += -DCQ_WAIT_OBJECT
>>
>>
>> Do the OpenIB uverbs support a CQ_WAIT_OBJECT? If not, we should remove the
>> line above. When I see it commented out, it makes me think that the
>> functionality is supported but not implemented in uDAPL.
>
> Yes, openIB supports direct wait objects via fd's. There are some issues with
> the openIB implementation that need to be worked out.
Ok. Let's add a note that says when X is fixed uncomment this line.
>>
>>> Index: dapl/common/dapl_evd_resize.c
>>> ===================================================================
>>> --- dapl/common/dapl_evd_resize.c (revision 2899)
>>> +++ dapl/common/dapl_evd_resize.c (working copy)
>>> @@ -67,72 +67,130 @@
>>> IN DAT_EVD_HANDLE evd_handle,
>>> IN DAT_COUNT evd_qlen )
>>> {
>>> - DAPL_IA *ia_ptr;
>>> - DAPL_EVD *evd_ptr;
>>> - DAT_COUNT pend_cnt;
>>> - DAT_RETURN dat_status;
>>> + DAPL_IA *ia_ptr;
>>> + DAPL_EVD *evd_ptr;
>>> + DAT_EVENT *event_ptr;
>>> + DAT_EVENT *events;
>>> + DAT_EVENT *orig_event;
>>> + DAPL_RING_BUFFER free_event_queue;
>>> + DAPL_RING_BUFFER pending_event_queue;
>>> + DAT_COUNT pend_cnt;
>>> + DAT_COUNT i;
>>>
>>> dapl_dbg_log (DAPL_DBG_TYPE_API, "dapl_evd_resize (%p, %d)\n",
>>> evd_handle, evd_qlen);
>>>
>>> if (DAPL_BAD_HANDLE (evd_handle, DAPL_MAGIC_EVD))
>>> {
>>> - dat_status = DAT_ERROR (DAT_INVALID_HANDLE,0);
>>> - goto bail;
>>> + return DAT_ERROR (DAT_INVALID_PARAMETER,DAT_INVALID_ARG1);
>>> }
>>>
>>> evd_ptr = (DAPL_EVD *)evd_handle;
>>> ia_ptr = evd_ptr->header.owner_ia;
>>>
>>> - if ( evd_qlen == evd_ptr->qlen )
>>> + if ((evd_qlen <= 0) || (evd_ptr->qlen > evd_qlen))
>>> {
>>> - dat_status = DAT_SUCCESS;
>>> - goto bail;
>>> + return DAT_ERROR (DAT_INVALID_PARAMETER,DAT_INVALID_ARG2);
>>> }
>>
>>
>> Your check to make sure that the new size is greater that 0 is a good one.
>>
>> I don't think we should return an error when the consumer asks for a size
>> less than the current size. A return of success seems more appropriate. We
>> can remove the old behavoir of shrinking the EVD if you like. I'm sure that
>> makes things simpler.
>>
> agree. I see no need to shrink.
>
>> In general, I think we should try to retain the single return point
>> convention (in other words, we should keep the "goto bail" stuff). I admit
>> that we should have made better use of this. For example, there should have
>> been a second "bail" label (bail_unlock?) just before the last call to
>> dapl_os_unlock(). That would have saved us several lines of code in this
>> function.
>>
> agree.
>
>>> if ( evd_qlen > ia_ptr->hca_ptr->ia_attr.max_evd_qlen )
>>> {
>>> - dat_status = DAT_ERROR (DAT_INVALID_PARAMETER,DAT_INVALID_ARG2);
>>> - goto bail;
>>> + return DAT_ERROR (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_TEVD);
>>> }
>>>
>>> dapl_os_lock(&evd_ptr->header.lock);
>>>
>>> - /* Don't try to resize if we are actively waiting */
>>> if (evd_ptr->evd_state == DAPL_EVD_STATE_WAITED)
>>> {
>>> dapl_os_unlock(&evd_ptr->header.lock);
>>> - dat_status = DAT_ERROR (DAT_INVALID_STATE,0);
>>> - goto bail;
>>> + return DAT_ERROR (DAT_INVALID_STATE,0);
>>> }
>>>
>>> pend_cnt = dapls_rbuf_count(&evd_ptr->pending_event_queue);
>>> if (pend_cnt > evd_qlen) {
>>> - dapl_os_unlock(&evd_ptr->header.lock);
>>> - dat_status = DAT_ERROR (DAT_INVALID_STATE,0);
>>> - goto bail;
>>> + dapl_os_unlock(&evd_ptr->header.lock);
>>> + return DAT_ERROR (DAT_INVALID_STATE,0);
>>> + }
>>> +
>>> + if (DAT_SUCCESS != dapls_ib_cq_resize(evd_ptr->header.owner_ia,
>>> + evd_ptr,
>>> + &evd_qlen))
>>> + {
>>> + dapl_os_unlock(&evd_ptr->header.lock);
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> + }
>>> +
>>> + /* Allocate EVENTs */
>>> + events = (DAT_EVENT *) dapl_os_alloc (evd_qlen * sizeof (DAT_EVENT));
>>> + if (!events)
>>> + {
>>> + dapl_os_unlock(&evd_ptr->header.lock);
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> }
>>> + event_ptr = events;
>>>
>>> - dat_status = dapls_ib_cq_resize(evd_ptr->header.owner_ia,
>>> - evd_ptr,
>>> - &evd_qlen);
>>> - if (dat_status != DAT_SUCCESS)
>>> + /* allocate free event queue */
>>> + if (DAT_SUCCESS != dapls_rbuf_alloc (&free_event_queue, evd_qlen))
>>> {
>>> + dapl_os_free(event_ptr, evd_qlen * sizeof (DAT_EVENT));
>>> dapl_os_unlock(&evd_ptr->header.lock);
>>> - goto bail;
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> }
>>>
>>> - dat_status = dapls_evd_event_realloc (evd_ptr, evd_qlen);
>>> - if (dat_status != DAT_SUCCESS)
>>> + /* allocate pending event queue */
>>> + if (DAT_SUCCESS != dapls_rbuf_alloc (&pending_event_queue, evd_qlen))
>>> {
>>> + dapl_os_free(event_ptr, evd_qlen * sizeof (DAT_EVENT));
>>> dapl_os_unlock(&evd_ptr->header.lock);
>>> - goto bail;
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> }
>>>
>>> + for (i = 0; i < pend_cnt; i++)
>>> + {
>>> + orig_event = dapls_rbuf_remove(&evd_ptr->pending_event_queue);
>>> + if (orig_event == NULL) {
>>> + dapl_dbg_log (DAPL_DBG_TYPE_ERR, " Inconsistent event
>>> queue\n");
>>> + dapl_os_free(event_ptr, evd_qlen * sizeof (DAT_EVENT));
>>> + dapl_os_unlock(&evd_ptr->header.lock);
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> + }
>>> + memcpy(event_ptr, orig_event, sizeof(DAT_EVENT));
>>> + if (DAT_SUCCESS != dapls_rbuf_add(&pending_event_queue,
>>> event_ptr)) {
>>> + dapl_os_free(event_ptr, evd_qlen * sizeof (DAT_EVENT));
>>> + dapl_os_unlock(&evd_ptr->header.lock);
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> + }
>>> + event_ptr++;
>>> + }
>>> +
>>> + for (i = pend_cnt; i < evd_qlen; i++)
>>> + {
>>> + if (DAT_SUCCESS != dapls_rbuf_add(&free_event_queue,
>>> + (void *) event_ptr)) {
>>> + dapl_os_free(event_ptr, evd_qlen * sizeof (DAT_EVENT));
>>> + dapl_os_unlock(&evd_ptr->header.lock);
>>> + return DAT_ERROR
>>> (DAT_INSUFFICIENT_RESOURCES,DAT_RESOURCE_MEMORY);
>>> + }
>>> + event_ptr++;
>>> + }
>>> +
>>> + dapls_rbuf_destroy (&evd_ptr->free_event_queue);
>>> + dapls_rbuf_destroy (&evd_ptr->pending_event_queue);
>>> + if (evd_ptr->events)
>>> + {
>>> + dapl_os_free (evd_ptr->events, evd_ptr->qlen * sizeof
>>> (DAT_EVENT));
>>> + }
>>> + evd_ptr->free_event_queue = free_event_queue;
>>> + evd_ptr->pending_event_queue = pending_event_queue;
>>> + evd_ptr->events = events;
>>> + evd_ptr->qlen = evd_qlen;
>>> +
>>
>>
>>
>> Can you hide this in dapls_ib_cq_resize()?
>>
> which part?
All the new stuff. :) Also some of this used to
be done in dapls_evd_event_realloc(). I don't see you using that
function anymore.
I assume that the need to restructure this function was based on
OpenIB limitations. My preference would be to hide those in the
dapls_ib_* code.
>
>>
>>> dapl_os_unlock(&evd_ptr->header.lock);
>>>
>>> - bail:
>>> - return dat_status;
>>> + dapl_dbg_log (DAPL_DBG_TYPE_RTN,
>>> + "dapl_evd_resize returns DAT_SUCCESS\n");
>>> +
>>> + return DAT_SUCCESS;
>>> }
>>>
>>> /*
>>
>> _______________________________________________
>> openib-general mailing list
>> openib-general at openib.org
>> http://openib.org/mailman/listinfo/openib-general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
>>
>
More information about the general
mailing list