[ofw] [RFC] ib cm: export CM only interface

Tue Nov 18 07:39:39 PST 2008

Note that the following comments are all made based on my memory of the code - I don't have the current CEP manager code in front of me, so some things may be wrong.

> static NTSTATUS
> convert_ib_status(ib_api_status_t status)

There's already a function that converts ib_status values to NTSTATUS ones, that handles all ib_status values.  I don't have the code handy, but I believe it is in al_dev.c, and I don't remember the name offhand, but it had the word 'ntstatus' in it so a search should be fruitful.

<snip...>

> static void
> cm_cep_handler(const ib_al_handle_t h_al, const net32_t cid)
> {
>         void                            *context;
>         net32_t                         new_cid;
>         ib_mad_element_t        *mad;
>         iba_cm_id                       *id, *listen_id;
>
>         while (al_cep_poll(h_al, cid, &context, &new_cid, &mad) ==
> IB_SUCCESS) {
>
>                 if (new_cid == AL_INVALID_CID) {
>                         id = (iba_cm_id *) context;
>                 } else {
>                         listen_id = (iba_cm_id *) context;
>
>                         id = ExAllocatePoolWithTag(NonPagedPool,
> sizeof(iba_cm_id), 'mcbi');
>                         if (id == NULL) {
>                                 al_destroy_cep(gh_al, &new_cid, FALSE);

Note that all new CEPs should probably be created in the CEP manager with a NULL callback.  Since the CEPs inherit the listen CEPs callback, I think it's possible a callback for a new CEP would be invoked (say a REJ due to timeout) before the new CEP was retrieved.  If the callback pointer was NULL until the REP call, you would be safe.

>                                 ib_put_mad(mad);
>                                 continue;
>                         }
>
>                         id->context = listen_id;
>                         id->callback = listen_id->callback;
>                         id->cid = new_cid;
>                 }
>
>                 id->callback(id, mad->p_mad_buf);

How does someone that's get the new CM ID before calling iba_cm_rep?

What happens if they need the MAD contents in a different thread context - do they have to allocate/copy?  Why not just hand them the MAD and have them be responsible for freeing it - this lets them store it if they need it while changing to a passive level thread context (if they need it).  I think this would be better, even if you end up with a wrapper for ib_put_mad.

How is your IOCTL interface going to work?  Will it have an event that will give it the MAD too?  Will the user-mode library be callback driven, or event driven?

The al_cep_get_pdata function was added to so that the private data could be retrieved after a REQ received, but in an entirely different call context.  In the ND case (and I think what you defined for WinVerbs), the client gets an event on their listen object that completes directly to the user (Win32 overlapped operation).  The client then needs to retrieve the information from the received MAD (private data, responder resources, initiator depth) and this was done via al_cep_get_pdata.

>                 ib_put_mad(mad);
>         }
> }
>
> static NTSTATUS
> cm_create_id(void (*callback)(iba_cm_id *p_id, ib_mad_t *p_mad),
>                          void *context, iba_cm_id **pp_id)
> {
>         iba_cm_id               *id;
>         ib_api_status_t ib_status;
>
>         id = ExAllocatePoolWithTag(NonPagedPool, sizeof(iba_cm_id),
> 'mcbi');
>         if (id == NULL) {
>                 return STATUS_NO_MEMORY;
>         }
>
>         id->callback = callback;
>         id->context = context;
>
>         ib_status = al_create_cep(gh_al, cm_cep_handler, id, NULL, &id-
> >cid);

You'll probably want a destroy callback here, so that you can either block or release a reference on your ID structure when you destroy its underlying CEP.

>         if (ib_status != IB_SUCCESS) {
>                 ExFreePool(id);
>                 return convert_ib_status(ib_status);
>         }
>
>         *pp_id = id;
>         return STATUS_SUCCESS;
> }
>
> static void
> cm_destroy_id(iba_cm_id *p_id)
> {
>         al_destroy_cep(gh_al, &p_id->cid, FALSE);

The al_destroy_cep function does not block, so you could receive a callback after you free the ID.  You need a way to mark the ID freed so that the handler doesn't invoke the callback.  You then need to do reference counting on your ID structures so that they can be freed after the CEP manager is done with them.  Alternatively, you can allocate an event/block until the CEP is freed (your destroy callback is invoked).

>         ExFreePool(p_id);
> }

<snip...>

> static NTSTATUS
> cm_get_qp_attr(iba_cm_id *p_id, ib_qp_state_t state, ib_qp_mod_t
> *p_attr)
> {
>         ib_api_status_t ib_status;
>
>         switch (state) {
>         case IB_QPS_INIT:
>                 ib_status = al_cep_get_init_attr(gh_al, p_id->cid,
> p_attr);
>                 break;
>         case IB_QPS_RTR:
>                 ib_status = al_cep_get_rtr_attr(gh_al, p_id->cid,
> p_attr);

How will you handle the passive side accepting?  The QP attributes can be changed by sending the REP - the CEP manager splits the REP into a pre- and send- calls so that the updates to the CEP's QP attributes can happen in the pre- call (which returns the attributes), so the client can do the RTR transition, and then the REP can be sent by the send- call.  Is the expectation here that the client will call get_rtr_attr, make the changes they intend to make with the REP manually in the returned QP attribute structure, then call iba_cm_rep?

>                 break;
>         case IB_QPS_RTS:
>                 ib_status = al_cep_get_rts_attr(gh_al, p_id->cid,
> p_attr);
>                 break;
>         default:
>                 return STATUS_INVALID_PARAMETER;
>         }
>
>         return convert_ib_status(ib_status);
> }