[ofw] [PATCH 3/7] ib/cm: export CM only interface

Wed Jan 21 09:33:40 PST 2009

>> So there's a requirement then that the CEP be in the callback then? It
>> might be nice to store the thread handle of the thread invoking the
>> callback and assert in this function that the CEP is invoking a
>> callback in this same thread.
>
> This is all part of the lowest level CM code.  If you think of the
> lower CM as its own module, then the upper interface to that is the
> proposed interface.  If it would have been convenient to just pass the
> CEP object directly in the callback, I would have done that.  But
> until all users are migrated to use the proposed interface, I
> minimized the code changes.  The shared userspace/kernel code hinders
> cleaning up the code paths.

The only shared code for the CM is ib_cm_qp.c.  The CEP manager code is not shared at all (ib_cm_cep.c), only it's interface (used by ib_cm_qp.c).

I think the changes you're making are good, it was just hard to follow the long term intent.  If moving everyone above the new CM API is the plan that's great.

>>> What exactly is wrong with the changed code?
>>
>> 1. You assign the CID in the QP after the QP might have initiated
>> destruction, leaking the CEP until the app exits.
>
> How does the existing code prevent this?

All assignments to the CID are done under the CEP manager's lock.

>> 2. You don't check that the QP isn't already connected.
>
> I'm trying to have the low level CM code only operate based on the QPN.
> It shouldn't rely on objects to other data structures, since winverbs
> doesn't have any of these objects.

You should still check (somewhere) that the QPN isn't in use already.  If you don't want this in the CM, you should add a connection state to the connected QP object and track it there.

>>> The al_cep_pre_rep_ex function didn't have any notion that it was
>>> dealing with a QP's CID.  It took as input a CID, a pointer to a
>>> CID that had to be set to INVALID, and on output copied the input
>>> CID to the location referenced.  The behavior of such an interface
>>> doesn't make much sense.  The function is trying to do too much.
>>
>> All the CEP functions that potentially modify the CID stored in the
>> QP take it as an in/out parameter (al_create_cep, al_destroy_cep,
>> al_cep_pre_rep, al_cep_pre_rep_ex).  This was necessary to work
>> around races with setting the CID value that couldn't be done with atomics.
>>
>> It's fine to change the behavior as long as you don't break the
>> functionality. You might be able to take the QP object's lock to
>> protect the QP's CID parameter, but then the other functions that take
>> the CID by reference should also change.
>  Again - how is the race prevented?  I don't see any protection against
> the situation that you're describing, plus it results in confusing code.
>
>>> This is confusing.  I think this may fix it:
>>
>> <snip> I don't think either of those code snippets works.
>>
>>> But I can't follow the referencing counting enough to know if the
>>> second option would work.
>>  You must take the reference on the QP for the CID assignment before
>> you assign the CID, so that if the QP is destroyed you never risk the
>> ref count hitting zero when you're still dealing with the object.  If
>> assigning the CID failed, you destroy the CEP and release the
>> reference.  If the assignment succeeded (kal_config_cep succeeds) then
>> you don't release the reference because it will be released when the
>> CEP is destroyed.  So once you've succeeded calling kal_config_cep you
>> don't release the reference, you only destroy the CEP.
>
> The refcount on the QP cannot hit 0 in this call.  There are already
> multiple references being held on the QP at this point.  There has to be
> at least 1 or we couldn't safely dereference the pointer to start with.

Sure, but then your references are interdependent, when it's a lot easier to maintain if they are self-standing.

> I used the first option of these.  I liked the second one better, but
> couldn't convince myself that it would work.
>
>> Blocking destruction can lead to other problems (not to mention
>> imposing restrictions on the caller) unless you allow destruction to
>> fail if the object is in use (but then the recovery logic is heinous
>> too).  The QP won't get destroyed until all IOCTLs have returned, and
>> all references (from async operations for example) have also been
>> released.  However there is code that initiates destruction, and that
>> is what needs serialization (and then only on the members that have
>> potential contention), so that a CEP is destroys and aborts any
>> connection establishment that may be in progress in order to give a
> timely destruction of the QP.
>
> We're talking about IOCTL calls.  Blocking destruction is completely
> harmless here and imposes no unreasonable restrictions.  (An app that
> destroys a QP while they have another downcall in progress is doomed
> to crash in userspace anyway.)

The blocking should be done in the IOCTL handler, not at the lowest level. That's my objection.  I have the same objection to all the blocking verbs, I think the design is crap.  You should design the lower levels of the stack to support both kernel and user clients.  Kernel clients cannot always block, so make the design flexible and put the blocking policy at the highest level possible.

> I think you're missing that there's nothing that synchronizes access
> to the QPs CID field during destruction.  To be more specific:
>
> void
> al_destroy_cep(
>         IN                              ib_al_handle_t h_al, IN      OUT
>                             net32_t* const p_cid, IN
>                  boolean_t
> reusable )
> {
>     net32_t             cid = *p_cid;
>
> the cid was just read, but without any locks being held.  Any
> protection that you believe you're getting by passing the QP's CID
> into other calls doesn't exist.

Yep, that's a (minor) bug alright, though no code assigns the QP's CID member except through the CEP calls, so you would never have a problem unless a kernel client behaves badly.  Specifically, the only place that the QP's CID is assigned from AL_INVALID_CID to a valid CID is in al_create_cep and al_cep_pre_rep(_ex), under the protection of the CEP lock.  The only place it is changed from a valid value to either AL_INVALID_CID or AL_RESERVED_CID is again under the lock.

So while I agree with you that the design could be cleaner, I don't agree that it's broken in the current code, and I'll still assert that it is broken in your patch.

-Fab