[ofw] [PATCH 3/7] ib/cm: export CM only interface

Sean Hefty sean.hefty at intel.com
Tue Jan 20 17:47:26 PST 2009


>So there's a requirement then that the CEP be in the callback then?  It might
>be nice to store the thread handle of the thread invoking the callback and
>assert in this function that the CEP is invoking a callback in this same
>thread.

This is all part of the lowest level CM code.  If you think of the lower CM as
its own module, then the upper interface to that is the proposed interface.  If
it would have been convenient to just pass the CEP object directly in the
callback, I would have done that.  But until all users are migrated to use the
proposed interface, I minimized the code changes.  The shared userspace/kernel
code hinders cleaning up the code paths.

>> I don't think this works with the current compiler.
>
>It should, as it's being done in other parts of the code (HCA driver, IBAT
>library, FMR pool in IBAL, OpenSM)

I tried it and changed in v2 of the patch.

>> What exactly is wrong with the changed code?
>
>1. You assign the CID in the QP after the QP might have initiated destruction,
>leaking the CEP until the app exits.

How does the existing code prevent this?

>2. You don't check that the QP isn't already connected.

I'm trying to have the low level CM code only operate based on the QPN.  It
shouldn't rely on objects to other data structures, since winverbs doesn't have
any of these objects.

>> The al_cep_pre_rep_ex function didn't have any notion that it was
>> dealing with a QP's CID.  It took as input a CID, a pointer to a CID
>> that had to be set to INVALID, and on output copied the input CID to
>> the location referenced.  The behavior of such an interface doesn't
>> make much sense.  The function is trying to do too much.
>
>All the CEP functions that potentially modify the CID stored in the QP take it
>as an in/out parameter (al_create_cep, al_destroy_cep, al_cep_pre_rep,
>al_cep_pre_rep_ex).  This was necessary to work around races with setting the
>CID value that couldn't be done with atomics.
>
>It's fine to change the behavior as long as you don't break the functionality.
>You might be able to take the QP object's lock to protect the QP's CID
>parameter, but then the other functions that take the CID by reference should
>also change.

Again - how is the race prevented?  I don't see any protection against the
situation that you're describing, plus it results in confusing code. 

>> This is confusing.  I think this may fix it:
>
><snip> I don't think either of those code snippets works.
>
>> But I can't follow the referencing counting enough to know if the
>> second option would work.
>
>You must take the reference on the QP for the CID assignment before you assign
>the CID, so that if the QP is destroyed you never risk the ref count hitting
>zero when you're still dealing with the object.  If assigning the CID failed,
>you destroy the CEP and release the reference.  If the assignment succeeded
>(kal_config_cep succeeds) then you don't release the reference because it will
>be released when the CEP is destroyed.  So once you've succeeded calling
>kal_config_cep you don't release the reference, you only destroy the CEP.

The refcount on the QP cannot hit 0 in this call.  There are already multiple
references being held on the QP at this point.  There has to be at least 1 or we
couldn't safely dereference the pointer to start with.

I used the first option of these.  I liked the second one better, but couldn't
convince myself that it would work.

>Blocking destruction can lead to other problems (not to mention imposing
>restrictions on the caller) unless you allow destruction to fail if the object
>is in use (but then the recovery logic is heinous too).  The QP won't get
>destroyed until all IOCTLs have returned, and all references (from async
>operations for example) have also been released.  However there is code that
>initiates destruction, and that is what needs serialization (and then only on
>the members that have potential contention), so that a CEP is destroys and
>aborts any connection establishment that may be in progress in order to give a
>timely destruction of the QP.

We're talking about IOCTL calls.  Blocking destruction is completely harmless
here and imposes no unreasonable restrictions.  (An app that destroys a QP while
they have another downcall in progress is doomed to crash in userspace anyway.)

>In WinVerbs, if you destroy an endpoint, does it disconnect if it's connected,
>or do you not track the CEP to QP relationship?

It simply destroys what maps down to the CEP.  Part of the CEP cleanup process
is to send a DREQ.  No tracking is done to associate a CEP to a QP.  (Consider
that an app can connect QPs without going through the CM at all.)

>See the list of 4 functions that only modify the CID if it has particular
>states.  You can create a lock to protect the CID member of the QP if you'd
>like to move it out of the CEP.  You can't however just move it out without
>providing the same checks/synchronization.
>
>That's what the code does today.  The CID has one initial value that indicates
>that it is invalid, but available: AL_INVALID_CID.  Then there's another
>special value that indicates that the CID is not valid, and cannot be used:
>AL_RESERVED_CID.  Then there are valid CID values that map to CEPs.  The
>locking is provided by the CEP manager.
>
>You could break the state out as a different member in the QP, along with a
>lock, and accomplish the same thing.  You'd have to use a spinlock though as
>the CEP functions can all be called at dispatch level.  If you want to have
>blocking destruction, you can signal an event from the CEP's destroy callback.

I think you're missing that there's nothing that synchronizes access to the QPs
CID field during destruction.  To be more specific:

void
al_destroy_cep(
	IN				ib_al_handle_t
h_al,
	IN	OUT			net32_t* const
p_cid,
	IN				boolean_t
reusable )
{
    net32_t             cid = *p_cid;

the cid was just read, but without any locks being held.  Any protection that
you believe you're getting by passing the QP's CID into other calls doesn't
exist.

- Sean




More information about the ofw mailing list