[ofw] [RFC] ib cm: export CM only interface

Tue Nov 18 14:22:59 PST 2008

>It should be per CEP - the MAD callback is done in the context of the QP1
>receive CQ callback at DISPATCH_LEVEL.  You could have multiple callbacks for
>different CEPs if you have multiple ports active.  I don't remember off the top
>of my head if the QP1 manager has a CQ per direction, or a single CQ for send
>and receives.  If they're separate then you could have multiple callbacks (for
>different CEPs) simultaneously for a single port.

If the MAD dispatch callbacks are serialized with respect to a single HCA port,
all threading issues are greatly reduced.  As long as we're in the dispatch
routine, no additional MADs will be processed by the CEP manager for a
connecting CEP.  Listening CEPs could be changed to have simultaneous callbacks,
but I think listening callbacks are serialized now because of the internal MAD
queuing.  Basically, the thread switching in the higher-level IBAL CM code
causes complications, which is why the poll MAD support was required.  The new
interface avoids these sorts of issues.

The code now receives a MAD, changes the CEP state, and reports the event to the
user.  The only issue that I see is how destroy may be implemented.
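
To make the flow concrete, here is a rough sketch of "receive a MAD, change the
CEP state, report the event."  It is not the actual CEP manager code: the state
names, the cep_t layout, and cep_process_mad() are all made up for illustration,
and only two message types are handled.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified CEP states and MAD types. */
typedef enum { CEP_REQ_SENT, CEP_REP_RCVD, CEP_REP_SENT, CEP_ESTABLISHED } cep_state_t;
typedef enum { MAD_REP, MAD_RTU } mad_type_t;

typedef struct cep cep_t;
typedef void (*cep_event_cb_t)(cep_t *cep, mad_type_t mad);

struct cep {
    cep_state_t    state;
    cep_event_cb_t cb;              /* user's event callback, may be NULL */
    int            events_reported; /* bookkeeping for the example only */
};

/*
 * Runs in the MAD dispatch path.  The state change happens first, then the
 * event is reported directly to the user; nothing is queued for polling.
 * Returns 1 if the MAD was accepted and reported, 0 if it was dropped.
 */
int cep_process_mad(cep_t *cep, mad_type_t mad)
{
    switch (mad) {
    case MAD_REP:                   /* active side: REQ sent, REP arrives */
        if (cep->state != CEP_REQ_SENT)
            return 0;
        cep->state = CEP_REP_RCVD;
        break;
    case MAD_RTU:                   /* passive side: REP sent, RTU arrives */
        if (cep->state != CEP_REP_SENT)
            return 0;
        cep->state = CEP_ESTABLISHED;
        break;
    default:
        return 0;
    }

    cep->events_reported++;
    if (cep->cb != NULL)
        cep->cb(cep, mad);
    return 1;
}
```

A duplicate REP arriving after the state has already moved to CEP_REP_RCVD is
simply dropped here; what the real code should do in that case is a separate
question.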

>That's a problem with your callback model.  If you let the client call down
>when ready to process the next MAD you'd be fine.  The CEP manager has to queue
>MADs already, so this wouldn't require much of a change.

Ignore the fact that a MAD is used.  What's occurring is that an event has
occurred on the CEP, and it is being reported to the user.  The event data just
happens to be contained in a MAD.  A user can defer looking at the event, but it
still occurred, and once it has been reported, additional events can occur that
change the state of the CEP.  In the bigger picture, the state of the connection
has changed regardless of how quickly the user wants to examine the events.

Using polling to report events is fine for a user-to-kernel interface, but this
is a kernel interface.  Queuing of events should be pushed to the ULPs that need
it.
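
For a ULP that does want to defer processing, the queuing could look roughly
like the sketch below: the event callback copies what it needs into a small
ring and returns, and the ULP drains the ring from its own thread.  All names
here (ulp_event_t, ulp_queue_push, etc.) are hypothetical, and locking is left
out; a real ULP would need to protect the ring, since the callback can run at
DISPATCH_LEVEL.

```c
#include <stddef.h>

#define ULP_QUEUE_DEPTH 8

/* Hypothetical event record: which CEP, and what happened on it. */
typedef struct {
    int cep_id;
    int event;
} ulp_event_t;

/* Fixed-size FIFO ring owned by the ULP, not by the CM. */
typedef struct {
    ulp_event_t slots[ULP_QUEUE_DEPTH];
    size_t      head;   /* index of the oldest queued event */
    size_t      count;  /* number of queued events */
} ulp_queue_t;

void ulp_queue_init(ulp_queue_t *q)
{
    q->head = 0;
    q->count = 0;
}

/* Called from the ULP's CM event callback.  Returns 0 if the ring is full. */
int ulp_queue_push(ulp_queue_t *q, ulp_event_t ev)
{
    if (q->count == ULP_QUEUE_DEPTH)
        return 0;
    q->slots[(q->head + q->count) % ULP_QUEUE_DEPTH] = ev;
    q->count++;
    return 1;
}

/* Called later by the ULP when it is ready.  Returns 0 if nothing is queued. */
int ulp_queue_pop(ulp_queue_t *q, ulp_event_t *out)
{
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % ULP_QUEUE_DEPTH;
    q->count--;
    return 1;
}
```

The point is only that the policy (queue depth, what to do on overflow) lives
with the ULP that wants it, and ULPs that can handle events inline pay nothing.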

>The CEP manager doesn't use the AL object stuff because it doesn't need it: the
>sync destroy wasn't needed since the QP/Listen already implemented it.  The
>QPs/Listens end up hanging out until the CEP is destroyed and invokes the
>destroy callback (which is just deref_al_obj).

gurgle... There is locking that's shared between invoking a callback and
destruction.  It may be possible to move those locks around to ensure that a
callback doesn't occur after destroy is called.
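
One way this could work, as a sketch only: check a "destroyed" flag under the
same lock that is held while the callback is invoked.  This is not how the CEP
manager is written today; the names are invented, and a pthread mutex stands in
for whatever kernel lock would really be used.

```c
#include <pthread.h>

/* Hypothetical CEP with a lock shared by callback delivery and destroy. */
typedef struct cep {
    pthread_mutex_t lock;
    int             destroyed;
    void          (*cb)(struct cep *cep, void *context);
    void           *context;
} cep_t;

/* Example user callback: counts how many events were delivered. */
static void count_cb(cep_t *cep, void *context)
{
    (void)cep;
    ++*(int *)context;
}

/*
 * Called from the MAD dispatch path.  The callback runs with the lock held,
 * so it can never overlap with, or start after, cep_destroy().
 * Returns 1 if the callback was invoked, 0 if the event was suppressed.
 */
int cep_report_event(cep_t *cep)
{
    int delivered = 0;

    pthread_mutex_lock(&cep->lock);
    if (!cep->destroyed && cep->cb != NULL) {
        cep->cb(cep, cep->context);
        delivered = 1;
    }
    pthread_mutex_unlock(&cep->lock);
    return delivered;
}

/* Once this returns, no callback is running and none will be started. */
void cep_destroy(cep_t *cep)
{
    pthread_mutex_lock(&cep->lock);
    cep->destroyed = 1;
    pthread_mutex_unlock(&cep->lock);
}
```

The obvious cost is that the user cannot call cep_destroy() from inside the
callback without deadlocking on a non-recursive lock, so a real version would
need to handle that case separately.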

>The callback is invoked in the context of the MAD completion handler.  However
>it's protected by the CEP's lock (at least the information about whether the
>CEP has been signaled or not) so you won't ever have two simultaneous callbacks
>for the same CEP.  You might have simultaneous callbacks for different CEPs.

My concern is that listen callbacks are serialized with respect to the listen,
and not the connecting CEP.  If the user sends a REP from the callback, then
they can receive another callback on that CEP before the first one returns.
However, it looks like the MAD dispatch callbacks handle this case.

>So you lose the automatic capping of the initiator depth to the CA's
>capabilities.  This means that the user will need to query the CA and adjust.
>Or were you planning on capping still and just modifying the received REQ MAD?

IMO, the low-level CM should not adjust user-specified values, especially in a
way that's hidden from the user.  I would rather see the CM fail requests that
cannot be supported by the local HCA.  The user already needs to set the number
of SGEs, send/receive depth, CQ depth, etc. correctly.  If the initiator depth
is lower than what the user wants, the user may decide to use multiple
connections, rather than one with lower capabilities.
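
Roughly what I have in mind, with made-up names (ca_caps_t, cm_validate_req, and
the status codes are not existing IBAL definitions): the CM compares the
requested initiator depth against the local CA's limit and fails the request,
leaving the user's value untouched.

```c
#include <stdint.h>

/* Hypothetical subset of the local CA's attributes. */
typedef struct {
    uint8_t max_init_depth;
} ca_caps_t;

/* Hypothetical subset of the user's connection request parameters. */
typedef struct {
    uint8_t init_depth;
} cm_req_params_t;

typedef enum {
    CM_OK = 0,
    CM_ERR_INIT_DEPTH   /* requested initiator depth exceeds the CA limit */
} cm_status_t;

/*
 * Validate instead of capping: the request is read-only here, so the CM
 * cannot silently lower what the user asked for.
 */
cm_status_t cm_validate_req(const ca_caps_t *caps, const cm_req_params_t *req)
{
    if (req->init_depth > caps->max_init_depth)
        return CM_ERR_INIT_DEPTH;
    return CM_OK;
}
```

On CM_ERR_INIT_DEPTH the user can query the CA and retry with a smaller value,
or open several connections instead; either way the decision is theirs.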

- Sean