[openib-general] [PATCH] ib_cancel_mad API

Wed Sep 29 17:15:43 PDT 2004

> From: Roland Dreier [mailto:roland at topspin.com]
> Sent: Wednesday, September 29, 2004 4:58 PM
> 
>     Fab> If deregistration is synchronous then as long as the MAD
>     Fab> layer keeps reference counts for outstanding sends, the
>     Fab> client does not need to.  The client is guaranteed that their
>     Fab> send callback will not be called after deregistration
>     Fab> completes.
> 
> I agree that deregistering an agent is fine with the current API.
> 
>     Fab> As Sean mentioned, it's simpler for clients to know they will
>     Fab> *always* get a send completion regardless of status.  It
>     Fab> allows them to do all the send completion processing in their
>     Fab> handler, rather than having it split between the send handler
>     Fab> and the cancel logic.
> 
>     Fab> In fact, I'd just rather remove the return value all together
>     Fab> - what can a client do with the return value that they
>     Fab> wouldn't know from the status reported to the send handler?
> 
> The problem with Sean's proposed API for canceling a single MAD send
> is that it's not synchronous.  So clients have to wait for the
> callback of the send they want to cancel.  I agree that as it stands
> the return value is not useful because no matter what it is, a
> completion may or may not come after the cancel call returns.
> 

I think that as long as ib_cancel_mad can return before the corresponding send
completes (i.e., return -EBUSY), you have this problem, and clients must
provide their own synchronization, whether through reference counting or
some other means.

Returning -EBUSY from ib_cancel_mad requires the caller to block until the send
handler is invoked.  This in turn means that there needs to be code so that
the send handler can wake up the canceling thread once the send is complete.
I don't see how imposing such synchronization requirements on the client
differs from requiring reference counting.
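
To make the cost concrete, here is a rough userspace sketch of the wake-up
logic a client would need after an -EBUSY return.  It uses pthreads as a
stand-in for kernel primitives, and every name in it (pending_send,
send_handler, wait_for_send) is hypothetical, not part of the MAD API.

```c
/* Userspace sketch only: pthreads stand in for kernel primitives.
 * All names here are hypothetical and not part of the MAD layer. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct pending_send {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    bool            done;    /* set once the send handler has run */
    int             status;  /* status reported to the send handler */
};

void pending_send_init(struct pending_send *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->done_cv, NULL);
    s->done = false;
    s->status = 0;
}

/* Send handler: record the status and wake any thread waiting on the send. */
void send_handler(struct pending_send *s, int status)
{
    pthread_mutex_lock(&s->lock);
    s->status = status;
    s->done = true;
    pthread_cond_broadcast(&s->done_cv);
    pthread_mutex_unlock(&s->lock);
}

/* What the canceling thread has to do after -EBUSY: block until the
 * send handler has been invoked, then return the reported status. */
int wait_for_send(struct pending_send *s)
{
    int status;

    pthread_mutex_lock(&s->lock);
    while (!s->done)
        pthread_cond_wait(&s->done_cv, &s->lock);
    assert(s->done);
    status = s->status;
    pthread_mutex_unlock(&s->lock);
    return status;
}
```

The client ends up carrying per-send state and a lock either way, which is
about the same burden as keeping a count of its outstanding sends.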

To solve this, you would need ib_cancel_mad to be a synchronous call that
would block in the -EBUSY case (at which point the return value is also
pointless).  However, making this change requires all callers to be in a
thread context suitable for blocking.  I don't think we want to impose this
sort of requirement for MAD cancellation.  I think such a requirement is
fine for MAD agent destruction though.

So I see two options:
1. Implement reference counting to track your own sends if you plan on
sharing a MAD agent.
2. Don't share MAD agents.
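
For option 1, a minimal sketch of what I mean, written as plain C11 so it
can be tried in userspace.  The structure and function names (mad_client,
client_send_posted, client_send_completed, client_can_release) are made up
for illustration; a kernel client would presumably use atomic_t instead.

```c
/* Sketch of a client tracking its own sends on a shared MAD agent.
 * All names are hypothetical; C11 atomics stand in for kernel atomics. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mad_client {
    atomic_int outstanding;  /* sends posted but not yet completed */
};

void client_init(struct mad_client *c)
{
    atomic_init(&c->outstanding, 0);
}

/* Call before posting a send on the shared agent. */
void client_send_posted(struct mad_client *c)
{
    atomic_fetch_add(&c->outstanding, 1);
}

/* Call from the send handler, whatever the completion status.
 * Returns true if this was the client's last outstanding send. */
bool client_send_completed(struct mad_client *c)
{
    int prev = atomic_fetch_sub(&c->outstanding, 1);

    assert(prev > 0);  /* a completion without a matching post is a bug */
    return prev == 1;
}

/* The client's state may be torn down only when nothing is in flight. */
bool client_can_release(struct mad_client *c)
{
    return atomic_load(&c->outstanding) == 0;
}
```

Because every send gets a completion, the handler is the only place the
count needs to drop, and cancel does not need a meaningful return value.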

- Fab