[openib-general] Re: ib_mad: Scenarios for returning posted send MADs

Mon Oct 4 10:52:29 PDT 2004

On Mon, 04 Oct 2004 13:26:42 -0400
Hal Rosenstock <halr at voltaire.com> wrote:

> There are two lists of posted send MADs: (1) a list of posted sends for
> the port, and (2) another list per MAD agent. When a send is first
> posted, it is placed on both lists until the send completion occurs and
> then is removed from the port send list. The handling of the agent send
> list is based on whether there is a timeout specified or not.

This is correct.  The list per MAD agent is intended for timeouts and RMPP handling.  Without RMPP, the list of posted sends per port matches the MAD agent list.  With RMPP, a send at the MAD agent level may result in posting multiple work requests to the port layer.

> 1. In the case that a client unregisters with the MAD layer, there is
> code which cleans up the agent send list. However, it does not appear to
> me that if the send completion occurs after the deregistration that this
> completion is thrown away properly but rather a callback may be
> performed. Did I miss something here ?

A reference on the MAD agent is taken whenever a work request is posted to the QP.  An additional reference is taken on the MAD agent if the MAD has a timeout, indicating that a response MAD is expected.  When RMPP is added, a single send may result in multiple references being taken on the MAD agent.

The reference per work request is not released until the work request completes.  The reference for the response is not released until the response has been received, the request times out, or the request is canceled.

When a client deregisters, MADs waiting for responses are canceled.  This decrements their reference counts.  If a MAD has no other references, then it is done and may be completed.  If it still has references, this indicates that it has active work requests on the QP that must complete before the send MAD can complete.

This is why the deregistration code decrements the reference count, then checks the reference count before flushing the request.
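
To make the counting above concrete, here is a rough sketch in C.  It is not the ib_mad code: the struct and the function names (send_mad, post_send, work_request_done, response_done) are hypothetical, and it assumes a single count kept per send MAD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one send MAD; not the real ib_mad structures. */
struct send_mad {
    int refcount;           /* one per posted work request, plus one if a response is expected */
    bool waiting_response;  /* true while the response reference is still held */
    bool completed;         /* set once the send may be reported back to the client */
};

/* The send is done only when nothing references it any longer. */
static void complete_if_done(struct send_mad *mad)
{
    if (mad->refcount == 0)
        mad->completed = true;
}

/* Post a send.  work_requests is 1 without RMPP and may be larger with it. */
void post_send(struct send_mad *mad, int work_requests, int timeout_ms)
{
    mad->refcount = work_requests;
    mad->waiting_response = timeout_ms > 0;
    if (mad->waiting_response)
        mad->refcount++;    /* extra reference for the expected response */
    mad->completed = false;
}

/* Called when one work request completes on the QP. */
void work_request_done(struct send_mad *mad)
{
    assert(mad->refcount > 0);
    mad->refcount--;
    complete_if_done(mad);
}

/* Called when the response arrives, the request times out, or it is
 * canceled (for example, during deregistration). */
void response_done(struct send_mad *mad)
{
    if (!mad->waiting_response)
        return;
    mad->waiting_response = false;
    mad->refcount--;
    complete_if_done(mad);
}
```

In this model, canceling a MAD whose work request is still on the QP drops the count from 2 to 1 and does not complete it.  The later call to work_request_done() drops it to 0 and completes the send.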

> 2. Another scenario for this is on WC errors which currently attempt to
> restart the port. I am not sure all WC errors should do this. Perhaps
> only IB_WC_FATAL_ERR and IB_WC_GENERAL_ERR. 

My thought is that work requests that result in a failure should be completed in error from the port layer to the MAD agent.  The port layer _could_ then restart operations with the next work request, and the MAD agent would complete the send MAD to the user in error.

Of course, throwing RMPP into this complicates the matter, since the work request immediately behind the one causing the failure might be another request associated with the same RMPP MAD, which may cause another failure...

It would help in this case for the port layer code to just return completions for all queued work requests to the MAD agents, and let the MAD agent code deal with the issue.
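
Something along these lines is what I have in mind, as a sketch only.  The types, the status values, and port_flush_on_error() are made up for illustration and do not exist in the code today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values; these are not the IB_WC_* codes. */
enum wr_status { WR_PENDING, WR_SUCCESS, WR_ERROR, WR_FLUSHED };

/* Hypothetical queued work request, tagged with the agent that owns it. */
struct work_request {
    int agent_id;
    enum wr_status status;
};

/* Callback through which the port layer hands a completion to an agent. */
typedef void (*agent_completion_fn)(struct work_request *wr);

/*
 * Report the failed work request in error and every request queued
 * behind it as flushed.  Requests ahead of the failure are untouched.
 * Returns the number of completions handed to the agents.
 */
size_t port_flush_on_error(struct work_request *queue, size_t count,
                           size_t failed, agent_completion_fn report)
{
    size_t reported = 0;

    assert(failed < count);
    for (size_t i = failed; i < count; i++) {
        queue[i].status = (i == failed) ? WR_ERROR : WR_FLUSHED;
        if (report)
            report(&queue[i]);
        reported++;
    }
    return reported;
}
```

The port layer makes no attempt to decide which of the remaining requests belong to the same RMPP MAD.  Each agent sees its own completions and decides how to complete the send to the user.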

> 3. The final scenario is board (not currently possible) or module
> removal. My concern here is about potential send callbacks (indicating
> FLUSHED) to a potentially stale MAD agent. When the module is removed
> non forceably, the clients (upper layer modules) would need to be
> removed first, which should cause the proper deregistration (and these
> MADs would be cancelled so there would be none to cleanup). I am not
> sure what the rules for proper behavior are on forceable module removal.
> Board removal would be similar to this (the forceable module removal
> case).

Deregistration is a synchronous process, so it will wait until all send MADs have completed.  If this isn't happening, then the reference counting is off somewhere.

- Sean