[openib-general] RFC: detecting duplicate MAD requests

Tue Jun 13 11:05:23 PDT 2006

>There are architected ways to do that. There's busy for MADs which could
>be used for some MADs. For RMPP, would the transfer be ABORTed? I don't
>think you can switch to BUSY in the middle (but I'm not 100% sure). I
>don't know how this limit is being used exactly, but it might be best if
>the RMPP receive were treated as 1 MAD regardless of how many
>segments it was.

Maybe I should back up a bit.  There are a couple of problems that I'm trying
to solve, but the main goal is to prevent sending duplicate responses.  I'd like
to do this by detecting and dropping duplicate requests.

To detect a duplicate request, my proposal is to move completed MADs to a
"done_list".  Newly received MADs would also be checked against the done_list to
determine whether the MAD is a duplicate.  When a user sends a response MAD, a
check would be made against the done_list for a matching request that has not
generated a response yet.  If none is found, the send would fail.
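A minimal sketch of the done_list bookkeeping described above, including removal when a MAD is freed.  Everything here is hypothetical: `struct done_list`, `done_recv()`, `done_send_response()` and `done_free()` are illustrative names, not the ib_mad API, and a request is keyed only on its transaction ID (TID) plus source LID.  A real implementation would likely match on more fields (management class, source QP, and so on) and would need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of a received request MAD on the done_list. */
struct done_entry {
	uint64_t tid;          /* transaction ID of the request */
	uint16_t src_lid;      /* sender, so equal TIDs from different nodes differ */
	bool responded;        /* set once a response has been sent */
	struct done_entry *next;
};

struct done_list {
	struct done_entry *head;
};

static struct done_entry *done_find(struct done_list *dl, uint64_t tid,
				    uint16_t src_lid)
{
	assert(dl != NULL);
	for (struct done_entry *e = dl->head; e; e = e->next)
		if (e->tid == tid && e->src_lid == src_lid)
			return e;
	return NULL;
}

/* Receive path: returns false if the request is a duplicate (or cannot be
 * tracked) and should be dropped, true if it was recorded for delivery. */
static bool done_recv(struct done_list *dl, uint64_t tid, uint16_t src_lid)
{
	struct done_entry *e;

	if (done_find(dl, tid, src_lid))
		return false;
	e = malloc(sizeof(*e));
	if (!e)
		return false;
	e->tid = tid;
	e->src_lid = src_lid;
	e->responded = false;
	e->next = dl->head;
	dl->head = e;
	return true;
}

/* Send path: a response is allowed only for a matching request that has
 * not been answered yet; otherwise the send fails (-1). */
static int done_send_response(struct done_list *dl, uint64_t tid,
			      uint16_t src_lid)
{
	struct done_entry *e = done_find(dl, tid, src_lid);

	if (!e || e->responded)
		return -1;
	e->responded = true;
	return 0;
}

/* Free path: the request leaves the done_list when the client frees it. */
static void done_free(struct done_list *dl, uint64_t tid, uint16_t src_lid)
{
	for (struct done_entry **pp = &dl->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->tid == tid && (*pp)->src_lid == src_lid) {
			struct done_entry *e = *pp;

			*pp = e->next;
			free(e);
			return;
		}
	}
}
```

One consequence of this design: once a request has been freed, a late retransmission of it is no longer recognized as a duplicate, so the detection window is the lifetime of the received MAD.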

Received MADs would be removed from the done_list when they are freed.  My guess
is that the changes for kernel clients would be minimal.  For usermode clients,
the problem is more difficult, since we cannot trust usermode clients to
generate responses correctly, and there's no free_mad call that maps to the
kernel.

One idea, then, is for the kernel umad module to learn which MADs generate
responses.  It would do this by updating an entry in a table whenever a response
MAD is generated.  Each received MAD would be checked against the table to see
whether a response is expected.  If not, the MAD would be freed after userspace
claims it.  If a response is expected, the MAD would not be freed until the
response was generated.
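A rough sketch of such a learning table, assuming it is keyed on (management class, request method).  The table layout and the names `resp_table_learn()` and `resp_table_classify()` are hypothetical.  The one protocol detail relied on is that response MADs have the response bit (0x80) set in their method field; the table is updated with the method of the request being answered rather than derived from the response method, since, for example, a Set is answered with GetResp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAD_METHOD_RESP 0x80   /* R bit: set in the method of response MADs */

/* Hypothetical per-device table: one flag per (management class, request
 * method), set once a response to that kind of request has been seen. */
struct resp_table {
	bool expects_resp[256][128];
};

enum mad_disposition {
	MAD_FREE_AFTER_READ,    /* free once userspace has read the MAD */
	MAD_HOLD_FOR_RESPONSE,  /* keep until userspace sends the response */
};

/* Called when userspace sends a response MAD.  req_method is the method of
 * the request being answered (e.g. found by matching the TID), not the
 * response's own method. */
static void resp_table_learn(struct resp_table *t, uint8_t mgmt_class,
			     uint8_t req_method)
{
	assert(!(req_method & MAD_METHOD_RESP));
	t->expects_resp[mgmt_class][req_method] = true;
}

/* Called for each received MAD to decide how long the kernel keeps it. */
static enum mad_disposition resp_table_classify(const struct resp_table *t,
						uint8_t mgmt_class,
						uint8_t method)
{
	if (method & MAD_METHOD_RESP)
		return MAD_FREE_AFTER_READ;   /* responses are never answered */
	return t->expects_resp[mgmt_class][method] ? MAD_HOLD_FOR_RESPONSE
						   : MAD_FREE_AFTER_READ;
}
```

Until the first response for a given class and method has been seen, requests of that kind are classified as free-after-read, so the first response to each one finds no held request.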

Assuming minimal hard-coding of which methods are requests, a client would drop
only about one MAD per method during start-up, while the table is still
learning.  Since most requests are not sent reliably, this shouldn't be a big
issue.  (In fact, outside of a MultiPathRecord query, I don't believe any
requests are sent reliably.)  And I would argue that even if a request has been
acknowledged, the sender of the request still needs to handle the case where no
response is ever generated.

Taking this approach raises the issue that MADs are stored in the kernel
waiting for a response.  But what if a response is never generated?  This is
related to the problem of MADs queued in the kernel that the userspace app never
calls down to receive.  Ideally, we could come up with a single solution to both
problems, but that may not be possible.

My current thinking on how to handle requests is to timestamp each request MAD
when it is received and queue it.  Once the queue is full, each newly received
request would check the MAD at the head of the queue.  If that MAD is older than
some selected timeout (say, 20 seconds), it would be bumped from the queue, and
the new request would be added to the tail.
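A sketch of that bounded, timestamped queue as a ring buffer.  The capacity (4, kept small for illustration) and all names are hypothetical, and the 20-second timeout is just the example value above.  The caller is assumed to pass the current time in seconds from a monotonic clock.  The proposal doesn't say what happens when the queue is full and the head is still young; here the new request is simply rejected, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REQ_QUEUE_CAP   4    /* hypothetical; small for illustration */
#define REQ_TIMEOUT_SEC 20   /* the example value from the proposal */

struct req_slot {
	uint64_t tid;          /* request transaction ID */
	uint64_t recv_time;    /* receive time, in seconds (monotonic clock) */
};

/* FIFO ring buffer: slot[head] is always the oldest queued request. */
struct req_queue {
	struct req_slot slot[REQ_QUEUE_CAP];
	size_t head;
	size_t count;
};

/* Queue a newly received request.  If the queue is full, the oldest request
 * is bumped only if it has waited longer than REQ_TIMEOUT_SEC; otherwise
 * the new request is rejected (an assumption).  *evicted reports a bump. */
static bool req_queue_add(struct req_queue *q, uint64_t tid, uint64_t now,
			  bool *evicted)
{
	size_t tail;

	assert(q->count <= REQ_QUEUE_CAP);
	*evicted = false;
	if (q->count == REQ_QUEUE_CAP) {
		if (now - q->slot[q->head].recv_time <= REQ_TIMEOUT_SEC)
			return false;
		q->head = (q->head + 1) % REQ_QUEUE_CAP;
		q->count--;
		*evicted = true;
	}
	tail = (q->head + q->count) % REQ_QUEUE_CAP;
	q->slot[tail].tid = tid;
	q->slot[tail].recv_time = now;
	q->count++;
	return true;
}
```

Removing a request from the queue once its response has been sent is omitted here; in a FIFO that means either unlinking from the middle or marking the slot as done and skipping it at the head.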

- Sean