[openib-general] RFC: detecting duplicate MAD requests

Tue Jun 13 14:15:44 PDT 2006

On Tue, 2006-06-13 at 14:05, Sean Hefty wrote:
> >There are architected ways to do that. There's busy for MADs which could
> >be used for some MADs. For RMPP, would the transfer be ABORTed ? I don't
> >think you can switch to BUSY in the middle (but I'm not 100% sure). I
> >don't know how this limit is being used exactly, but it might be best if
> >the RMPP receive were treated as 1 MAD regardless of how many
> >segments it was.
> 
> Maybe I should back up some here.  There are a couple problems that I'm trying
> to solve, but the main goal is to prevent sending duplicate responses.  I'd like
> to do this by detecting and dropping duplicate requests.
> 
> To detect a duplicate request, my proposal is to move completed MADs to a
> "done_list".  Newly received MADs would also check the done_list to determine if
> the MAD is a duplicate.  When a user sends a response MAD, a check would be made
> against the done_list for a matching request that has not generated a response
> yet.  If one is not found, then the send would be failed.
> 
> Received MADs would be removed from the done_list when they are freed.  My guess
> is that for kernel clients, the changes would probably be minimal.  For usermode
> clients, the problem is more difficult, since we cannot trust usermode clients
> to generate responses correctly, and there's no free_mad call that maps to the
> kernel.
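
A rough userspace model of the done_list idea described above, written in Python rather than kernel C purely for illustration. The names MadKey and DoneList, and the choice of fields used to identify a request, are assumptions and not part of the proposal; the real matching rules would come from the MAD layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MadKey:
    """Hypothetical identity of a request (fields are an assumption)."""
    src_lid: int
    mgmt_class: int
    tid: int


class DoneList:
    """Tracks received requests until they are freed."""

    def __init__(self):
        # key -> False while no response has been sent, True afterwards
        self._responded = {}

    def receive(self, key):
        """Return True for a new request, False for a duplicate to drop."""
        if key in self._responded:
            return False
        self._responded[key] = False
        return True

    def send_response(self, key):
        """Allow a response only for a tracked request with no response yet."""
        if self._responded.get(key) is not False:
            return False  # unknown request, or already answered: fail the send
        self._responded[key] = True
        return True

    def free(self, key):
        """Forget the request once the received MAD is freed."""
        self._responded.pop(key, None)
```

In this sketch a second copy of the same request is rejected by receive(), and a second response to the same request is rejected by send_response(), which is the duplicate-response case the proposal is trying to prevent.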
> 
> One of the ideas then, is for the kernel umad module to learn which MADs
> generate responses.  It would do this by updating an entry in a table whenever a
> response MAD is generated.  A received MAD would check against the table to see
> if a response is supposed to be generated.  If not, then the MAD would be freed
> after userspace claims it.  If a response is expected, then the MAD would not be
> freed until the response was generated.
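
One possible shape for the learning table described above, again as a Python model rather than kernel code. ResponseTable, on_claimed and the (mgmt_class, method) key are hypothetical names chosen for the sketch; it also assumes the caller can tell which request method a given response answers, which the proposal does not spell out.

```python
class ResponseTable:
    """Learns which (mgmt_class, method) pairs are answered with a response."""

    def __init__(self, hard_coded=()):
        # Pairs known up front; everything else is learned at run time.
        self._expects = set(hard_coded)

    def note_response(self, mgmt_class, request_method):
        """Record that userspace generated a response to this kind of request."""
        self._expects.add((mgmt_class, request_method))

    def expects_response(self, mgmt_class, method):
        return (mgmt_class, method) in self._expects


def on_claimed(table, mgmt_class, method):
    """Decide what to do with a received MAD once userspace has claimed it."""
    return "hold" if table.expects_response(mgmt_class, method) else "free"
```

With an empty table, the first request of a given method is freed as soon as it is claimed. Once a response to that method has been noted, later requests of the same method are held until their response is generated.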
> 
> Assuming minimal hard-coding of which methods are requests, a client would drop
> only about 1 MAD per method during start-up.

Does this apply only to the new methods that are not hard-coded? Would this
invoke a timeout (and hopefully a retry)?

> Considering most requests are not
> sent reliably, this shouldn't be a big issue.  (In fact, outside of a
> MultiPathRecord query, I don't believe any requests are sent reliably.)

If you mean sent via RMPP, then yes, only GetMulti is sent this way.

> And I
> would argue that even if a request has been acknowledged, the sender of the
> request would still need to deal with the case that no response is ever
> generated.

Are you referring to a request that has been acknowledged but whose
response has not been sent (yet)?

> If this approach were taken, then, it brings up the issue that MADs are being
> stored in the kernel waiting for a response.  But what if a response is never
> generated?  This problem is somewhat related to MADs being queued in the kernel,
> but the userspace app doesn't call down to receive them.  Ideally, we could come
> up with a single solution to both problems, but that may not be possible.
> 
> My current thoughts on how to handle requests are to time when each request MAD
> is received, and queue it.  Once the queue is full, if another request is
> received, it would check the MAD at the head of the queue.  If the MAD at the
> head was older than some selected value (say 20 seconds), it would be bumped
> from the queue, and the new request would be added to the tail.
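
The queueing rule above could be modelled roughly as follows. This is a Python sketch with hypothetical names (RequestQueue, add, pending). The description does not say what happens when the queue is full and the head is not yet old enough, so the sketch assumes the new request is dropped in that case. The current time is passed in explicitly to keep the example deterministic.

```python
from collections import deque


class RequestQueue:
    """Bounded queue of request MADs that evicts only a stale head entry."""

    def __init__(self, capacity, max_age=20.0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._max_age = max_age      # seconds; 20 is the example value above
        self._queue = deque()        # (arrival_time, mad), oldest on the left

    def add(self, mad, now):
        """Queue a request; return False if it had to be dropped."""
        if len(self._queue) >= self._capacity:
            arrival, _ = self._queue[0]
            if now - arrival <= self._max_age:
                return False         # assumption: head still fresh, drop newcomer
            self._queue.popleft()    # head is older than max_age: bump it
        self._queue.append((now, mad))
        return True

    def pending(self):
        """Queued MADs, oldest first."""
        return [mad for _, mad in self._queue]
```

For example, with a capacity of 2, requests arriving at t=0 and t=5 fill the queue, a third at t=10 is dropped, and a fourth at t=21 bumps the t=0 entry because it is now older than 20 seconds.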

For RMPP, this time should start when the last segment is received. Is
that how you would envision it working ?

I'm also not sure what the right timeout value would be for this. Where
did 20 seconds come from ?

-- Hal

> - Sean