[openib-general] RFC: detecting duplicate MAD requests

Fri Apr 28 17:13:48 PDT 2006

On Fri, Apr 28, 2006 at 03:20:13PM -0700, Sean Hefty wrote:

> I'd like to propose that the MAD layer detect duplicate requests.

Sean,

You can't add this kind of thing piecemeal to a protocol and have it
work. If the sender doesn't see a response (perhaps the response was
lost, or was slow in coming) and sends another MAD, this 2nd MAD will
have a different sequence number. How does the recipient know it's the
same request?  If the response was lost the first time, eating the 2nd
MAD without sending a response will result in another timeout and a
3rd MAD... so maybe the recipient remembers the response and sends it
again. Will that work? Well, no, it's not guaranteed, because the
sender may reject a stale response received after sending the 2nd
MAD...
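
To make that failure mode concrete, here is a toy model in Python.
Everything in it (Sender, Recipient, run_scenario) is made up for
illustration and is not the MAD API. It assumes, as described above,
that every resend gets a fresh sequence number and that the sender
only accepts a response matching its latest send.

```python
import itertools


class Sender:
    """Hypothetical sender: every (re)send uses a new sequence number."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.outstanding = None  # sequence number we are waiting on

    def send_request(self):
        self.outstanding = next(self._seq)
        return self.outstanding

    def accept(self, response_seq):
        # Anything that does not match the latest send is stale.
        return response_seq == self.outstanding


class Recipient:
    """Hypothetical recipient that de-duplicates by sequence number."""

    def __init__(self):
        self.executed = 0
        self.seen = {}  # sequence number -> remembered response

    def handle(self, seq):
        if seq not in self.seen:
            self.executed += 1  # the request is acted on
            self.seen[seq] = seq
        return self.seen[seq]


def run_scenario():
    sender, recipient = Sender(), Recipient()
    first = sender.send_request()
    late_response = recipient.handle(first)  # response is slow coming
    second = sender.send_request()           # sender timed out, resends
    recipient.handle(second)                 # new seq, so not a duplicate
    # The slow response to the first send finally shows up.
    return recipient.executed, sender.accept(late_response)


print(run_scenario())
```

In this model the recipient acts on the request twice, because the
two MADs look unrelated to it, and the sender throws away the late
response to the first one.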

Really, it's up to the MAD client to deal with duplicates in its own
way.

And yes, this class of issues shows up in practice. Ask anyone who's
ever worked on a large distributed system. "Execute exactly once"
semantics require end-to-end design.
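
As a rough sketch of what "end-to-end" can mean, here is one common
approach: the client picks a request ID once and reuses it on every
retry, and the server remembers the response for each ID. This is an
assumption about one possible design, not something the MAD layer
provides; Client, Server and their methods are hypothetical names.

```python
import uuid


class Server:
    """Hypothetical server that remembers responses by request ID."""

    def __init__(self):
        self.executed = 0
        self.responses = {}  # request ID -> remembered response

    def handle(self, request_id, payload):
        if request_id not in self.responses:
            self.executed += 1  # the real work happens only here
            self.responses[request_id] = (request_id, payload.upper())
        return self.responses[request_id]


class Client:
    """Hypothetical client that keeps one request ID across retries."""

    def __init__(self, server):
        self.server = server

    def call(self, payload, lose_first_response=True, max_tries=3):
        request_id = uuid.uuid4().hex  # chosen once, reused on retry
        for attempt in range(max_tries):
            rid, result = self.server.handle(request_id, payload)
            if attempt == 0 and lose_first_response:
                continue  # simulate a lost response, then retry
            if rid == request_id:
                return result
        raise TimeoutError("no response")


server = Server()
client = Client(server)
print(client.call("ping"), server.executed)
```

Both ends have to agree on the ID and on what a repeat means, which
is why this can't be bolted on in one layer. A real design also has
to decide how long the server remembers responses; the sketch ignores
that.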

-- greg