[openib-general] Re: [PATCH] ib_mad: prevent duplicate outstanding MAD transactions with same TID

Jack Morgenstein jackm at mellanox.co.il
Wed Feb 22 22:51:04 PST 2006


On Wednesday 22 February 2006 18:36, Jack Morgenstein wrote:
> The issue is complex, and two-fold:
>
> A
> --
> 1. We should PREVENT sending a new, identical duplicate request MAD
> while the previous MAD has not yet timed out (but allow RMPP ACK/NACK
> packets, which have the identical TID/GID/class as the original request
> packet).
>
> 2. Similarly, we should PREVENT sending a new duplicate RMPP mad from
> sender side (usually an RMPP response) while the previous RMPP session
> is still in progress.
>
> B
> --
> We should ALLOW sending duplicate response MADs (or duplicate RMPP
> response sessions) having the same transaction ID, but going to
> different destinations.
>
> ----
> Regarding A.2 and B:  Normal (non-RMPP) responses do not have timeouts,
> whereas RMPP responses do have timeouts per segment (via the RMPP
> protocol).
> However, these timeouts are visible only after the call to
> ib_post_send_mad() (which is the natural place to put duplication
> detection).
>
> In the current OpenSM implementation, all response MADs are passed from
> user-space to kernel space with a timeout set to zero -- and this
> 0-timeout is passed to ib_post_send_request() by ib_umad_write.
>
> If an RMPP response is indicated, the timeout is changed in mad_rmpp.c,
> send_next_seg() just before calling ib_send_mad().  Thus, when the
> segment is sent and the send_completion is received, the mad transaction
> is transferred to the send wait-queue to await a response packet (since
> the timeout is non-zero at that point).

The reason for this discussion of timeouts is issue A.1.  If we only do
the duplication check for MADs with timeouts (i.e., MADs expecting a 
response), we will miss checking RMPP responses (which are sent with 
0-timeout, as they should be -- all the RMPP complexity is, and should be, 
hidden from the sender).

If, however, we also do the duplication check for MADs with timeout=0, we
will (inappropriately) check ALL MADs for duplicates.  In particular, this
will cause problems for the RMPP ACK/NACK messages.

The correct condition for doing the duplication check when sending a MAD is
therefore:
EITHER the timeout specified in the ib_mad_send_buf struct is > 0;
OR the packet has RMPP active and is a data packet (not a control
packet), so that RMPP responses are checked as well.
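
To make the condition concrete, here is a rough standalone sketch of the
predicate.  It is not the actual patch: struct send_desc, the RMPP_TYPE_*
names and needs_duplicate_check() are hypothetical stand-ins invented for
illustration (the real code would look at the ib_mad_send_buf timeout and
the RMPP header of the MAD), and it only assumes the usual RMPP type
numbering (DATA=1, ACK=2, STOP=3, ABORT=4).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not kernel definitions. */
enum rmpp_type {
	RMPP_TYPE_DATA  = 1,	/* data segment */
	RMPP_TYPE_ACK   = 2,	/* control packet */
	RMPP_TYPE_STOP  = 3,	/* control packet */
	RMPP_TYPE_ABORT = 4	/* control packet */
};

/* Simplified view of what is known about a MAD when it is posted. */
struct send_desc {
	int timeout_ms;		/* 0 means no response is expected */
	bool rmpp_active;	/* RMPP active flag set in the header */
	enum rmpp_type rmpp_type; /* meaningful only if rmpp_active */
};

/*
 * Return true if the duplicate-TID check should be applied to this send:
 * either a response is expected (timeout > 0), or the MAD is an RMPP
 * data packet (e.g. an RMPP response posted with timeout 0).
 * RMPP control packets (ACK/STOP/ABORT) with timeout 0 are never checked.
 */
static bool needs_duplicate_check(const struct send_desc *s)
{
	if (s->timeout_ms > 0)
		return true;

	return s->rmpp_active && s->rmpp_type == RMPP_TYPE_DATA;
}
```

With this shape, a plain response (timeout 0, no RMPP) and an RMPP ACK
both skip the check, while a request with a timeout and a 0-timeout RMPP
response are both checked.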

-- Jack


