[openib-general] Re: [PATCH] ib_mad: prevent duplicate outstanding MAD transactions with same TID

Jack Morgenstein jackm at mellanox.co.il
Wed Feb 22 08:36:05 PST 2006


The issue is complex, and two-fold:

A
--
1. We should PREVENT sending a new, identical duplicate request MAD
while the previous MAD has not yet timed out (but allow RMPP ACK/NACK
packets, which have the same TID/GID/class as the original request
packet).

2. Similarly, we should PREVENT sending a new duplicate RMPP MAD from
the sender side (usually an RMPP response) while the previous RMPP
session is still in progress.

B
--
We should ALLOW sending duplicate response MADs (or duplicate RMPP
response sessions) having the same transaction ID, but going to
different destinations.

----
Regarding A.2 and B:  Normal (non-RMPP) responses do not have timeouts,
whereas RMPP responses do have timeouts per segment (via the RMPP
protocol).
However, these timeouts are visible only after the call to
ib_post_send_mad() (which is the natural place to put duplication
detection).

In the current OpenSM implementation, all response MADs are passed from
user space to kernel space with a timeout set to zero -- and this
0-timeout is passed to ib_post_send_mad() by ib_umad_write.

If an RMPP response is indicated, the timeout is changed in mad_rmpp.c,
send_next_seg() just before calling ib_send_mad().  Thus, when the
segment is sent and the send_completion is received, the mad transaction
is transferred to the send wait-queue to await a response packet (since
the timeout is non-zero at that point).

In order to comply with all the restrictions above, we need to do the
following:

When SENDING:
	If RESPONSE bit of method is set:
		Need to check TID/GID/class of all responses in list to
		verify that this is not a duplicate.
	Otherwise:
		Need to check TID/class of all requests in list.

	NOTE:  Currently, struct ib_mad_send_wr_private holds only the
		address handle pointer, NOT the address handle attributes.
		We need the AH attribute data to check GID, LID, and GRH.
		To extract this info we can either add it to the private
		struct (requiring changes in ib_create_send_mad, and
		affecting lots of code), or we can change the verb
		ib_query_ah() to be mandatory (it is optional in the IB
		Spec).
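
A minimal sketch of the send-side check outlined above, in plain
user-space C rather than kernel code.  Everything here is hypothetical
and for illustration only: struct mad_key, METHOD_RESP_BIT (0x80,
standing in for the response bit of the method field) and
send_is_duplicate() are not existing ib_mad names, and the destination
GID is assumed to have already been obtained from the AH attributes by
one of the two options in the NOTE.  The RMPP ACK/NACK exception from
A.1 is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define METHOD_RESP_BIT 0x80	/* assumed: response bit of the method field */

/* Hypothetical key extracted from each MAD on the outstanding list. */
struct mad_key {
	uint64_t tid;		/* transaction ID */
	uint8_t  mgmt_class;	/* management class */
	uint8_t  method;	/* method, including the response bit */
	uint8_t  dgid[16];	/* destination GID (from AH attributes) */
};

bool is_response(const struct mad_key *k)
{
	return (k->method & METHOD_RESP_BIT) != 0;
}

/*
 * Return true if new_mad duplicates an entry on the outstanding list.
 * Responses are compared on TID/GID/class against other responses;
 * requests are compared on TID/class against other requests.
 */
bool send_is_duplicate(const struct mad_key *new_mad,
		       const struct mad_key *outstanding, size_t n)
{
	bool resp = is_response(new_mad);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct mad_key *cur = &outstanding[i];

		if (is_response(cur) != resp)
			continue;	/* only compare like with like */
		if (cur->tid != new_mad->tid ||
		    cur->mgmt_class != new_mad->mgmt_class)
			continue;
		if (resp && memcmp(cur->dgid, new_mad->dgid, 16) != 0)
			continue;	/* same TID, different destination */
		return true;
	}
	return false;
}
```

With this shape, a second response carrying the same TID and class but
a different destination GID is not flagged (case B), while a second
request with the same TID and class is flagged regardless of
destination (case A.1).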

When RECEIVING:
	If RESPONSE bit is set:
		Need to check TID/class against outstanding requests.
	Otherwise:
		Need to check TID/GID/class against outstanding responses
		(RMPP).  GID is important here, because responder may have
		several RMPP sessions active with same TID, but involving
		different destination hosts.
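
The receive-side matching above could be sketched along the same
lines.  Again this is user-space C for illustration, not kernel code:
struct out_mad, struct rx_mad, RESP_BIT (0x80, assumed to be the
response bit of the method field) and find_outstanding() are
hypothetical names, and the GIDs are assumed to be available on both
the received MAD and the outstanding entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RESP_BIT 0x80	/* assumed: response bit of the method field */

/* Hypothetical entry on the list of outstanding sent MADs. */
struct out_mad {
	uint64_t tid;
	uint8_t  mgmt_class;
	uint8_t  method;
	uint8_t  peer_gid[16];	/* GID the MAD was sent to */
};

/* Hypothetical summary of a received MAD. */
struct rx_mad {
	uint64_t tid;
	uint8_t  mgmt_class;
	uint8_t  method;
	uint8_t  src_gid[16];	/* GID the MAD arrived from */
};

/*
 * Return the index of the outstanding MAD that rx belongs to, or -1.
 * A received response is matched on TID/class against outstanding
 * requests.  Anything else is matched on TID/GID/class against
 * outstanding (RMPP) responses, since several sessions may share a
 * TID while talking to different hosts.
 */
int find_outstanding(const struct rx_mad *rx,
		     const struct out_mad *list, size_t n)
{
	int rx_is_resp = (rx->method & RESP_BIT) != 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct out_mad *cur = &list[i];
		int cur_is_resp = (cur->method & RESP_BIT) != 0;

		if (cur_is_resp == rx_is_resp)
			continue;	/* responses pair with requests */
		if (cur->tid != rx->tid ||
		    cur->mgmt_class != rx->mgmt_class)
			continue;
		if (!rx_is_resp &&
		    memcmp(cur->peer_gid, rx->src_gid, 16) != 0)
			continue;	/* same TID, different session */
		return (int)i;
	}
	return -1;
}
```

The GID comparison is what keeps two RMPP response sessions with the
same TID apart when a packet for one of them arrives.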

Comments? (especially regarding requiring ib_query_ah vs. impacting
lots of existing code)

Jack



-----Original Message-----
From: Sean Hefty [mailto:mshefty at ichips.intel.com] 
Sent: Monday, January 23, 2006 11:08 PM
To: Michael S. Tsirkin
Cc: Jack Morgenstein
Subject: Re: [openib-general] Re: [PATCH] ib_mad: prevent duplicate
outstanding MAD transactions with same TID

Michael S. Tsirkin wrote:
> I think you are right.
> Hmm, how come mad.c does request/response matching simply by calling
> ib_find_send_mad which only gets a tid?
> 

ib_find_send_mad checks against request MADs only - those with a
timeout.

We have a more complicated issue here.  With a response MAD, we can have
multiple TIDs that are the same, as long as they're going to different
destinations (mgmt class or dlid/dgid).

So, request MADs have a limitation, whereas response MADs don't?

- Sean


