[openib-general] Re: Dual Sided RMPP Support as well as OpenSM Implications

Sean Hefty mshefty at ichips.intel.com
Mon Apr 10 14:59:33 PDT 2006


Hal Rosenstock wrote:
>>>>Node A sends an RMPP message.  This requires normal RMPP processing.
>>>>Node A sends an ACK of the final ACK (I'll call ACK2), giving a new window.
>>>>Node B receives ACKs.
>>>>Node B sends the response.  This requires normal RMPP processing.
>>>>
>>>>From the perspective of node A, the RMPP code only needs to know to send ACK2. 
>>>
>>>There's more to the state machine in turning the direction around in
>>>terms of the sender becoming the receiver. I thought that this is the
>>>"harder" direction change.
>>
>>Can you describe what more is needed than what's listed above?
> 
> I was referring to comparing the direction switch flows: Figure 181 (p.
> 791) requires more than the switch to DS in Figure 179 (p. 787).

It's still not clear to me what's missing from the sequence.  Once node A sends 
ACK2, it should wait until either ACK1 is received again, or the response is 
received.  Upon receiving ACK1, it can resend ACK2.  I might not have mentioned 
this before, but I would have ACK2 carry a window size of 1, which lets us treat 
all received RMPP MADs the same.
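
To make that concrete, here is a rough sketch of the sender-side turnaround
as I picture it.  None of these names exist in our MAD code; the context
structure, the two states, and the stub functions are invented purely to
illustrate the "send ACK2, then wait for either a retried ACK1 or the
response" behavior:

#include <stdio.h>

/* Stubs standing in for the real send/receive paths - hypothetical only. */
static void send_ack2_window_1(void)
{
	printf("send ACK2 with NewWindowLast = 1\n");
}

static void start_normal_rmpp_receive(void)
{
	printf("start normal RMPP receive processing\n");
}

enum ds_sender_state {
	DS_SENDING_REQUEST,	/* normal RMPP send of the request */
	DS_AWAITING_RESPONSE,	/* ACK2 sent; waiting for the response or a retried ACK1 */
};

struct ds_sender_ctx {
	enum ds_sender_state state;
	int ack2_retries_left;
};

/* Called for each RMPP MAD node A receives on this transaction. */
static void ds_sender_recv(struct ds_sender_ctx *ctx, int is_ack1,
			   int is_response_data)
{
	switch (ctx->state) {
	case DS_SENDING_REQUEST:
		if (is_ack1) {
			/* Final ACK of our request: answer with ACK2 and
			 * advertise a window of 1, so every received RMPP
			 * MAD can be treated the same afterwards. */
			send_ack2_window_1();
			ctx->state = DS_AWAITING_RESPONSE;
		}
		break;
	case DS_AWAITING_RESPONSE:
		if (is_ack1 && ctx->ack2_retries_left-- > 0) {
			/* Our ACK2 was presumably lost; node B retried ACK1. */
			send_ack2_window_1();
		} else if (is_response_data) {
			/* Response started arriving; fall back to normal
			 * receive-side RMPP processing. */
			start_normal_rmpp_receive();
		}
		break;
	}
}

If that's really all node A needs, the addition is basically one extra state
and one new MAD type.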

>>In terms of compliance, if node A is not, but node B is;
> 
> "Is" = DS and "is not" = not DS, right ?
> 
> Just out of curiosity, where's the compliance for this ? What are you
> referring to here ?

I mean that node A does not implement DS RMPP, but node B does.

>>Node B cannot send back the response until ACK2 is received.  Since node A does 
>>not understand dual-sided RMPP, it will not send ACK2.  Node B will never send 
>>the response.
> 
> Correct. It would time out. But wouldn't it be better if the transaction
> were aborted with some explicit status for this ?

Are you asking for an explicit status indicating that ACK2 was not received?  I 
guess this could be added, but node B should not make any assumptions about the 
reason for the timeout, such as node A doesn't support DS RMPP.  If node A 
doesn't support DS RMPP, I don't know that it should expect a MultiPathRecord 
query to work.

>>If node A is, but node B is not:
>>Node A will send ACK2, which node B should drop.
> 
> Yes, figure 179 for receiver termination flow (IsDS false direction)
> shows that packet as discarded with an Abort (BadT) sent.

If ACK2 matches with the received request, then wouldn't that transaction be 
aborted?  Does this mean that both nodes must either be DS RMPP compliant, or 
non-compliant for communication to work?

>>Node B will send an RMPP message assuming an initial window size of 1.
>>If node A had set the window larger, it may delay the ACK of segment 1.
>>Node B will eventually time out and resend segment 1.  Most likely, this
>>will cause node A to ACK segment 1, which will update the window size at
>>node B.
> 
> I'm not following you on this part.

I'm just trying to determine what could happen if a non-compliant implementation 
tried talking to a compliant implementation.  And now I'm leaning towards them 
being unable to communicate.

>>>> It can do this based on the method, or per transaction if directed by the client.
>>>
>>>Yes; I was thinking of class/method based approach for this.
>>
>>Currently, only a MultiPathRecord query requires this.  Why not limit dual-sided 
>>RMPP to _only_ this request?  
> 
> 
> That would work for now. One future issue would be the needs of vendor
> class 2 here.

What I'm suggesting is that we limit "sender-initiated double-sided" RMPP 
transfers to only MultiPathRecord.  Vendor class 2 would simply use two 
"sender-initiated transfers".

>>All other queries can just use an RMPP message in one direction,
>>followed by an RMPP message in the other direction.
> 
> 
> I don't understand what you mean here. That's not the way it works from
> my understanding. If both the request and response are RMPP messages,
> isn't this dual sided ?

If it's a vendor-defined MAD, can't we control the behavior and treat this as 
two Sender-Initiated Transfers?  In 13.6.6.3, we have:

It is also possible for a single transaction to involve an RMPP transfer sent
in one direction followed by another RMPP transfer in the other direction... 
This *may* be accomplished as follows:

My interpretation is that we're not restricted to using this.

> I think the issue is turning things around but I'm not positive. I was
> wondering about this in a slightly different way: as to why the
> direction switch ? My initial foray into this area was to do just what
> you said: two single-sided RMPP transfers in opposite directions. In my
> simple test case, the request was short (1 MAD) but that could be
> changed. I haven't figured out the reason for the turnaround ACK, but I
> know the people who architected this (although most have left the
> group) and am quite confident that it wouldn't just be there if it
> weren't needed. (I'll eat my words later if necessary :-)

(rant)  IMO, the entire RMPP architecture is ridiculous.  Segmentation and 
reassembly information is embedded in the middle of user data, with timeout 
constraints that would take a half dozen queries to calculate.  So I'm not 
confident that this is needed at all.

The only benefit that I see is that the initial window size could be larger than 
1, which could potentially provide better latency.  DS RMPP requires the same 
number of MADs as two single-sided RMPP transfers, so even the potential gain 
seems fairly small.

>>>>Node B is more complex.  It must now wait for ACK2, using timeout and retries of 
>>>>ACK1 until ACK2 is received.  And the response that will be generated by the 
>>>>client must be delayed until that ACK2 is received.
>>>
>>>
>>>Yes but isn't much of this already needed for the normal termination
>>>case or is that part not implemented yet ?
>>
>>No - ACK2 is a new message unique to dual-sided RMPP transfers (an ACK of an ACK).
> 
> 
> We're talking about Figure 179, right ? If so, most of that needs to be
> there already down to the Type decision (without the ACK direction
> implemented).
> 
> Yes, ACK2 is new but this doesn't seem like much to add. The delay of
> the client response would also be "new" and that seems harder.

I agree.  Adding in ACK2 shouldn't be that difficult, but does require knowing 
if a given transaction (class/method) uses DS RMPP.  The delay on the send-side 
is already there, in waiting for the response.  The timeout of the RMPP context 
on the receive side is where the difficulty lies, but I think we can avoid this 
difficulty simply by passing NWL up to the client, and having them return it on 
the response.
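
As a rough sketch of that NWL pass-through (the structures below are invented
for illustration; they are not the current kernel MAD API), the client would
just echo the value it was handed on the receive side back on its response:

/* Hypothetical fields only - not part of the existing MAD interfaces. */
struct ds_recv_info {
	unsigned int nwl;	/* NewWindowLast handed up with the request */
};

struct ds_send_info {
	unsigned int nwl;	/* client copies the received value back here */
};

/* The client carries NWL from the request over to its response, so the
 * MAD layer doesn't have to keep the receive-side RMPP context alive. */
static void client_build_response(const struct ds_recv_info *req,
				  struct ds_send_info *resp)
{
	resp->nwl = req->nwl;
}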

If we want to support DS RMPP for more than just MultiPathRecord, it seems that 
we need some sort of class/method mapping, which would require changing the 
kernel MAD API.
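
For what it's worth, the mapping itself could start out as small as the check
below.  Nothing here is part of the kernel MAD API today, and the constants
are my recollection of the spec values (SA class 0x03, SubnAdmGetMulti 0x14,
MultiPathRecord attribute 0x3A), so treat them as placeholders:

#define MGMT_CLASS_SUBN_ADM	0x03	/* SA management class */
#define SA_METHOD_GET_MULTI	0x14	/* SubnAdmGetMulti */
#define SA_ATTR_MULTI_PATH_REC	0x3a	/* MultiPathRecord */

/* Would this (class, method, attribute) tuple use sender-initiated
 * double-sided RMPP?  For now, only a MultiPathRecord query would. */
static int uses_ds_rmpp(unsigned char mgmt_class, unsigned char method,
			unsigned short attr_id)
{
	return mgmt_class == MGMT_CLASS_SUBN_ADM &&
	       method == SA_METHOD_GET_MULTI &&
	       attr_id == SA_ATTR_MULTI_PATH_REC;
}

Extending that to other classes/methods later is where the kernel API change
would come in.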

- Sean



