[openib-general] Send Side RMPP and OpenSM GetTableResp

Sean Hefty sean.hefty at intel.com
Tue May 31 19:05:37 PDT 2005


>> >            <--    SA GetTableResp
>> >
>> >					RMPP flags 0x05 (Data, Last)
>> >					SegmentNumber 4
>> >					PayloadLength 0x34
>> >					TID 8
>> >SA GetTable -->
>> >RMPP flags 0x02 (ACK)
>> >SegmentNumber 1
>> >NewWindowLast 6
>> >TID 8
>>
>> This segment number is off - not sure why.
>
>It is off in that the 3 segments just sent are not acknowledged but it
>is legal to acknowledge what you have already received. This does not
>violate anything.

The RMPP implementation sends an ACK under the following conditions:

* Upon completion of a received datagram.
* If a duplicate segment is received.
* After all segments of the current window are received
  (including the initial window).

So, this ACK isn't violating the protocol, but I don't see which of these
cases the ACK matches up against in the implementation.
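
As a rough sketch of the three conditions listed above (Python for brevity,
not the actual RMPP code; the function and parameter names are hypothetical
and only illustrate the decision):

```python
def should_ack(seg_num, received, window_last, last_seg=None):
    """Return True if receiving seg_num would trigger an ACK.

    seg_num     -- number of the segment just received (1-based)
    received    -- set of segment numbers received before this one
    window_last -- last segment number of the current receive window
    last_seg    -- number of the segment that carried Last, or None
                   if it has not been seen yet
    All names are hypothetical, not taken from the kernel sources.
    """
    # Condition: a duplicate segment was received.
    if seg_num in received:
        return True

    have = received | {seg_num}

    # Condition: the datagram is complete (segments 1..last_seg all present).
    if last_seg is not None and have >= set(range(1, last_seg + 1)):
        return True

    # Condition: every segment of the current window has been received.
    if have >= set(range(1, window_last + 1)):
        return True

    return False
```

Under this model, a segment that arrives out of order with a gap behind it
does not trigger an ACK by itself.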

>> It could indicate that segment 2 was lost,
>
>That's one possibility but I doubt it is getting lost.
>
>> or that its processing came after that of a later segment.

After re-examining the RMPP code, the implementation doesn't automatically
send an ACK just because a segment is processed out of order.  It tries to
be intelligent about it in case receive processing is occurring in multiple
threads.

>The gap was 769.912 usec. Not sure whether this corresponds to any IBTA
>timeout. Is this the hardcoded timeout that is used ?

RMPP uses 40 seconds to complete a receive.  The sender uses a 2 second
timer to wait for an ACK before resending segments.  (This was recently
reduced from 5 seconds.)  The receiver uses a 10 second timer to maintain
state after completing a receive in order to re-generate lost final ACKs.
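
To make the three timers concrete, here is a minimal sketch with the values
above written as constants.  The constant and function names are made up for
illustration, and the point at which each timer is (re)started is an
assumption left to the caller:

```python
# Timer values from the description above, in seconds.
# Names are hypothetical; they are not the identifiers used in the code.
RECV_TIMEOUT_S = 40      # time allowed to complete a receive
SEND_ACK_TIMEOUT_S = 2   # sender waits this long for an ACK before resending
CLEANUP_TIMEOUT_S = 10   # receiver keeps state after completing a receive


def receive_timed_out(now, timer_started):
    """True once the receive timer has run for RECV_TIMEOUT_S."""
    return now - timer_started >= RECV_TIMEOUT_S


def sender_should_resend(now, last_send_time):
    """True once the sender has waited SEND_ACK_TIMEOUT_S without an ACK."""
    return now - last_send_time >= SEND_ACK_TIMEOUT_S


def can_regenerate_final_ack(now, completed_at):
    """True while the receiver still holds state for a completed receive."""
    return now - completed_at < CLEANUP_TIMEOUT_S
```

A gap of 769.912 usec is far shorter than any of these values, so
sender_should_resend(0.000769912, 0.0) is False.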

>>   Regardless what went wrong on the SA
>> side, the client needs to be able to deal with it.
>
>This applies in both directions but in this case I think you mean the
>other direction (whatever went wrong on the SA client side the SA needs
>to be able to deal with it).

Obviously all bugs need to be fixed.  I was simply trying to state that the
receiving side must be able to handle a buggy transmitter without adversely
affecting the system.

>> >            <--    SA GetTableResp
>> >					RMPP flags 0x01 (Data)
>> >					SegmentNumber 5
>> >					PayloadLength 0x34
>> >					TID 8
>>
>> This should not occur.  The maximum segment number sent should have stayed
>> at 4.  I guess one area to check is to make sure that the PayloadLength in
>> the original MAD is set correctly.  I do not know what would happen if it
>> were set incorrectly.  There could also be an error in how RMPP calculates
>> the number of segments that will be sent.
>
>It does look like it is trying to resend the last (at least based on the
>PayloadLength) ? I will find where to instrument this in the code.

The code on the send side calculates the total number of segments using both
the PayloadLength and sge.length fields.  If either is off, the sender side could
probably be thrown off in its calculations.  Even if this were the case, I
still can't see what would cause segment number 5 to be transmitted...
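
For illustration only, the segment count amounts to a ceiling division of
the total length by the data bytes carried per segment.  The sketch below is
not the send-side code; the function name is hypothetical, and the 200 bytes
per segment used in the example is an assumed value, not one read from the
implementation:

```python
def segments_needed(payload_len, seg_data_size):
    """Number of segments needed to carry payload_len bytes.

    seg_data_size is the number of data bytes carried per segment.
    Both names are hypothetical and used only for this sketch.
    """
    if seg_data_size <= 0:
        raise ValueError("seg_data_size must be positive")
    if payload_len <= 0:
        # Assumption: an empty payload still takes one segment.
        return 1
    # Ceiling division without floating point.
    return (payload_len + seg_data_size - 1) // seg_data_size


# With an assumed 200 data bytes per segment:
four = segments_needed(3 * 200 + 0x34, 200)   # three full segments plus 0x34 bytes
five = segments_needed(4 * 200 + 1, 200)      # one byte over four full segments
```

Here four is 4 and five is 5, so in this model a length that is too large by
even a few bytes is enough to add an extra segment.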

>> This segment should have been dropped by the client as an invalid segment
>> number.
>
>It's not invalid, is it ? Just a repeat. Should it reset one of the RMPP
>timers too ?

If segment 4 had the Last flag set, segment 5 is invalid.  The RMPP code
should drop it.
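
The check being described is roughly the following (a sketch with
hypothetical names, not the receive-side code itself):

```python
def accept_segment(seg_num, last_seg):
    """Return True if a data segment should be accepted.

    seg_num  -- segment number of the incoming segment (1-based)
    last_seg -- number of the segment that carried Last, or None if
                no such segment has been seen yet
    Names are hypothetical and used only for this sketch.
    """
    if seg_num < 1:
        return False   # segment numbers start at 1
    if last_seg is not None and seg_num > last_seg:
        return False   # beyond the segment marked Last: invalid, drop
    return True
```

With the trace above, accept_segment(5, 4) is False, while a repeat of
segment 4 would still pass this check and be handled as a duplicate.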

>It also fills in 0 in RRespTime. Should it fill in something to
>correspond to the hard coded time it uses ? Or perhaps 32 (0x1F) ?

I don't think the value of RRespTime matters at this point.

>I will try to get back to gathering more info on this.

Having some more info would help, but I can also try modifying grmpp to see
if I can reproduce this.  My intention is to focus on finding a fix for the
MAD problems at the moment, however, so I'll queue this up to look at it
when I get back to RMPP.

- Sean




