[openib-general] Send Side RMPP and OpenSM GetTableResp
Sean Hefty
sean.hefty at intel.com
Tue May 31 01:27:57 PDT 2005
> <---------------------------- SA GetTableResp
>
> RMPP flags 0x05 (Data, Last)
> SegmentNumber 4
> PayloadLength 0x34
> TID 8
>SA GetTable ---------------------------->
>RMPP flags 0x02 (ACK)
>SegmentNumber 1
>NewWindowLast 6
>TID 8
This segment number is off - not sure why. It could indicate that segment 2
was lost, or that its processing came after that of a later segment. The
RMPP code always updates its SegmentNumber in an ACK based on the last
received packet that arrived in order. This ACK should be dropped by the SA
side as a duplicate. The SA would then rely on a timeout to resend.
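To make the duplicate-ACK rule concrete, here is a minimal sketch of how a sender might handle it. This is not the OpenIB code: the `SendWindow` class and its fields are hypothetical names, and the sketch ignores ACKs that only enlarge the window.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    """Hypothetical sender-side RMPP state (illustrative names only)."""
    last_ack: int = 0      # highest segment number acknowledged so far
    window_last: int = 1   # highest segment number the receiver allows


def process_ack(state: SendWindow, seg_num: int, new_window_last: int) -> bool:
    """Return True if the ACK advanced the sender state, False if dropped."""
    # An ACK that does not move past the last acknowledged segment tells the
    # sender nothing new. Drop it and let the timeout trigger any resend.
    if seg_num <= state.last_ack:
        return False
    state.last_ack = seg_num
    state.window_last = max(state.window_last, new_window_last)
    return True
```

With `last_ack` already at 1, a second ACK for segment 1 returns False and leaves the state untouched, so nothing new is sent in response to it.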
Did you ever see an ACK for segment 4? Regardless of what went wrong on the
SA side, the client needs to be able to deal with it.
> <---------------------------- SA GetTableResp
> RMPP flags 0x01 (Data)
> SegmentNumber 5
> PayloadLength 0x34
> TID 8
This should not occur. The maximum segment number sent should have stayed
at 4. I guess one area to check is to make sure that the PayloadLength in
the original MAD is set correctly. I do not know what would happen if it
were set incorrectly. There could also be an error in how RMPP calculates
the number of segments that will be sent.
This segment should have been dropped by the client as an invalid segment
number.
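A rough sketch of the receive-side check I have in mind follows. The function and parameter names are hypothetical, and a real implementation tracks more state than this.

```python
from typing import Optional


def accept_segment(seg_num: int, last_seg: Optional[int], window_last: int) -> bool:
    """Decide whether a received data segment number is valid.

    last_seg is the segment number that carried the Last flag, or None if
    no such segment has been seen yet (hypothetical bookkeeping).
    """
    if seg_num < 1:
        return False  # segment numbers start at 1
    if seg_num > window_last:
        return False  # beyond the window the receiver advertised
    if last_seg is not None and seg_num > last_seg:
        return False  # nothing valid can follow the Last segment
    return True
```

In the trace above, segment 4 carried the Last flag and the window ended at 6, so `accept_segment(5, 4, 6)` returns False and segment 5 is dropped.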
>I presume the reACKing is used as a keep alive so a response timeout
>(Resp) does not occur. The SA client is using RRespTime of 0xE. The
>OpenIB side sets this field to 0 (not sure if this affects the SA client
>side).
RMPP is using hard-coded timeouts at this point. It would require path
record information to calculate one that's more accurate.
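For reference, a sketch of what such a calculation could look like. It assumes the usual InfiniBand time encoding of 4.096 usec * 2^value for both RespTimeValue and the path record PacketLifeTime, and a total timeout of twice the packet lifetime plus the response time. The function names are made up for illustration and this is not what the RMPP code does today.

```python
IB_TIME_UNIT_US = 4.096  # base unit of the InfiniBand time encoding, in usec


def ib_time_us(exponent: int) -> float:
    """Decode an encoded InfiniBand time value: 4.096 usec * 2^exponent."""
    return IB_TIME_UNIT_US * (1 << exponent)


def response_timeout_us(packet_lifetime: int, resp_time: int) -> float:
    """Assumed timeout: one packet lifetime each way plus the response time."""
    return 2 * ib_time_us(packet_lifetime) + ib_time_us(resp_time)
```

Under these assumptions, the 0xE value used by the SA client decodes to roughly 67 msec, while a value of 0 decodes to 4.096 usec.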
>Some questions on the RMPP sender side (SA):
>
>1. I wouldn't think that reACKing the same segment (1) by the receiver
>(SA client end) would cause the sender side to send segment 5.
Correct - nothing should cause the SA to send segment 5...
>2. In the resend, the header (everything up to SA data) appears to be
>good but the data appears to be garbage.
This leads me to think that it is an issue with the setting or calculations
based on the PayloadLength or sge size. I need to see if the code that
calculates the total number of segments to send matches up with the code
that determines if a segment should have the last bit set.
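The consistency check I mean can be sketched like this. Both functions are hypothetical stand-ins for the two code paths, and `seg_data_size` is an assumed per-segment data size passed in as a parameter, not a value taken from the code.

```python
def total_segments(payload_len: int, seg_data_size: int) -> int:
    """Number of segments needed to carry payload_len bytes (at least 1)."""
    if payload_len <= 0:
        return 1
    return (payload_len + seg_data_size - 1) // seg_data_size  # ceiling division


def is_last_segment(seg_num: int, payload_len: int, seg_data_size: int) -> bool:
    """True if segment seg_num (1-based) reaches the end of the payload."""
    return seg_num * seg_data_size >= payload_len


def paths_agree(payload_len: int, seg_data_size: int) -> bool:
    """Check that exactly the final segment, and no earlier one, is marked last."""
    n = total_segments(payload_len, seg_data_size)
    flagged = [s for s in range(1, n + 1)
               if is_last_segment(s, payload_len, seg_data_size)]
    return flagged == [n]
```

If the two calculations ever disagree for some payload length, the sender could either mark an early segment as last or keep sending past the real end of the data, which would look a lot like the extra segment 5 in the trace.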
>Have you tested RMPP retransmission ? Have you tested simultaneous
>transactions in progress ?
Both of these have been tested. I've forced failures at multiple points in
the retransmissions and tested a few thousand simultaneous transactions,
and I've never noticed a problem like what you're seeing. But that could
very well be because of limitations in my test program. I will spend
a little time tomorrow afternoon looking at this...
- Sean