[openib-general] Send Side RMPP and OpenSM GetTableResp

Hal Rosenstock halr at voltaire.com
Tue May 31 04:56:02 PDT 2005


On Tue, 2005-05-31 at 04:27, Sean Hefty wrote: 
> >            <--    SA GetTableResp
> >
> >					RMPP flags 0x05 (Data, Last)
> >					SegmentNumber 4
> >					PayloadLength 0x34
> >					TID 8
> >SA GetTable -->
> >RMPP flags 0x02 (ACK)
> >SegmentNumber 1
> >NewWindowLast 6
> >TID 8
> 
> This segment number is off - not sure why.

It is off in that the 3 segments just sent are not acknowledged but it
is legal to acknowledge what you have already received. This does not
violate anything.

> It could indicate that segment 2 was lost, 

That's one possibility but I doubt it is getting lost.

> or that its processing came after that of a later segment.

This seems more likely. The packets were each slightly less than
20 usec apart until the SA appeared to resend (the gap there was
almost 770 usec).

> The RMPP code always updates its SegmentNumber in an ACK based on the last
> received packet that arrived in order.  This ACK should be dropped by the SA
> side as a duplicate.  The SA would then rely on a timeout to resend.

The gap was 769.912 usec. Not sure whether this corresponds to any IBTA
timeout. Is this the hardcoded timeout that is used ?
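
The ACK rule Sean describes (the ACK carries the last segment that arrived in order) can be sketched as follows. This is only an illustration in Python, not the OpenIB RMPP code: `last_in_order_segment` is a hypothetical helper, and the arrival orders shown are assumed scenarios in which segment 2 is missing or processed late.

```python
def last_in_order_segment(received):
    """Return the highest N such that segments 1..N have all been received.

    `received` is any iterable of 1-based segment numbers seen so far.
    Returns 0 when segment 1 has not arrived yet.
    """
    seen = set(received)
    n = 0
    while n + 1 in seen:
        n += 1
    return n


# Assumed scenario: segment 2 is lost or handled late, so the ACK stays at 1
# even though segments 3 and 4 have already arrived.
print(last_in_order_segment([1, 3, 4]))
# Once segment 2 is processed, the ACK can jump straight to 4.
print(last_in_order_segment([1, 3, 4, 2]))
```

Under this rule an ACK of SegmentNumber 1 after segments 2 through 4 were sent is consistent with either explanation above (loss of segment 2, or late processing of it).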

> Did you ever see an ACK for segment 4?

No.

>   Regardless what went wrong on the SA
> side, the client needs to be able to deal with it.

This applies in both directions but in this case I think you mean the
other direction (whatever went wrong on the SA client side the SA needs
to be able to deal with it).

> >            <--    SA GetTableResp
> >					RMPP flags 0x01 (Data)
> >					SegmentNumber 5
> >					PayloadLength 0x34
> >					TID 8
> 
> This should not occur.  The maximum segment number sent should have stayed
> at 4.  I guess one area to check is to make sure that the PayloadLength in
> the original MAD is set correctly.  I do not know what would happen if it
> were set incorrectly.  There could also be an error in how RMPP calculates
> the number of segments that will be sent.

It does look like it is trying to resend the last segment (at least
based on the PayloadLength) ? I will find where to instrument this in
the code.
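
The consistency check Sean suggests (does the computed segment count agree with where the Last flag is set?) might look like the sketch below. The function names, the per-segment data size of 200 bytes, and the total payload length are all assumptions chosen for illustration; none of them are taken from the OpenIB code or from this trace.

```python
def total_segments(payload_len, seg_data_size):
    """Number of segments needed to carry payload_len bytes (at least 1)."""
    if seg_data_size <= 0:
        raise ValueError("seg_data_size must be positive")
    return max(1, -(-payload_len // seg_data_size))  # ceiling division


def is_last_segment(seg_num, payload_len, seg_data_size):
    """True when the 1-based seg_num is the final segment of the transfer."""
    return seg_num == total_segments(payload_len, seg_data_size)


SEG_DATA_SIZE = 200                     # assumed data bytes per segment
payload_len = 3 * SEG_DATA_SIZE + 0x34  # assumed: three full segments plus a 0x34-byte tail

count = total_segments(payload_len, SEG_DATA_SIZE)
print(count)
print(is_last_segment(count, payload_len, SEG_DATA_SIZE))

# A sender that derives both values from the same calculation can never
# emit a segment past the one flagged Last.
for seg_num in range(1, count + 1):
    assert seg_num <= count
```

If the send loop and the Last-flag test used different length or size inputs, a segment 5 after a segment 4 marked Last is the kind of mismatch that could result.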

> This segment should have been dropped by the client as an invalid segment
> number.

It's not invalid, is it ? Just a repeat. Should it reset one of the RMPP
timers too ?
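
Whether segment 5 counts as invalid or merely unexpected depends on what the receiver has already processed. A rough sketch of one possible receive-side classification follows; `classify_segment` and its return strings are hypothetical and are not taken from the OpenIB implementation or from the IBA spec.

```python
def classify_segment(seg_num, last_in_order, window_last, last_seg=None):
    """Classify an incoming data segment on the receive side.

    last_in_order: highest segment received in order so far
    window_last:   highest segment the receiver has offered to accept
    last_seg:      segment number that carried the Last flag, if one was seen
    """
    if seg_num < 1:
        return "invalid"
    if last_seg is not None and seg_num > last_seg:
        return "invalid"          # beyond the segment already marked Last
    if seg_num <= last_in_order:
        return "duplicate"
    if seg_num > window_last:
        return "outside window"
    return "accept"


# Receiver state from the trace: ACKed segment 1, NewWindowLast 6.
print(classify_segment(5, last_in_order=1, window_last=6, last_seg=4))  # invalid
print(classify_segment(5, last_in_order=1, window_last=6))              # accept
```

In this sketch segment 5 is rejected only if the receiver actually processed segment 4 with the Last flag; if it did not, segment 5 falls inside the advertised window and looks like an ordinary out-of-order segment.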

> >I presume the reACKing is used as a keep alive so a response timeout
> >(Resp) does not occur. The SA client is using RRespTime of 0xE. The
> >OpenIB side sets this field to 0 (not sure if this affects the SA client
> >side).
> 
> RMPP is using hard-coded timeouts at this point.  It would require path
> record information to calculate one that's more accurate.

It also fills in 0 in RRespTime. Should it fill in something to
correspond to the hard coded time it uses ? Or perhaps 31 (0x1F) ?
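
For reference, here is how the RRespTime values mentioned above translate into time, assuming the field uses the usual IBA response-time encoding of 4.096 usec * 2^value. That encoding and the helper name `resp_time_usec` are assumptions for this sketch; the 5-bit range check simply reflects that 0x1F is the largest value the field can hold.

```python
def resp_time_usec(resp_time_value):
    """Convert a 5-bit response time value to microseconds (4.096 us * 2^value)."""
    if not 0 <= resp_time_value <= 0x1F:
        raise ValueError("response time value is a 5-bit field (0..31)")
    return 4.096 * (2 ** resp_time_value)


# The SA client's RRespTime of 0xE under the assumed encoding.
print(f"{resp_time_usec(0xE):.3f} usec")  # 67108.864 usec

# Neighbouring values around the observed 769.912 usec resend gap.
for value in (7, 8):
    print(value, f"{resp_time_usec(value):.3f} usec")
```

Under that assumption 0xE is roughly 67 msec, and the 769.912 usec gap falls between the values for 7 (524.288 usec) and 8 (1048.576 usec), so it does not line up exactly with any encoded response time.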

> >Some questions on the RMPP sender side (SA):
> >
> >1. I wouldn't think that reACKing the same segment (1) by the receiver
> >(SA client end) would cause the sender side to send segment 5.
> 
> Correct - nothing should cause the SA to send segment 5...
> 
> >2. In the resend, the header (everything up to SA data) appears to be
> >good but the data appears to be garbage.
> 
> This leads me to think that it is an issue with the setting or calculations
> based on the PayloadLength or sge size.  I need to see if the code that
> calculates the total number of segments to send matches up with the code
> that determines if a segment should have the last bit set.
> 
> >Have you tested RMPP retransmission ? Have you tested simultaneous
> >transactions in progress ?
> 
> Both of these have been tested.  I've forced failures at multiple points in
> the retransmissions and tested a few thousand simultaneous transactions,
> and I've never noticed a problem like what you're seeing.  But that could
> very well be because of limitations in my test program.  I will spend
> a little time tomorrow afternoon looking at this...

I will try to get back to gathering more info on this.

Thanks.

-- Hal



