[openib-general] [PATCH] [MAD] changes to ib_create_send_mad

Sean Hefty mshefty at ichips.intel.com
Thu May 5 10:10:17 PDT 2005


Hal Rosenstock wrote:
> On PayloadLength, single segment sends are fine. It is multisegment
> sends which seem wrong to me.
> 
> Case 1: ib_create_send_mad with hdr_len 0x38 data_len 0x278
>         paylen_newwin stored in header is 0x28C which seems correct
>         this creates 4 segments
>         1-3 segments paylen_newwin is 0x6E0, 4th segment is 0x34

Converting to decimal so I can use my fingers... hdr_len is 56 and data_len 
is 632.  There are 256 - 56 = 200 bytes available per segment for the data, 
giving 4 segments.  The paylen_newwin should be set to 0x28C when sending, so 
that seems correct.

The first segment should have a paylen_newwin value of 220 x 4 = 880 (0x370) 
on the wire.  The last segment should have a value of 52 (0x34).  If you're 
seeing 0x6E0 (1760), that's exactly twice the expected payload length value.
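
As a rough check of the arithmetic above, here is a minimal sketch.  It is 
not the kernel code.  The function name is made up, and RMPP_HDR = 36 is an 
assumption chosen so that the 20 bytes of header beyond it count toward the 
payload length, which is what the 0x28C and 220-per-segment figures in this 
thread imply.

```python
MAD_SIZE = 256   # bytes per MAD segment, as used in the arithmetic above
RMPP_HDR = 36    # assumed: header bytes not counted in the payload length


def expected_paylen(hdr_len, data_len):
    """Hypothetical helper: expected paylen_newwin values for a send."""
    data_per_seg = MAD_SIZE - hdr_len            # 200 when hdr_len is 56
    class_hdr = hdr_len - RMPP_HDR               # 20 when hdr_len is 56
    seg_count = -(-data_len // data_per_seg)     # ceiling division
    in_header = data_len + class_hdr             # value stored when sending
    first = (data_per_seg + class_hdr) * seg_count
    last = data_len - (seg_count - 1) * data_per_seg + class_hdr
    return seg_count, in_header, first, last


if __name__ == "__main__":
    for hdr_len, data_len in ((0x38, 0x278), (0x38, 0x620)):
        segs, in_header, first, last = expected_paylen(hdr_len, data_len)
        print(f"hdr_len={hdr_len:#x} data_len={data_len:#x}: "
              f"{segs} segments, header={in_header:#x}, "
              f"first={first:#x}, last={last:#x}")
```

Under these assumptions Case 1 gives 4 segments with 0x28c, 0x370 and 0x34.  
The same formula applied to Case 2 gives 8 segments with 0x634, 0x6e0 and 
0xbc, so by this arithmetic 0x6E0 is only unexpected in Case 1.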

I will see if I can reproduce this using the numbers that you gave.  What 
size did you specify for the sge?  The total number of segments is 
calculated using the sge sizes.

> Case 2: ib_create_send_mad with hdr_len 0x38 data_len 0x620
>         paylen_newwin stored in header is 0x634 which seems correct
>         this creates 8 segments
>         1-7 segments paylen_newwin is 0x6E0, 8th segment is 0xBC
> 
> Last paylen_newwin in both cases appears to me to be correct but the 
> paylen_newwin in the 1-n segments (0x6E0) seems wrong to me.

The paylen_newwin is only set by the code when sending the first and last 
segments.  I would have expected the middle segments to carry the same value 
as the first, and I see this behavior in my tests.

The fact that you see the same payload length for both of these makes me 
think that either the sge size is off, or there's an issue in the RMPP code 
using it.

> Also, I did more investigation of send failures. In terms of send
> failures upon not receiving ACKs, SA is different from vendor class 2. I
> think the problem starts (and hopefully ends) with response_mad(). In
> the case of SA GetTableResp being sent, the non data packets (ACK, etc.)
> come back as SA GetTable so this is not currently considered a response
> but I think it needs to be. I'm not sure if there are other issues
> behind this as I didn't chase it further. Let me know if you want me to
> do this.

The RMPP code formats ACKs by inverting the response bit.  This is necessary 
in order to route the ACK to the correct agent, and meets the SA 
requirements.  Response MADs are routed to the correct agent using the TID.  
Non-response MADs are routed by using the lookup tables.

E.g. if you look at the SA, the ACK (GetTable) needs to go to the agent 
using the lookup tables.  The TID carried in the ACK was actually set by the 
sender of the ACK, and cannot be used to route to the SA's agent.
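
To illustrate the routing described above, here is a simplified sketch.  It 
is not the actual MAD layer code.  The two dictionaries and the function 
names are hypothetical stand-ins for the real agent lookup, and the method 
values (response bit 0x80, GetTable 0x12, GetTableResp 0x92) are the ones I 
understand the spec to define.

```python
METHOD_RESP_BIT = 0x80   # response bit in the MAD method field
GET_TABLE = 0x12         # SA GetTable
GET_TABLE_RESP = 0x92    # SA GetTableResp


def ack_method(data_method):
    """Method carried by an ACK: the data MAD's method, response bit inverted."""
    return data_method ^ METHOD_RESP_BIT


def route(method, tid, mgmt_class, agents_by_tid, agents_by_class):
    """Pick the receiving agent (hypothetical tables, for illustration only)."""
    if method & METHOD_RESP_BIT:
        # Response MAD: the TID was set by the local requester.
        return agents_by_tid.get(tid)
    # Non-response MAD: the TID belongs to the remote side, so fall back
    # to the class-based lookup tables.
    return agents_by_class.get(mgmt_class)


if __name__ == "__main__":
    SA_CLASS = 0x03
    by_tid = {0x1234: "client agent"}
    by_class = {SA_CLASS: "SA agent"}

    # The SA sends GetTableResp data segments; the ACK comes back as GetTable.
    ack = ack_method(GET_TABLE_RESP)
    print(hex(ack), route(ack, 0x9999, SA_CLASS, by_tid, by_class))
```

The ACK for a GetTableResp therefore looks like a request on receive, and 
reaches the SA's agent through the class lookup rather than through the TID.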

Can you describe in more detail the exact failure that you're seeing and 
what you were expecting?

Thanks for looking into this more.

- Sean


