[openib-general] Re: RMPP Message Format Errors
Eitan Zahavi
eitan at mellanox.co.il
Mon Aug 22 14:09:01 PDT 2005
Sean Hefty wrote:
>
> The RMPP code returns the size of the receive as sizeof MAD header + sizeof RMPP
> header + optional sizeof other header (e.g. SA header) + actual payload. This
> size can be used to allocate a data buffer large enough to hold the reassembled
> MAD. You should be able to use this to determine the number of records in the
> payload.
Good. But how is that size delivered to the client through umad?
From my first email on this thread you can see there is at least one
bug in the chain of events:
a. The first segment's PayLen should be either 0 or the correct value. It
is neither: it should be 264 but is 440.
b. The last segment's PayLen MUST be updated to reflect the size of the
data in that MAD (including the class header). It is 100, but only 24
bytes of data remain, or 44 bytes with the 20-byte SA header.
c. In the receiver the reassembled data size is not correct. OpenSM
reports it got a 200-byte MAD back. This is probably a bug in the vendor
layer or umad.
Here is the full data again.
1. The NodeRecord size in the MAD is 112 bytes (note the required padding
of 4 bytes at the end of the NodeRec data).
2. The OpenSM log file shows that the query should return 2 records, one
for each end-port. This is indeed what happens:
Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr:
Looking for NodeRecord with LID: 0x0 GUID:0x0000000000000000
Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New
NodeRecord: node 0x0002c902000017a0
port 0x0002c902000017a1, lid
0x1.
Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New
NodeRecord: node 0x0002c902000017a0
port 0x0002c902000017a2, lid
0x2.
Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process:
Returning 2 records.
3. On the wire we see the following (see attached gif for more
details):
a. Two data segments were sent and two ACKs were returned. This is
OK.
b. The first segment reports PayLen = 440 bytes. According to the
spec, the first segment may carry a non-zero PayLen, and when it does the
value should equal (class header size * number of segments) + data
length. In our case the data length is 2*112 = 224 bytes and the SA extra
header contributes 20 bytes * 2 segments = 40 bytes. This gives
PayLen = 264, not 440.
The spec defines this in p775-l37, so this is a violation of the spec.
c. The last segment (segment 2) carries a PayLen of 100. The expected
value for the last segment is the SA extra header plus the data left over
from the previous segments. Since the first segment holds 200 bytes of
data, the leftover is 112*2 - 200 = 24 bytes. With the 20-byte SA extra
header that makes 44 bytes.
So this is another violation of the spec.
d. The analyzer is confused by the above and reports the result as
having 3 NodeRecords.
e. <<Gen2 NodeRec GetTable RMPP Format Error.GIF>>
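The arithmetic in 3b and 3c can be sketched as follows. This is only a rough illustration in Python, not code from the stack: the constant and function names are made up for this example, and the sizes (20-byte SA extra header, 200 data bytes per segment, 112-byte NodeRecord) are the ones quoted above.

```python
# Sizes quoted in this thread; all names are hypothetical, chosen for this sketch.
SA_HEADER_SIZE = 20       # SA extra (class) header carried in every segment
SEGMENT_DATA_SIZE = 200   # data bytes that fit in one segment after the SA header
NODE_RECORD_SIZE = 112    # NodeRecord size including the 4 bytes of padding


def num_segments(data_len):
    """Number of segments needed to carry data_len bytes of data."""
    return max(1, -(-data_len // SEGMENT_DATA_SIZE))  # ceiling division


def first_segment_paylen(data_len):
    """Non-zero PayLen of the first segment: one class header per segment plus data."""
    return SA_HEADER_SIZE * num_segments(data_len) + data_len


def last_segment_paylen(data_len):
    """PayLen of the last segment: SA header plus the data left over."""
    leftover = data_len - (num_segments(data_len) - 1) * SEGMENT_DATA_SIZE
    return SA_HEADER_SIZE + leftover


data_len = 2 * NODE_RECORD_SIZE        # two NodeRecords, 224 bytes
print(num_segments(data_len))          # 2
print(first_segment_paylen(data_len))  # 264
print(last_segment_paylen(data_len))   # 44
```

Neither of the values seen on the wire (440 and 100) comes out of this calculation.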
4. Tracing the osmtest log file after that shows more issues, probably
caused by changes to the vendor layer or the RMPP reassembly. After
reassembly, the size of the RMPP MAD reported to the osm vendor layer is
expected to be the RMPP header + SA extra header + data size. In our case
that is 32 + 20 + 2*112 = 276.
The log file shows:
Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 =
200 / 112 (88)
Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs:
Received 1 records
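The expected reassembled size and the record count in the log can be checked the same way. Again this is a sketch with hypothetical names. The 32-byte header size is the figure used in item 4, and treating the count as an integer division of the reported 200 bytes by the record size is only one reading of the "Count = 1 = 200 / 112 (88)" log line, not something confirmed from the vendor layer code.

```python
# Sizes quoted in this thread; all names are hypothetical, chosen for this sketch.
RMPP_HEADER_SIZE = 32     # header size figure used in item 4
SA_HEADER_SIZE = 20       # SA extra header
NODE_RECORD_SIZE = 112    # NodeRecord size including padding


def expected_reassembled_size(num_records):
    """Size expected after reassembly: headers plus all records."""
    return RMPP_HEADER_SIZE + SA_HEADER_SIZE + num_records * NODE_RECORD_SIZE


def count_records(data_size):
    """Return (whole records, leftover bytes) for a given data size."""
    return divmod(data_size, NODE_RECORD_SIZE)


print(expected_reassembled_size(2))  # 276
print(count_records(200))            # (1, 88), matching the log line
print(count_records(2 * 112))        # (2, 0), what two full records would give
```

With 200 bytes reported, only one whole record fits and 88 bytes are left over, which is consistent with osmtest receiving 1 record instead of 2.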