[openib-general] Re: RMPP Message Format Errors

Eitan Zahavi eitan at mellanox.co.il
Mon Aug 22 14:09:01 PDT 2005


Sean Hefty wrote:

> 
> The RMPP code returns the size of the receive as sizeof MAD header + sizeof RMPP
> header + optional sizeof other header (e.g. SA header) + actual payload.  This
> size can be used to allocate a data buffer large enough to hold the reassembled
> MAD.  You should be able to use this to determine the number of records in the
> payload.
Good. But how is that size delivered to the client through umad?

From my first email on this thread you can see there is at least one
bug in the chain of events:
a. The first segment's PayLen should be either 0 or the correct value;
    it is neither. It should be 264 but is 440.
b. The last segment's PayLen MUST be updated to reflect the size of the
    data in the MAD (including the class header). It should be 44 (24
    bytes of leftover data plus the 20-byte SA header) but is 100.
c. In the receiver, the reassembled data size is not correct. OpenSM
    reports that it got a 200-byte MAD back. This is probably a bug in
    the vendor layer or in umad.
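The PayLen values expected in (a) and (b) can be checked with a small calculation. This is a minimal C sketch, assuming a 256-byte MAD whose 24-byte common MAD header, 12-byte RMPP header and 20-byte SA header leave 200 bytes of record data per segment, and applying the PayLen rule exactly as stated in this report. The macro and function names are hypothetical and are not taken from the kernel RMPP code or from OpenSM.

```c
#include <assert.h>

/* Byte sizes assumed for an SA MAD carried by RMPP; macro names are hypothetical. */
#define MAD_SIZE       256u
#define MAD_HDR_SIZE    24u  /* common MAD header */
#define RMPP_HDR_SIZE   12u  /* RMPP header */
#define SA_HDR_SIZE     20u  /* SA class header */
/* Record data that fits in one segment: 256 - 24 - 12 - 20 = 200 bytes. */
#define SEG_DATA_SIZE  (MAD_SIZE - MAD_HDR_SIZE - RMPP_HDR_SIZE - SA_HDR_SIZE)

/* Number of RMPP segments needed for data_len bytes of records
 * (an empty response still occupies one segment). */
static unsigned num_segments(unsigned data_len)
{
    if (data_len == 0)
        return 1;
    return (data_len + SEG_DATA_SIZE - 1) / SEG_DATA_SIZE;
}

/* Expected PayLen of the first segment, per the rule quoted in this report:
 * one SA header per segment plus the total record data. */
static unsigned first_seg_paylen(unsigned data_len)
{
    return num_segments(data_len) * SA_HDR_SIZE + data_len;
}

/* Expected PayLen of the last segment: its SA header plus the data left
 * over after all preceding (full) segments. */
static unsigned last_seg_paylen(unsigned data_len)
{
    unsigned carried_before = (num_segments(data_len) - 1) * SEG_DATA_SIZE;

    assert(carried_before <= data_len);
    return SA_HDR_SIZE + (data_len - carried_before);
}
```

For two 112-byte NodeRecords (224 bytes of data), num_segments(224) is 2, first_seg_paylen(224) is 264 and last_seg_paylen(224) is 44, rather than the 440 and 100 seen on the wire. For a response that fits in one segment, the first and last PayLen coincide, as they should.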

Here is the full data again.



1. The NodeRecord attribute size is 112 bytes (note the required padding
   of 4 bytes at the end of the NodeRecord data).
2. The OpenSM log file shows that the query should return 2 records, one
   for each end port. This is indeed what happens:


	Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x0000000000000000
	Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c902000017a0
	                                port 0x0002c902000017a1, lid 0x1.
	Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c902000017a0
	                                port 0x0002c902000017a2, lid 0x2.
	Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.

3. On the wire we see the following (see the attached GIF for more
   details):
a. Two data segments were sent and two ACKs were returned. This is OK.
b. The first segment reports PayLen = 440 bytes. According to the spec,
    the first segment may provide a nonzero PayLen, and when it does, the
    value should equal (class header size * number of segments) + data
    length. In our case the data length is 2 * 112 = 224 bytes and the SA
    extra header is 20 bytes * 2 segments = 40 bytes. This gives
    PayLen = 264, not 440!
    The spec defines this on p. 775, line 37.
    So this is a violation of the spec.
c. The last segment (segment 2) has a PayLen of 100. The expected value
    for the last segment is the SA extra header plus the data left over
    from the previous segments. Since the first segment carries 200 bytes
    of data, the leftover should be 112 * 2 - 200 = 24 bytes, or 44 bytes
    with the 20-byte SA extra header.
    So this is another violation of the spec.
d. The analyzer is confused by the above and reports the result as
    having 3 NodeRecords.
e. <<Gen2 NodeRec GetTable RMPP Format Error.GIF>>
4. When we then trace the osmtest log file, we find more issues, probably
   caused by changes to the vendor layer or to the RMPP reassembly. After
   reassembly, the size of the RMPP MAD reported to the OSM vendor layer
   is expected to be the MAD and RMPP headers + SA extra header + data
   size. In our case that is (24 + 12) + 20 + 2 * 112 = 280 bytes.

	The log file shows:

	Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
	Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
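The size check described in item 4, and the record count Sean suggests deriving from the reassembled size, can be sketched together. This is a minimal C sketch, assuming a 24-byte common MAD header, a 12-byte RMPP header and a 20-byte SA header (56 bytes in total) followed by packed 112-byte NodeRecords. The macro and function names are hypothetical and are not taken from OpenSM's vendor layer or from umad.

```c
#include <assert.h>
#include <limits.h>

/* Byte sizes assumed for an SA GetTable response; names are hypothetical. */
#define MAD_HDR_SIZE    24u  /* common MAD header */
#define RMPP_HDR_SIZE   12u  /* RMPP header */
#define SA_HDR_SIZE     20u  /* SA class header */
#define NODE_REC_SIZE  112u  /* NodeRecord including 4 bytes of padding */
#define SA_RMPP_HDRS   (MAD_HDR_SIZE + RMPP_HDR_SIZE + SA_HDR_SIZE) /* 56 */

/* Length the vendor layer should see after reassembly of nrecs records:
 * one copy of each header followed by the concatenated record data. */
static unsigned expected_reassembled_len(unsigned nrecs)
{
    assert(nrecs <= (UINT_MAX - SA_RMPP_HDRS) / NODE_REC_SIZE); /* no overflow */
    return SA_RMPP_HDRS + nrecs * NODE_REC_SIZE;
}

/* Record count derived from a reassembled length, or -1 if the length is
 * shorter than the headers or not a whole number of records, which points
 * to a truncated or mis-sized reassembly. */
static int records_from_len(unsigned len)
{
    unsigned data;

    if (len < SA_RMPP_HDRS)
        return -1;
    data = len - SA_RMPP_HDRS;
    if (data % NODE_REC_SIZE != 0)
        return -1;
    return (int)(data / NODE_REC_SIZE);
}
```

Under these assumptions two NodeRecords should reassemble to expected_reassembled_len(2) == 280 bytes, and records_from_len(280) returns 2. A 200-byte result is rejected by records_from_len, since after the 56 header bytes it does not hold a whole number of 112-byte records.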


