[openib-general] RE: RMPP Message Format Errors

Hal Rosenstock halr at voltaire.com
Tue Aug 23 15:54:53 PDT 2005


Hi Eitan,
 
>We have started testing RMPP packets with osmtest and opensm (gen2 version).

>We did not get very far. The first NodeRecord GetTable of all the nodes, in a "loopback" case, has some issues.

Is this loopback between the two HCA ports? (Just so I can recreate this when I get back.)

> The explanation is below:

> 1.      NodeRecord size is 112 bytes (note the required padding of 4 bytes at the end of the NodeRecord data). 
> 2.      The OpenSM log file shows the query should return 2 records, one for each end port. This is indeed what happens: 

		Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x0000000000000000
		Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c902000017a0
		                                port 0x0002c902000017a1, lid 0x1.
		Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c902000017a0
		                                port 0x0002c902000017a2, lid 0x2.
		Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.

> 3.      On the wire we see the following (see attached gif for more details): 

Could you send the raw hex as well?
a.      Two data segments were sent and two ACKs were returned. This is OK. 
b.      The first segment reports PayLen = 440 bytes. According to the spec, the first segment may provide a PayLen != 0, and when it does, the value must equal (class header size * number of segments) + data length. In our case the data length is 2*112 = 224 bytes, and the SA extra header adds 20 bytes * 2 segments = 40 bytes. This gives PayLen = 264, not 440!
The spec defines this on page 775, line 37.
So this is a violation of the spec. 

Agreed. It should either be 0 or the real length.

c.      The last segment (segment 2) reports a PayLen of 100. The expected value for the last segment is the SA extra header plus the data left over from the previous segments. Since the first segment carries 200 bytes of data, the leftover should be 112*2 - 200 = 24 bytes; adding the 20-byte SA extra header gives 44 bytes.
So this is another violation of the spec. 

Yes, but perhaps related to the first issue.

d.      The analyzer is confused by the above and reports the result as having 3 NodeRecords. 
e.      <<Gen2 NodeRec GetTable RMPP Format Error.GIF>> 
4.      Following that, when we trace the osmtest log file we find more issues, probably caused by changes to the vendor layer or to the RMPP assembly. After assembly, the size of the RMPP MAD reported to the OSM vendor layer is expected to be the RMPP header + SA extra header + data size. In our case that is 32 + 20 + 2*112 = 276. 

	The log file shows:

		Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
		Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records

	So this is another problem, probably with the way RMPP results are assembled or passed back to the vendor.

This may be a result of the violations on the sending side. 
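The PayLen arithmetic in items b and c can be checked with a short sketch. It assumes a 256-byte MAD made of a 24-byte common MAD header, a 12-byte RMPP header and a 20-byte SA header (SM_Key, AttributeOffset, reserved, ComponentMask), leaving 200 bytes of SA data per segment, which matches the 200-byte figure above. The helper names are hypothetical and are not OpenSM or gen2 MAD-layer APIs; the sketch only computes the non-zero first-segment value (0 would also be allowed there).

```python
# Hypothetical helpers for checking RMPP PayLen values; not OpenSM/gen2 APIs.
MAD_SIZE = 256        # every MAD, and so every RMPP segment, is 256 bytes
MAD_COMMON_HDR = 24   # common MAD header
RMPP_HDR = 12         # RMPP header
SA_HDR = 20           # SA header: SM_Key, AttributeOffset, reserved, ComponentMask

# SA data carried by each segment after all three headers: 256 - 24 - 12 - 20 = 200
SEG_DATA = MAD_SIZE - MAD_COMMON_HDR - RMPP_HDR - SA_HDR


def num_segments(data_len: int) -> int:
    """Segments needed to carry data_len bytes of SA records (at least one)."""
    return max(1, -(-data_len // SEG_DATA))  # ceiling division


def first_segment_paylen(data_len: int) -> int:
    """Non-zero PayLen for segment 1: one SA header per segment plus all data."""
    return SA_HDR * num_segments(data_len) + data_len


def last_segment_paylen(data_len: int) -> int:
    """PayLen for the last segment: SA header plus the data left over."""
    leftover = data_len - SEG_DATA * (num_segments(data_len) - 1)
    return SA_HDR + leftover


if __name__ == "__main__":
    data_len = 2 * 112  # two 112-byte NodeRecords
    print(num_segments(data_len))          # 2
    print(first_segment_paylen(data_len))  # 264 (the trace shows 440)
    print(last_segment_paylen(data_len))   # 44 (the trace shows 100)
```

With these inputs the sketch reproduces the 264 and 44 expected above, so the 440 and 100 seen on the wire cannot both be explained by header accounting alone.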
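The osmtest log line `Count = 1 = 200 / 112 (88)` reads like an integer division of the SA data bytes handed to the vendor layer by the record size. The sketch below assumes exactly that; it is a hypothetical illustration, not the actual __osmv_sa_mad_rcv_cb code. It derives the 112-byte record size from the NodeRecord layout (2-byte LID, 2 reserved bytes, 40-byte NodeInfo, 64-byte NodeDescription), i.e. 108 bytes padded to a multiple of 8, since AttributeOffset is expressed in 8-byte words.

```python
# Hypothetical sketch of a receive-side record count; not the osm vendor code.
from typing import Tuple

# NodeRecord fields in bytes: LID, reserved, NodeInfo, NodeDescription
NODE_RECORD_FIELDS = 2 + 2 + 40 + 64  # 108


def padded_record_size(raw_size: int) -> int:
    """Round up to a multiple of 8, since AttributeOffset counts 8-byte words."""
    return (raw_size + 7) // 8 * 8


def record_count(data_len: int, attr_offset: int) -> Tuple[int, int]:
    """Return (records, leftover bytes) for data_len bytes of SA data."""
    return divmod(data_len, attr_offset * 8)


if __name__ == "__main__":
    rec_size = padded_record_size(NODE_RECORD_FIELDS)  # 112
    attr_offset = rec_size // 8                        # 14
    # What the log shows: 200 data bytes, i.e. one segment's worth.
    print(record_count(200, attr_offset))  # (1, 88)
    # What complete reassembly of both segments should give.
    print(record_count(2 * rec_size, attr_offset))  # (2, 0)
```

The 88 leftover bytes in the log match the first call exactly, which is consistent with only one segment's data reaching the vendor layer.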

> Please let me know if you will have time to dig into these problems or if I should try and resolve them myself and provide patches. 

I will look at these shortly after I get back.

-- Hal
