[openib-general] RMPP Message Format Errors
Eitan Zahavi
eitan at mellanox.co.il
Sun Aug 21 05:48:16 PDT 2005
Hi Sean, Hal,
We have started testing RMPP packets with osmtest and opensm (gen2 version).
We did not get very far: the first NodeRecord GetTable query for all the nodes
in a "loopback" case already shows some issues.
The explanation is below:
1. NodeRecord MAD size is 112 bytes (note the required padding of 4
bytes at the end of the NodeRec data).
2. The OpenSM log file shows the query should return 2 records, one for each
end-port. This really happens:
Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr:
Looking for NodeRecord with LID: 0x0 GUID:0x0000000000000000
Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr:
New NodeRecord: node 0x0002c902000017a0
port 0x0002c902000017a1, lid 0x1.
Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr:
New NodeRecord: node 0x0002c902000017a0
port 0x0002c902000017a2, lid 0x2.
Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process:
Returning 2 records.
3. On the wire we see the following (see attached gif for more
details):
a. Two data segments were sent and two ACKs were returned. This is OK.
b. The first segment reports PayLen = 440 bytes. According to the spec,
the first segment may provide PayLen != 0, and when it does, the value should
be equal to (class header * Num-Segments) + data length. In our case we
have data length = 2*112 and SA extra header = 20 bytes * 2 segments. This
leads to PayLen = 264 and not 440!
The spec defines that in p775-l37.
So this is a violation of the spec.
c. The last segment (segment 2) provides a PayLen field of 100. The
expected value for the last segment should have been: SA extra header
+ leftover data size from the previous segments. Since the first segment has
200 bytes for data, the leftover should have been 112*2 - 200 = 24. With the
SA extra header that gives 44 bytes.
So this is another violation of the spec.
d. The analyzer is confused by the above and reports the result as
having 3 NodeRecords.
e. <<Gen2 NodeRec GetTable RMPP Format Error.GIF>>
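The PayLen arithmetic in 3.b and 3.c can be checked with a short sketch. It
only encodes the numbers quoted above (112-byte NodeRecord, 20-byte SA extra
header, 200 data bytes in the first segment). The constant and function names
are made up for illustration and are not OpenSM identifiers.

```python
# Sizes quoted in this mail; names are illustrative, not OpenSM identifiers.
NODE_RECORD_SIZE = 112   # bytes per NodeRecord, including 4 bytes of padding
SA_HEADER_SIZE = 20      # SA extra header carried in every segment
SEGMENT_DATA_SIZE = 200  # data bytes available in one full segment


def expected_first_paylen(num_records, num_segments):
    """PayLen of the first segment: one SA header per segment plus all data."""
    return SA_HEADER_SIZE * num_segments + NODE_RECORD_SIZE * num_records


def expected_last_paylen(num_records, num_segments):
    """PayLen of the last segment: SA header plus the leftover data."""
    data_len = NODE_RECORD_SIZE * num_records
    leftover = data_len - SEGMENT_DATA_SIZE * (num_segments - 1)
    return SA_HEADER_SIZE + leftover


if __name__ == "__main__":
    # Two records in two segments, as in the loopback case above.
    print(expected_first_paylen(2, 2))  # 264 (the wire showed 440)
    print(expected_last_paylen(2, 2))   # 44 (the wire showed 100)
```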
4. Following that, when we trace the log file of osmtest we find more
issues, probably caused by changes to the vendor layer or the RMPP assembly:
It is expected that after assembly the size of the RMPP MAD reported to the
osm vendor layer will be the RMPP header + SA extra header + data size. In
our case that is 32 + 20 + 2*112 = 276.
The log file shows:
Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
So this is another problem, probably with the way RMPP
results are assembled or passed back to the vendor.
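As a rough cross-check of item 4, the sketch below reads the log line as
"count = data length / record size (remainder)", which is an assumption about
the log format. It compares the 200 bytes seen in the log with the 276-byte MAD
expected after assembly. All names are hypothetical; the header sizes are the
ones quoted above.

```python
# Sizes as counted in item 4; names are illustrative, not OpenSM identifiers.
RMPP_HEADER_SIZE = 32   # "RMPP header" size used in the 276-byte total
SA_HEADER_SIZE = 20     # SA extra header
NODE_RECORD_SIZE = 112  # padded NodeRecord size


def data_len_from_mad(mad_size):
    """Strip the RMPP and SA headers from an assembled MAD size."""
    return mad_size - RMPP_HEADER_SIZE - SA_HEADER_SIZE


def record_count(data_len, record_size=NODE_RECORD_SIZE):
    """Return (whole records, leftover bytes) for a given data length."""
    return divmod(data_len, record_size)


if __name__ == "__main__":
    print(record_count(200))                     # (1, 88), as in the log
    print(record_count(data_len_from_mad(276)))  # (2, 0), the expected result
```

A data length of 200 matches the data capacity of a single segment, which
would be consistent with only the first segment reaching the vendor layer,
though the log alone does not prove that.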
Please let me know if you will have time to dig into these problems or if I
should try and resolve them myself and provide patches.
Thanks
Eitan
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Gen2 NodeRec GetTable RMPP Format Error.GIF
Type: image/gif
Size: 49481 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050821/ef68e99c/attachment.gif>