[openib-general] RMPP Message Format Errors

Eitan Zahavi eitan at mellanox.co.il
Thu Sep 15 01:01:47 PDT 2005


Hal Rosenstock wrote:
> 
> 
> No. The patches are part of this. It would depend on what OpenIB svn
> version you are running with but if it is a recent pull then they are
> all there.
OK, I got the kernel restarted now. From the analyzer dump I can see that the paylen of the intermediate segments
is 0, so I guess I'm up to date. But osmtest produces an inventory file that is missing
some of the records being sent.

Now let's go back to the test:

I am using a machine connected to itself through a single switch (IS3).

I use osmtest -f c to get Nodes, Ports and PathRecords from the SM.

From the OpenSM log file I see:
Sep 15 09:47:37 531029 [8003] -> osm_nr_rcv_process: Returning 3 records.
Sep 15 09:47:37 538586 [C004] -> osm_pir_rcv_process: Returning 27 records.

Each 256-byte MAD carries 56 bytes of headers and 200 bytes of SA data, so we can conclude that the following RMPP transfers should be sent:
1. NodeRec:
    attrOffset is 14 (in 8-byte words), so each record including padding is 112 bytes.
    The 336 bytes (3 * 112) of data should require 2 segments = ceiling(336/200).
    First segment paylen should be 336 + 2 * 20 = 376.
    Last segment paylen should be 336 - 200 + 20 = 156.

2. PortInfoRecords:
    attrOffset is 8, so each record including padding is 64 bytes.
    The 1728 bytes (27 * 64) of data should require 9 segments = ceiling(1728/200).
    First segment paylen should be 1728 + 9 * 20 = 1908.
    Last segment paylen should be 1728 - 8 * 200 + 20 = 148.
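To double-check the arithmetic above, here is a small C sketch that computes the expected segment count and paylen values from the record count and attrOffset. It assumes the layout used above (56 bytes of MAD, RMPP and SA headers plus 200 bytes of SA data per 256-byte MAD) and the paylen rule implied by the numbers above. The names rmpp_expected and struct rmpp_expect are made up for illustration.

```c
#include <assert.h>

/* Assumed SA MAD layout: 256-byte MAD = 24-byte MAD header
 * + 12-byte RMPP header + 20-byte SA header + 200 bytes of SA data. */
#define MAD_BLOCK_SIZE   256
#define SA_HDR_SIZE       20
#define SA_HEADERS_SIZE   56
#define SA_DATA_PER_SEG  (MAD_BLOCK_SIZE - SA_HEADERS_SIZE)   /* 200 */

/* Hypothetical result type: what the analyzer should show. */
struct rmpp_expect {
    unsigned num_segments;
    unsigned first_paylen;
    unsigned last_paylen;
};

/* attr_offset is the SA AttributeOffset, counted in 8-byte words. */
static struct rmpp_expect rmpp_expected(unsigned num_records, unsigned attr_offset)
{
    struct rmpp_expect e;
    unsigned total;

    assert(num_records > 0 && attr_offset > 0);
    total = num_records * attr_offset * 8;           /* SA data bytes */

    /* ceiling(total / 200) */
    e.num_segments = (total + SA_DATA_PER_SEG - 1) / SA_DATA_PER_SEG;
    /* first segment: all data plus one SA header per segment */
    e.first_paylen = total + e.num_segments * SA_HDR_SIZE;
    /* last segment: the remaining data plus its own SA header */
    e.last_paylen = total - (e.num_segments - 1) * SA_DATA_PER_SEG + SA_HDR_SIZE;
    return e;
}
```

For example, rmpp_expected(3, 14) gives 2 segments with paylens 376 and 156, and rmpp_expected(27, 8) gives 9 segments with 1908 and 148. Both match the figures above.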

What we see in the attached analyzer capture:
NodeInfoRec
Field           Expected  Measured
Num Segments    2         2
First Paylen    376       376
Last Paylen     156       156

PortInfoRec
Field           Expected  Measured
Num Segments    9         9
First Paylen    1908      1908
Last Paylen     148       148

So the response on the wire is 100% OK. Thanks, Sean.

Now on to the SA client side:

From the osmtest log I see:

NodeInfoRec:
Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event.
Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [
Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [
Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256.
Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256.
Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ]
Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256.
Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ]
Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [
Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [
Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ]
Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ]
I wonder why the received MAD is only 256 bytes. I expected it to be headers + data = 56 + 336 = 392 bytes.
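The Count line in the log is consistent with the client parsing only a single segment: 200 data bytes / 112 bytes per record = 1 record, with 88 bytes left over. The helper below reproduces that arithmetic from the received MAD length, assuming the same 56-byte header layout. It is hypothetical and is not the actual __osmv_sa_mad_rcv_cb code.

```c
#include <assert.h>

#define SA_HEADERS_SIZE 56   /* assumed: 24 MAD + 12 RMPP + 20 SA header bytes */

/* Hypothetical helper mirroring "Count = N = data / size (leftover)":
 * how many whole records fit in the data part of a received SA MAD. */
static unsigned sa_record_count(unsigned mad_len, unsigned attr_offset,
                                unsigned *leftover)
{
    unsigned rec_size = attr_offset * 8;   /* AttributeOffset is in 8-byte words */
    unsigned data_len;

    assert(mad_len >= SA_HEADERS_SIZE && rec_size > 0);
    data_len = mad_len - SA_HEADERS_SIZE;
    if (leftover)
        *leftover = data_len % rec_size;
    return data_len / rec_size;
}
```

With mad_len = 256 and attrOffset 14 it returns 1 with 88 bytes left over, which is what the log shows. With the expected 392 bytes it would return all 3 records.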

So my conclusion is that, for some reason, the response MAD is either not reassembled correctly or the hand-off between the
reassembly code and the umad layer is broken.

Or maybe I am missing some patches.

I see that in osm_vendor_ibumad.c the receive flow allocates the MAD using:
     p_osm_madw = osm_mad_pool_get_wrapper(p_mad_bind_info->p_mad_pool,
                                           p_mad_bind_info,
                                           MAD_BLOCK_SIZE,
                                           (ib_mad_t*)&pRecvMad->IBMad,
                                           &osm_mad_addr);

I suspect the allocation should use the size of the received MAD rather than MAD_BLOCK_SIZE.
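The sketch below illustrates the difference. copy_recv_mad and its use_recv_len flag are hypothetical and are not part of osm_vendor_ibumad.c or libibumad. The sketch only shows that a buffer sized to MAD_BLOCK_SIZE keeps just the first 256 bytes of a 392-byte reassembled response, while a buffer sized to the received length keeps all of it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAD_BLOCK_SIZE 256

/* Hypothetical model of the receive path: copy a reassembled RMPP
 * response of recv_len bytes into a newly allocated buffer.
 * use_recv_len == 0: buffer is MAD_BLOCK_SIZE bytes (current behaviour),
 *                    so data past the first 256 bytes is dropped.
 * use_recv_len != 0: buffer is sized by the received length (suggested). */
static unsigned char *copy_recv_mad(const unsigned char *recv_buf, size_t recv_len,
                                    int use_recv_len, size_t *out_len)
{
    size_t alloc_len = MAD_BLOCK_SIZE;
    size_t copy_len;
    unsigned char *buf;

    assert(recv_buf != NULL && out_len != NULL);
    if (use_recv_len && recv_len > MAD_BLOCK_SIZE)
        alloc_len = recv_len;
    copy_len = recv_len < alloc_len ? recv_len : alloc_len;

    buf = calloc(1, alloc_len);   /* zero-filled so a short copy leaves no garbage */
    if (buf == NULL)
        return NULL;
    memcpy(buf, recv_buf, copy_len);
    *out_len = copy_len;
    return buf;
}
```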

Thanks

Eitan


