[openib-general] RMPP Message Format Errors

Hal Rosenstock halr at voltaire.com
Thu Sep 15 04:33:25 PDT 2005


On Thu, 2005-09-15 at 04:01, Eitan Zahavi wrote:
> OK, I got the kernel restarted now. From the Analyzer dump I can see the intermediate segments'
> paylen is 0, so I guess I'm up to date.

Good.

>  But osmtest produces an inventory file that is missing
> some of the records being sent.
> 
> Now let's go back to the test:
> 
> I use a machine connected to itself through a single switch (IS3).
> 
> I use osmtest -f c to get Nodes, Ports and PathRecords from the SM.
> 
>  From OpenSM Log file I see:
> Sep 15 09:47:37 531029 [8003] -> osm_nr_rcv_process: Returning 3 records.
> Sep 15 09:47:37 538586 [C004] -> osm_pir_rcv_process: Returning 27 records.
> 
> So we can conclude the following RMPP transactions should be sent:
> 1. NodeRec:
>     attrOffset is 14 and each record size with padding is 112 bytes.
>     The RMPP with 336 bytes of data should require 2 segments = ceiling(336/200).
>     First segment paylen should be 336 + 2 * 20 = 376.
>     Last segment paylen should be 336 - 200 + 20 = 156.
> 
> 2. PortInfoRecords:
>     attrOffset is 8 and each record size with padding is 64 bytes.
>     The RMPP with 1728 = 27 * 64 bytes of data should require 9 segments = ceiling(1728/200).
>     First segment paylen should be 1728 + 9 * 20 = 1908.
>     Last segment paylen should be 1728 - 8 * 200 + 20 = 148.

Yes, those calculations appear correct to me.
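
For reference, here is a throwaway sketch of those expected paylen
computations (my own code, not anything in OpenSM; it assumes 200 data bytes
per SA RMPP segment, a 20 byte SA header counted once per segment, and
attrOffset in units of 8 bytes):

    #include <stdio.h>

    #define SA_DATA_PER_SEG 200     /* 256-byte MAD minus 56 bytes of MAD/RMPP/SA headers */
    #define SA_HDR_SIZE      20     /* SA header, counted once per segment in paylen */

    /* Expected RMPP segmentation for an SA GetTableResp. */
    static void expected_paylens(unsigned attr_offset, unsigned num_recs)
    {
            unsigned rec_size  = attr_offset * 8;    /* attrOffset is in 8-byte units */
            unsigned data_len  = rec_size * num_recs;
            unsigned num_segs  = (data_len + SA_DATA_PER_SEG - 1) / SA_DATA_PER_SEG;
            unsigned first_pay = data_len + num_segs * SA_HDR_SIZE;
            unsigned last_pay  = data_len - (num_segs - 1) * SA_DATA_PER_SEG + SA_HDR_SIZE;

            printf("segs %u, first paylen %u, last paylen %u\n",
                   num_segs, first_pay, last_pay);
    }

    int main(void)
    {
            expected_paylens(14, 3);        /* NodeRecord:     2 segs, 376, 156 */
            expected_paylens(8, 27);        /* PortInfoRecord: 9 segs, 1908, 148 */
            return 0;
    }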

> What we see in the attached analyzer capture:
> NodeInfoRec
> Attr          Expected  Measured
> Num Segments  2         2
> First Paylen  376       376
> Last Paylen   156       156
> 
> PortInfoRec
> Attr          Expected  Measured
> Num Segments  9         9
> First Paylen  1908      1908
> Last Paylen   148       148
> 
> So the response on the wire is 100% OK. Thanks Sean.

BTW, I did some work here to get this right too.

> Now I go to the SA client section:
> 
>  From osmtest log I see:
> 
> NodeInfoRec:
> Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ]
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ]
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
> Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [
> Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ]
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ]
> I wonder how the received MAD is only 256 bytes. I expected it to be headers + data = 56 + 336 = 392 bytes.
> 
> So my conclusion is that for some reason the response MAD is not reassembled correctly, or the communication
> between the reassembly code and the umad layer is broken.

I believe there is something wrong with this in osm_vendor_ibumad_sa.c.
I will look into it. Note that the RMPP part of this has had little
testing, and the only consumer right now is osmtest, which is just
emerging in OpenIB.
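
For reference, the 392 byte figure above is just the SA MAD headers plus the
reassembled data (24 byte MAD header + 12 byte RMPP header + 20 byte SA
header = 56 bytes):

    24 + 12 + 20 + 3 * 112 = 56 + 336 = 392 bytes

so a 256 byte wrapper coming back is consistent with the response not being
reassembled before it reaches the SA client.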

> Or maybe I am missing some patches.

No.

> I see that in osm_vendor_ibumad.c the receive flow is allocating a MAD using:

>      p_osm_madw = osm_mad_pool_get_wrapper(p_mad_bind_info->p_mad_pool,
>                                            p_mad_bind_info,
>                                            MAD_BLOCK_SIZE,
>                                            (ib_mad_t*)&pRecvMad->IBMad,
>                                            &osm_mad_addr);
> 
> I suspect the allocation should use the received MAD size.

I don't see that call in osm_vendor_ibumad.c; only in osm_vendor_al.c,
osm_vendor_mtl.c, osm_vendor_ts.c, and osm_vendor_umadt.c.

I think there is a problem on the receive side of osm_vendor_ibumad_sa.c
for RMPP. I am looking into it now.
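
Whichever layer turns out to be at fault, the general shape of the fix is the
same: size the wrapper from the length actually received rather than the
fixed block size, since a reassembled RMPP response spans several MAD blocks.
A hypothetical fragment only (every name except osm_mad_pool_get_wrapper()
and MAD_BLOCK_SIZE is made up; the real receive path differs):

     /* Hypothetical sketch; not the actual osm_vendor_ibumad_sa.c code. */
     size_t recv_len = get_received_mad_length(p_recv);   /* full reassembled length, e.g. 392 */

     p_osm_madw = osm_mad_pool_get_wrapper(p_bind->p_mad_pool,
                                           p_bind,
                                           recv_len,      /* not MAD_BLOCK_SIZE */
                                           (ib_mad_t *) get_received_mad(p_recv),
                                           &osm_mad_addr);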

-- Hal



