[openib-general] RMPP Message Format Errors
Hal Rosenstock
halr at voltaire.com
Thu Sep 15 04:33:25 PDT 2005
On Thu, 2005-09-15 at 04:01, Eitan Zahavi wrote:
> OK I got the kernel restarted now. From the Analyzer dump I can see the intermediate segments
> paylen is 0 so I guess I'm up to date.
Good.
> But osmtest produces an inventory file that is missing
> some of the records that were sent.
>
> Now let's go back to the test:
>
> I use a machine connected through a single switch (IS3) to itself.
>
> I use osmtest -f c to get Nodes, Ports and PathRecords from the SM.
>
> From OpenSM Log file I see:
> Sep 15 09:47:37 531029 [8003] -> osm_nr_rcv_process: Returning 3 records.
> Sep 15 09:47:37 538586 [C004] -> osm_pir_rcv_process: Returning 27 records.
>
> So we can conclude the following RMPP transactions should be sent:
> 1. NodeRec:
> attrOffset is 14 and each record size with padding is 112 bytes.
> The RMPP with 336 = 3 * 112 bytes of data should require 2 segments = ceiling(336/200).
> First segment paylen should be 336 + 2 * 20 = 376.
> Last segment paylen should be 336 - 200 + 20 = 156.
>
> 2. PortInfoRecords:
> attrOffset is 8 and each record size with padding is 64 bytes.
> The RMPP with 1728 = 27 * 64 bytes of data should require 9 segments = ceiling(1728/200).
> First segment paylen should be 1728 + 9 * 20 = 1908.
> Last segment paylen should be 1728 - 8 * 200 + 20 = 148.
Yes, those calculations appear correct to me.
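For reference, the arithmetic above can be written down as a small C sketch. The 200 bytes of record data per segment and the 20 bytes of SA header counted once per segment are the figures used in the quoted calculation; the function and constant names are made up for illustration and are not taken from OpenSM or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes as used in the calculation quoted above; names are illustrative. */
enum {
    SA_DATA_PER_SEG = 200, /* record data bytes carried by one segment */
    SA_HDR_PER_SEG  = 20   /* SA header bytes counted once per segment */
};

/* Number of segments: ceiling(data_len / 200). */
size_t rmpp_num_segments(size_t data_len)
{
    assert(data_len > 0);
    return (data_len + SA_DATA_PER_SEG - 1) / SA_DATA_PER_SEG;
}

/* Paylen expected in the first segment: all data plus one SA header per segment. */
size_t rmpp_first_paylen(size_t data_len)
{
    return data_len + rmpp_num_segments(data_len) * SA_HDR_PER_SEG;
}

/* Paylen expected in the last segment: leftover data plus one SA header. */
size_t rmpp_last_paylen(size_t data_len)
{
    size_t nseg = rmpp_num_segments(data_len);
    return data_len - (nseg - 1) * SA_DATA_PER_SEG + SA_HDR_PER_SEG;
}
```

With data_len = 336 this gives 2 segments and paylens of 376 and 156; with data_len = 1728 it gives 9 segments and paylens of 1908 and 148.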
> What we see in the attached analyzer capture:
> NodeInfoRec
> Attr           Expected   Measured
> Num Segments   2          2
> First Paylen   376        376
> Last Paylen    156        156
>
> PortInfoRec
> Attr           Expected   Measured
> Num Segments   9          9
> First Paylen   1908       1908
> Last Paylen    148        148
>
> So the response on the wire is 100% OK. Thanks Sean.
BTW, I did some work here to get this right too.
> Now I go to the SA client section:
>
> From osmtest log I see:
>
> NodeInfoRec:
> Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ]
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ]
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
> Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [
> Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ]
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ]
> I wonder why the received MAD is only 256 bytes. I expected it to be headers + data = 56 + 336 = 392 bytes.
>
> So my conclusion is that for some reason the response MAD is not reassembled correctly, or the communication between the
> reassembly code and the umad layer is broken.
I believe there is something wrong in osm_vendor_ibumad_sa.c in terms of
this. I will look into it. Note that the RMPP part of this had little
testing and the only consumer right now is osmtest which is just
emerging in terms of OpenIB.
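To make the symptom concrete, here is a rough C sketch of the record count implied by a given receive length. It assumes 56 bytes of headers ahead of the record data (the figure quoted above) and that attrOffset is expressed in 8-byte units, which matches 14 -> 112 and 8 -> 64. The function names are hypothetical; this is not the code in osm_vendor_ibumad_sa.c.

```c
#include <assert.h>
#include <stddef.h>

/* Header bytes ahead of the SA record data, as quoted above; name is illustrative. */
enum { SA_HEADERS_LEN = 56 };

/* Record size in bytes, assuming attrOffset is given in 8-byte units. */
size_t sa_record_size(size_t attr_offset)
{
    return attr_offset * 8;
}

/* Number of whole records that fit in a received buffer of buf_len bytes. */
size_t sa_record_count(size_t buf_len, size_t attr_offset)
{
    size_t rec_size = sa_record_size(attr_offset);

    assert(rec_size > 0 && buf_len >= SA_HEADERS_LEN);
    return (buf_len - SA_HEADERS_LEN) / rec_size;
}
```

A 256-byte buffer with attrOffset 14 yields 200 / 112 = 1 record with 88 bytes left over, which is what the osmtest log reports. A fully reassembled 392-byte response would yield 336 / 112 = 3 records, matching the 3 records OpenSM says it returned.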
> Or maybe I am missing some patches.
No.
> I see that in the osm_vendor_ibumad.c the receive flow is allocating a MAD using:
> p_osm_madw = osm_mad_pool_get_wrapper(p_mad_bind_info->p_mad_pool,
> p_mad_bind_info,
> MAD_BLOCK_SIZE,
> (ib_mad_t*)&pRecvMad->IBMad,
> &osm_mad_addr);
>
> I suspect the allocation should use the receive mad size.
I don't see that call in osm_vendor_ibumad.c; only in osm_vendor_al.c,
osm_vendor_mtl.c, osm_vendor_ts.c, and osm_vendor_umadt.c.
I think there is a problem on the receive side of osm_vendor_ibumad_sa.c
for RMPP. I am looking into it now.
-- Hal