[ofw] coming and going MAD, MAD, MAD

Sean Hefty sean.hefty at intel.com
Thu May 7 11:13:31 PDT 2009


I'm trying to test the ibping ib-diag utility.  This sends a vendor-defined MAD
to a listening ibping server that sends a response.  It's hitting a few problems
in the MAD code.

- When a vendor MAD is received, it is automatically forwarded to the
  HCA as a 'sent' MAD.  The MAD layer allocates a response MAD that
  it gives to the HCA driver to fill in.  However, the HCA driver
  doesn't do anything with the 'sent' MAD or the response and simply
  returns success.  At this point, I can't quite determine what happens
  to the 'sent' MAD.

  Figuring out the code path requires a debugger, but it appears to be:

  spl_qp_comp    -> process_recv_mad     -> recv_local_mad  ->
  ib_send_mad    -> mad_disp_resume_send -> spl_qp_svc_send ->
  local_mad_send -> send_local_mad_cb    ->
  fwd_local_mad  -> (calls get_resp_mad) -> al_local_mad 

  Does anyone have any idea what eventually becomes of the received MAD?
  Ideally, one would like the MAD layer to try to dispatch the MAD,
  but that doesn't appear all that easy to accomplish.

- If I run ibping in loopback mode, the sent MAD is forwarded to the
  HCA along the same code path as in the previous case, starting at
  fwd_local_mad.  Since the send never reaches the receiver,
  it eventually times out and the send completes in error.

  Similar to before, it would be ideal to dispatch this MAD as a local
  receive.  (If the ibping server is on a remote system, it does get
  sent.  It's only loopback that has this issue.)  It's not clear to me
  where in the code the MAD is best dispatched, or whether it can be
  done at all for a sent loopback MAD.  I think we need to dispatch a
  copy of the MAD, but I'm not sure.

- Note that in both cases above, a response MAD was allocated.  After
  the HCA returns from processing the forwarded MADs, an attempt is
  made to dispatch the response MAD (calls mad_disp_recv_done).
  But the response MAD is uninitialized (all zeroes), so the dispatch
  fails.  The response MAD is simply leaked at this point.

  I'm trying to put in changes to avoid leaking the MAD, but this only
  fixes part of the issue.  The other issue is that the MADs aren't
  being dispatched properly.  I also can't tell in the first case if
  the received MAD that was processed locally was leaked or not.
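  A minimal sketch of the two behaviors wished for above, assuming
  simplified stand-in types: a sent MAD whose destination LID matches the
  local port is delivered as a copy through the receive path instead of
  being forwarded to the HCA, and a response MAD the HCA never filled in
  is freed rather than handed to dispatch and leaked.  The names
  sketch_mad_t, sketch_port_t, dispatch_recv, loopback_send and
  complete_local_mad are hypothetical; they are not the AL functions in
  the call chain above (fwd_local_mad, mad_disp_recv_done, etc.).  Using a
  LID comparison as the loopback test and treating an all-zero buffer as
  "not filled in" are both assumptions, not guarantees from the HCA driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAD_SIZE 256   /* a MAD is a 256-byte block */

/* Hypothetical, simplified stand-ins; NOT the real AL/WinOF MAD types. */
typedef struct sketch_mad {
    uint16_t dest_lid;                 /* LID the MAD is addressed to */
    uint8_t  data[SKETCH_MAD_SIZE];    /* raw MAD payload */
} sketch_mad_t;

typedef struct sketch_port {
    uint16_t lid;          /* LID of the local port */
    int      local_recvs;  /* MADs delivered through the receive path */
    int      freed_resps;  /* unused response MADs released instead of leaked */
} sketch_port_t;

/* Hypothetical receive dispatcher; in this sketch it takes ownership
 * of the MAD and frees it once "delivered". */
static void dispatch_recv(sketch_port_t *port, sketch_mad_t *mad)
{
    assert(port != NULL && mad != NULL);
    port->local_recvs++;
    free(mad);
}

/* Assumption: a response the HCA never wrote to is still all zeroes. */
static bool resp_is_empty(const sketch_mad_t *resp)
{
    for (size_t i = 0; i < SKETCH_MAD_SIZE; i++) {
        if (resp->data[i] != 0)
            return false;
    }
    return true;
}

/* Loopback case: if the send targets the local port, hand a COPY of it
 * to the receive path instead of forwarding it to the HCA.  Returns true
 * if the MAD was handled locally; the caller still owns and completes
 * the original send either way. */
static bool loopback_send(sketch_port_t *port, const sketch_mad_t *send_mad)
{
    if (send_mad->dest_lid != port->lid)
        return false;                      /* remote target: send normally */

    sketch_mad_t *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return false;
    memcpy(copy, send_mad, sizeof *copy);
    dispatch_recv(port, copy);             /* receiver owns the copy */
    return true;
}

/* After the HCA returns from a forwarded MAD: dispatch the response only
 * if it was actually filled in; otherwise free it so it is not leaked. */
static void complete_local_mad(sketch_port_t *port, sketch_mad_t *resp)
{
    if (resp_is_empty(resp)) {
        free(resp);
        port->freed_resps++;
        return;
    }
    dispatch_recv(port, resp);
}
```

  Dispatching a copy keeps the send and the receive independent: the
  original send can complete on its normal path while the receiver owns
  its own buffer, which is one way to act on the hunch that a copy of the
  loopback MAD needs to be dispatched.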

- Sean
