[ofw] coming and going MAD, MAD, MAD
Sean Hefty
sean.hefty at intel.com
Thu May 7 11:13:31 PDT 2009
I'm trying to test the ibping ib-diag utility. This sends a vendor-defined MAD
to a listening ibping server, which sends a response. The test is hitting a few
problems in the MAD code.
- When a vendor MAD is received, it is automatically forwarded to the
HCA as a 'sent' MAD. The MAD layer allocates a response MAD that
it gives to the HCA driver to fill in. However, the HCA driver
doesn't do anything with the 'sent' MAD or the response and simply
returns success. At this point, I can't quite determine what happens
to the 'sent' MAD.
Figuring out the code path requires a debugger, but it appears to be:
spl_qp_comp -> process_recv_mad -> recv_local_mad ->
ib_send_mad -> mad_disp_resume_send -> spl_qp_svc_send ->
local_mad_send -> send_local_mad_cb ->
fwd_local_mad -> (calls get_resp_mad) -> al_local_mad
Does anyone have any idea what eventually becomes of the received MAD?
Ideally, one would like the MAD layer to try to dispatch the MAD,
but that doesn't appear all that easy to accomplish.
- If I run ibping in loopback mode, the sent MAD is forwarded to the
HCA along the same code path as the previous case starting at
fwd_local_mad. Since the send never reaches the receiver,
it eventually times out and the send completes in error.
Similar to before, it would be ideal to dispatch this MAD as a local
receive. (If the ibping server is on a remote system, it does get
sent. It's only loopback that has this issue.) It's not clear to me
which point in the code is the best place to dispatch the MAD, or whether
it can be done at all for a sent loopback MAD. I think we need to dispatch
a copy of the MAD, but I'm not sure.
- Note that in both cases above, a response MAD was allocated. After
the HCA returns from processing the forwarded MADs, an attempt is
made to dispatch the response MAD (calls mad_disp_recv_done).
But the response MAD is uninitialized (all zeroes), and the
dispatching fails. The response MAD is simply leaked at this point.
I'm trying to put in changes to avoid leaking the MAD, but this only
fixes part of the issue. The other issue is that the MADs aren't
being dispatched properly. I also can't tell in the first case if
the received MAD that was processed locally was leaked or not.
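
A minimal sketch of one way to avoid the leak described in the third item,
assuming the response MAD can be checked for "still all zeroes" after the HCA
driver returns. Everything here is hypothetical and for illustration only:
mad_element_t, handle_local_resp, and count_dispatch are stand-ins, not the
real AL types or functions, and the dispatch callback merely plays the role
that mad_disp_recv_done has in the real code path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAD_SIZE 256  /* an InfiniBand MAD is 256 bytes */

/* Hypothetical stand-in for a MAD element; not the real AL structure. */
typedef struct mad_element {
    unsigned char buf[MAD_SIZE];
} mad_element_t;

/* Hypothetical result codes for this sketch. */
typedef enum { RESP_DISPATCHED, RESP_FREED } resp_result_t;

/* Stand-in for the dispatcher; it takes ownership of the MAD it is given. */
typedef void (*dispatch_fn)(mad_element_t *mad);

/* Return true if nothing was ever written into the MAD buffer. */
static bool mad_is_zeroed(const mad_element_t *mad)
{
    for (size_t i = 0; i < MAD_SIZE; i++) {
        if (mad->buf[i] != 0)
            return false;
    }
    return true;
}

/*
 * Called after the HCA driver returns from processing a forwarded MAD.
 * If the driver left the response untouched, release it here instead of
 * handing it to a dispatcher that would fail and drop it.
 */
static resp_result_t handle_local_resp(mad_element_t *resp, dispatch_fn dispatch)
{
    if (mad_is_zeroed(resp)) {
        free(resp);            /* nothing to deliver; avoid the leak */
        return RESP_FREED;
    }
    dispatch(resp);            /* dispatcher now owns the response */
    return RESP_DISPATCHED;
}

/* Test stub: counts dispatched MADs and releases them. */
static int dispatched_count = 0;

static void count_dispatch(mad_element_t *mad)
{
    dispatched_count++;
    free(mad);
}
```

This only addresses the leaked response. It does not make the received or
loopback MAD itself get dispatched, which remains the open question above.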
- Sean