[ofa-general] Re: CMA can't establish connection with QoS on

Sean Hefty sean.hefty at intel.com
Tue Jan 8 16:26:59 PST 2008


>I updated the bug with the step-by-step instructions how to burn
>the FW and reproduce the error.
>I compiled this "how-to" today, so everything there is up to date.

Thanks - I don't think that I was programming my FW correctly.  I still have
problems running opensm with QoS enabled on one of my systems, but it works
when I run it on the other system.

Anyway, I was able to reproduce the problem, and I believe I understand part of
it.  The send for the CM REQ MAD never completes: no completion shows up on the
GSI's CQ with a wr_id that matches the send wr_id.  (I don't see a completion
at all.)  As a result, a reference held on the ib_cm id is never released,
which causes the hang.  (Destruction of the ib_cm id hangs, which blocks the
destruction of the rdma_cm_id, which blocks the close from userspace.)
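
To illustrate the chain above, here is a minimal userspace sketch in C of the
reference-count pattern, not the actual ib_cm code.  All names (sketch_cm_id,
sketch_post_req, sketch_send_done, sketch_destroy_would_block) are hypothetical,
and it assumes the send takes one reference that only a matching send
completion drops.  Instead of really blocking, the destroy helper just reports
whether it would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ib_cm id; only the reference count matters. */
struct sketch_cm_id {
    int refcount;            /* 1 for the owner, +1 per outstanding send */
    uint64_t pending_wr_id;  /* wr_id of the outstanding REQ send, 0 if none */
};

static void sketch_init(struct sketch_cm_id *id)
{
    id->refcount = 1;
    id->pending_wr_id = 0;
}

/* Posting the REQ takes a reference that the send completion should drop. */
static void sketch_post_req(struct sketch_cm_id *id, uint64_t wr_id)
{
    assert(wr_id != 0 && id->pending_wr_id == 0);
    id->pending_wr_id = wr_id;
    id->refcount++;
}

/*
 * Called for each completion polled from the CQ.  Returns true and drops
 * the send's reference only when the wr_id matches the outstanding send.
 */
static bool sketch_send_done(struct sketch_cm_id *id, uint64_t wr_id)
{
    if (id->pending_wr_id == 0 || id->pending_wr_id != wr_id)
        return false;
    id->pending_wr_id = 0;
    id->refcount--;
    return true;
}

/*
 * Destroy drops the owner's reference and then waits for the count to
 * reach zero.  This sketch only reports whether that wait would block.
 */
static bool sketch_destroy_would_block(const struct sketch_cm_id *id)
{
    return id->refcount - 1 > 0;
}
```

In this model, calling sketch_post_req() and then never seeing a matching
sketch_send_done() leaves sketch_destroy_would_block() returning true
indefinitely, which corresponds to the hang seen at close time.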

If the ib_cm is modified to use SL 0 for the CM MADs, but the connection still
uses SL 1, then ucmatose is able to connect and transfer data between the client
and server.

- Sean



