Thanks for the quick reply. Comments below.<br><br><div><span class="gmail_quote">On 7/18/07, <b class="gmail_sendername">Roland Dreier</b> &lt;<a href="mailto:rdreier@cisco.com">rdreier@cisco.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> Our driver (as do all drivers I've seen) performs an<br> > atomic_dec(&qp->refcount) and wait_event(&qp->refcount) in 'destroy_qp()'.<br> > Because the rdma_cm device hasn't been closed (
i.e., ucma_close() hasn't yet<br> > been called), a cm_id still has an active reference to the qp, and the<br> > wait_event() will end up 'wait'ing.<br><br>In the other drivers I know well (basically mthca and mlx4, since I
<br>wrote them), the qp->refcount being waited for is an internal driver<br>refcount, and is used to make sure that the destroy QP operation waits<br>until any active interrupt handlers are done with the QP. So I think
<br>the problem is that you are letting a cm_id bump the QP's reference<br>count somehow.</blockquote><div><br>I guess this is really relevant only for iWARP. The other iWARP drivers I've seen do an atomic_inc(&qp->refcount) in &lt;device&gt;::qp_add_ref(),
<br>which, I believe, is called via cm_id->device->iwcm->add_ref() [for example, see iwcm.c::iw_cm_connect()].<br>This reference is removed by a call to cm_id->device->iwcm->rem_ref() [for example, see iwcm.c::destroy_cm_id()].
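<br><br>A rough userspace sketch of the counting as I understand it (toy_qp and the toy_* helpers are hypothetical names, not the real iwcm or driver code, and I'm assuming destroy_qp() waits for the count to reach zero):<br>
<pre>
```c
/* Toy model only: struct toy_qp and the toy_* helpers are hypothetical
 * names used for illustration, not the real iwcm or driver API. */
struct toy_qp {
    _Atomic int refcount;   /* C11 atomic; stands in for the driver's atomic_t */
};

/* create_qp(): the QP starts out holding its own creation reference. */
static void toy_qp_init(struct toy_qp *qp)
{
    qp->refcount = 1;
}

/* Models iwcm->add_ref(), reached from iw_cm_connect(): cm_id takes a reference. */
static void toy_add_ref(struct toy_qp *qp)
{
    qp->refcount++;
}

/* Models iwcm->rem_ref(), reached from destroy_cm_id(): cm_id drops its reference. */
static void toy_rem_ref(struct toy_qp *qp)
{
    qp->refcount--;
}

/* Models the atomic_dec() at the start of destroy_qp(). */
static void toy_destroy_begin(struct toy_qp *qp)
{
    qp->refcount--;
}

/* Assumption: the wait_event() in destroy_qp() sleeps until the count is zero.
 * Returns 1 if the wait would still sleep, 0 if it could return. */
static int toy_wait_would_block(struct toy_qp *qp)
{
    return qp->refcount != 0;
}
```
</pre>
With the cm_id's reference still held, toy_wait_would_block() returns 1 after toy_destroy_begin(); it returns 0 only once toy_rem_ref() has run.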
<br><br>To avoid a deadlock, I still believe that the rem_ref() call must happen _before_ ib_uverbs_close() [and ultimately ib_destroy_qp()] is called.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> Perhaps my device driver should do additional work in ib_destroy_qp() that<br> > will trigger the destruction of the cm_id... [but that doesn't seem<br> > consistent with other drivers I've seen.]<br><br>
That doesn't make sense. I think it's OK if upper layers are left<br>with a stale pointer to your QP -- let them worry about it. Maybe<br>it's an iWARP thing that I don't really understand (I'm much more
<br>familiar with the IB driver interface) but I don't think that the<br>cxgb3 driver runs into this issue.<br><br> > Perhaps the application (i.e., librdmacm) should make sure the "rdma_cm"<br> > device is closed before calling ibv_device_close()?
<br><br>No, because then some other (possibly malicious) app could still cause<br>the deadlock and potentially create a bunch of unkillable processes.</blockquote><div><br>Very true...good point.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
- R.<br></blockquote></div><br>