[ofa-general] rping / librdmacm deadlock question
Steve Wise
swise at opengridcomputing.com
Wed Jul 18 12:03:20 PDT 2007
Sean Hefty wrote:
>> I have, I believe, traced this to a deadlock between ib_destroy_qp()
>> and ucma_close(). It looks like librdmacm has a ((destructor))
>> function defined that results in a call to ibv_device_close() and
>> ultimately in <device>::destroy_qp(). That seems reasonable, and it
>> all happens as the OS unloads the application.
>> However, it is (I believe) happening before the "rdma_cm" device file
>> descriptor is 'closed' by the OS as the application terminates.
>> [rdma_destroy_event_channel() would normally do this, but it doesn't
>> get called when the application is interrupted by SIGINT.]
>
> This seems like an iWarp specific issue caused by the following code in
> iw_cm_connect():
>
>     /* Get the ib_qp given the QPN */
>     qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
>     if (!qp) {
>         spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>         return -EINVAL;
>     }
>     cm_id->device->iwcm->add_ref(qp);
>
> I think the reference is normally removed in cm_close_handler:
>
>     if (cm_id_priv->qp) {
>         cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
>         cm_id_priv->qp = NULL;
>     }
>
>
> The upstream iWarp drivers must already be able to handle this
> situation, or I'm sure we would have seen the problem before. I'm just
> not familiar enough with the iWarp drivers to see what they do to handle
> it. I'll continue reading through the code, but maybe Steve can
> explain how to avoid the problem.
>
> I wonder if it would be better if the iWarp CM acquired/released the QP
> reference on a per call basis, rather than holding a reference
> throughout the entire connection.
>
The design assumes the iwcm can hold this reference and cache the qp ptr.
In the iwarp design, the cm_id (connection) and qp are tightly bound
once the connection is transitioned into rdma mode. This is different
from infiniband.
I still don't see the deadlock?
Steve.