[ofa-general] rping / librdmacm deadlock question
Steve Wise
swise at opengridcomputing.com
Wed Jul 18 12:34:17 PDT 2007
Steve Wise wrote:
> Sean Hefty wrote:
>>> I have, I believe, traced this to a deadlock between ib_destroy_qp()
>>> and ucma_close(). It looks like librdmacm has a ((destructor))
>>> function defined that results in a call to ibv_device_close() and
>>> ultimately in <device>::destroy_qp(). That seems reasonable, and it
>>> all happens as the OS unloads the application.
>>> However, it is (I believe) happening before the "rdma_cm" device file
>>> descriptor is 'closed' by the OS as the application terminates.
>>> [rdma_destroy_event_channel() would normally do this, but it doesn't
>>> get called when the application is interrupted by SIGINT.]
>>
>> This seems like an iWarp specific issue caused by the following code
>> in iw_cm_connect():
>>
>>     /* Get the ib_qp given the QPN */
>>     qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
>>     if (!qp) {
>>         spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>>         return -EINVAL;
>>     }
>>     cm_id->device->iwcm->add_ref(qp);
>>
>> I think the reference is normally removed in cm_close_handler:
>>
>>     if (cm_id_priv->qp) {
>>         cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
>>         cm_id_priv->qp = NULL;
>>     }
>>
>>
>> The upstream iWarp drivers must already be able to handle this
>> situation, or I'm sure we would have seen the problem before. I'm
>> just not familiar enough with the iWarp drivers to see what they do to
>> handle it. I'll continue reading through the code, but maybe Steve
>> can explain how to avoid the problem.
>>
>> I wonder if it would be better if the iWarp CM acquired/released the
>> QP reference on a per call basis, rather than holding a reference
>> throughout the entire connection.
>>
>
> The design assumes the iwcm can hold this reference and cache the qp ptr.
> In the iwarp design, the cm_id (connection) and qp are tightly bound
> once the connection is transitioned into rdma mode. This is different
> than infiniband.
>
> I still don't see the deadlock?
>
I've re-read this thread and I think I've posted the answers for Clark...
Steve.
>
> Steve.
>
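
For anyone following the thread, here is a rough user-space sketch of the pattern being discussed: the CM takes a QP reference at connect time, caches the pointer, and only drops the reference in its close handler. All of the sketch_* names are hypothetical stand-ins, not the kernel's ib_qp, iw_cm_id, or any driver's add_ref/rem_ref, and the plain int counter is an assumption made to keep it single-threaded and runnable (a real driver would need atomic counting and locking).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a QP; not the kernel's struct ib_qp. */
struct sketch_qp {
    int refcnt;     /* 1 for the owner, +1 for each holder such as the CM */
    int destroyed;  /* set once the last reference is dropped */
};

/* Hypothetical stand-in for a connection id that caches the QP pointer. */
struct sketch_cm_id {
    struct sketch_qp *qp;  /* valid only while the CM holds its reference */
};

static void sketch_qp_init(struct sketch_qp *qp)
{
    qp->refcnt = 1;  /* the owner's reference */
    qp->destroyed = 0;
}

static void sketch_add_ref(struct sketch_qp *qp)
{
    qp->refcnt++;
}

static void sketch_rem_ref(struct sketch_qp *qp)
{
    assert(qp->refcnt > 0);
    if (--qp->refcnt == 0)
        qp->destroyed = 1;
}

/* Connect path: take a reference and cache the pointer for the
 * lifetime of the connection. */
static int sketch_connect(struct sketch_cm_id *id, struct sketch_qp *qp)
{
    if (!qp)
        return -1;
    sketch_add_ref(qp);
    id->qp = qp;
    return 0;
}

/* Close handler: drop the cached reference exactly once. */
static void sketch_close(struct sketch_cm_id *id)
{
    if (id->qp) {
        sketch_rem_ref(id->qp);
        id->qp = NULL;
    }
}

/* Owner-side destroy: drops the owner's reference and reports whether
 * the QP is actually gone (1) or still pinned by another holder (0). */
static int sketch_destroy_qp(struct sketch_qp *qp)
{
    sketch_rem_ref(qp);
    return qp->destroyed;
}
```

If sketch_destroy_qp() runs before sketch_close(), it returns 0 because the connection still pins the QP; the QP only goes away once the close handler has run. What a real driver does at that point (wait, defer, or fail) is not something this sketch tries to model.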