[ofa-general] rping / librdmacm deadlock question

Sean Hefty mshefty at ichips.intel.com
Wed Jul 18 11:49:30 PDT 2007


> I have, I believe, traced this to a deadlock between ib_destroy_qp() and 
> ucma_close(). It looks like librdmacm has a ((destructor)) function 
> defined that results in a call to ibv_device_close() and ultimately in 
> <device>::destroy_qp().   That seems reasonable, and it all happens as 
> the OS unloads the application. 
> 
> However, it is (I believe) happening before the "rdma_cm" device file 
> descriptor is 'closed' by the OS as the application terminates.  
> [rdma_destroy_event_channel() would normally do this, but it doesn't get 
> called when the application is interrupted by SIGINT.]

This seems like an iWarp-specific issue caused by the following code in 
iw_cm_connect():

	/* Get the ib_qp given the QPN */
	qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
	if (!qp) {
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		return -EINVAL;
	}
	cm_id->device->iwcm->add_ref(qp);

I think the reference is normally removed in cm_close_handler():

	if (cm_id_priv->qp) {
		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
		cm_id_priv->qp = NULL;
	}
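
To make the suspected dependency concrete, here is a small userspace model 
of the two excerpts above.  It is only a sketch, not kernel code: every 
model_* name is hypothetical, and the rule that destroy has to wait until 
only the creator's reference remains is an assumption about how a driver 
might implement destroy_qp(), not something I have confirmed in the 
drivers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver QP; not the real struct ib_qp. */
struct model_qp {
	int refs;	/* 1 for the creator, +1 per connected cm_id */
};

/* Hypothetical stand-in for the iWarp CM's private connection state. */
struct model_cm_id {
	struct model_qp *qp;
};

static void model_add_ref(struct model_qp *qp)
{
	qp->refs++;
}

static void model_rem_ref(struct model_qp *qp)
{
	assert(qp->refs > 0);
	qp->refs--;
}

/* Mirrors the iw_cm_connect() excerpt: the CM pins the QP. */
static int model_connect(struct model_cm_id *id, struct model_qp *qp)
{
	if (!qp)
		return -1;
	model_add_ref(qp);
	id->qp = qp;
	return 0;
}

/* Mirrors the cm_close_handler excerpt: the CM unpins the QP. */
static void model_close_handler(struct model_cm_id *id)
{
	if (id->qp) {
		model_rem_ref(id->qp);
		id->qp = NULL;
	}
}

/*
 * Assumption: destroy must wait until only the creator's reference
 * remains.  Instead of sleeping, the model just reports whether a real
 * implementation would have to wait.
 */
static bool model_destroy_would_block(const struct model_qp *qp)
{
	return qp->refs > 1;
}
```

In this model, a destroy issued after model_connect() but before 
model_close_handler() would wait, which matches the ordering described 
in the report: the QP is destroyed before the rdma_cm file descriptor is 
closed.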


The upstream iWarp drivers must already be able to handle this 
situation, or I'm sure we would have seen the problem before.  I'm just 
not familiar enough with the iWarp drivers to see what they do to handle 
it.  I'll continue reading through the code, but maybe Steve can 
explain how to avoid the problem.

I wonder if it would be better if the iWarp CM acquired/released the QP 
reference on a per-call basis, rather than holding a reference 
throughout the entire connection.
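
A rough userspace sketch of what per-call referencing could look like.  
All percall_* names and the small lookup table are hypothetical, the model 
is single-threaded, and real code would have to do the lookup and the 
reference count update under a lock.  Here refs counts only calls in 
flight, so an idle connection holds nothing.

```c
#include <assert.h>
#include <stddef.h>

#define PERCALL_MAX_QPS 4

/* Hypothetical QP record; refs counts CM calls currently in flight. */
struct percall_qp {
	int qpn;
	int refs;
	int in_use;
};

static struct percall_qp percall_table[PERCALL_MAX_QPS];

/* Stand-in for get_qp(): look up a live QP by its number. */
static struct percall_qp *percall_get_qp(int qpn)
{
	for (size_t i = 0; i < PERCALL_MAX_QPS; i++)
		if (percall_table[i].in_use && percall_table[i].qpn == qpn)
			return &percall_table[i];
	return NULL;
}

/* Register a QP in the first free slot; -1 if the table is full. */
static int percall_create_qp(int qpn)
{
	for (size_t i = 0; i < PERCALL_MAX_QPS; i++) {
		if (!percall_table[i].in_use) {
			percall_table[i].qpn = qpn;
			percall_table[i].refs = 0;
			percall_table[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/* One CM operation: look up, take the reference, work, drop it again. */
static int percall_cm_op(int qpn)
{
	struct percall_qp *qp = percall_get_qp(qpn);

	if (!qp)
		return -1;	/* QP is already gone; fail the call */
	qp->refs++;
	/* ... the operation would touch the QP here ... */
	qp->refs--;
	return 0;
}

/* Destroy succeeds whenever no CM call is in flight. */
static int percall_destroy_qp(int qpn)
{
	struct percall_qp *qp = percall_get_qp(qpn);

	if (!qp || qp->refs != 0)
		return -1;
	qp->in_use = 0;
	return 0;
}
```

The trade-off is that every CM call has to cope with the lookup failing, 
since the QP may have been destroyed while the connection was idle.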

- Sean


