[ofw] [PATCH 1/9] dapl-2.0: scm: remove modify QP to ERR state during disconnect on UD type QP

Davis, Arlin R arlin.r.davis at intel.com
Wed May 19 11:23:08 PDT 2010


This patch set contains bug fixes found during scale-out testing on 128 nodes/1538 cores.

1/9 scm: remove modify QP to ERR state during disconnect on UD type QP
2/9 ucm: increase default UCM retry count for connect reply to 15
3/9 cma, ucm: clean up issues with dat_ep_free on a connected EP that was not disconnected.
4/9 ucm: UD mode, active-side cm object released too soon; the RTU could be lost.
5/9 scm: SOCKOPT ERR Connection timed out on large clusters
6/9 scm: cr_thread occasionally segfaults when disconnecting all-to-all MPI static connections
7/9 scm: add option to use other network devices with environment variable DAPL_SCM_NETDEV
8/9 scm, cma: fini code can be called multiple times and hang via fork
9/9 scm: check for hca object before signaling thread

A disconnect on a UD type QP should not modify the QP to the error
state, because the QP is shared. The disconnect should be treated
as a NOP on the UD type QP; the QP should only be transitioned
when it is destroyed (dat_ep_free).

Signed-off-by: Arlin Davis <arlin.r.davis at intel.com>
---
 dapl/openib_scm/cm.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index afd0d93..7465190 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -458,13 +458,13 @@ DAT_RETURN dapli_socket_disconnect(dp_ib_cm_handle_t cm_ptr)
 	dapl_os_unlock(&cm_ptr->lock);
 	
 	/* send disc date, close socket, schedule destroy */
-	dapl_os_lock(&cm_ptr->ep->header.lock);
-	dapls_modify_qp_state(cm_ptr->ep->qp_handle, IBV_QPS_ERR, 0,0,0);
-	dapl_os_unlock(&cm_ptr->ep->header.lock);
 	send(cm_ptr->socket, (char *)&disc_data, sizeof(disc_data), 0);
 
 	/* disconnect events for RC's only */
 	if (cm_ptr->ep->param.ep_attr.service_type == DAT_SERVICE_TYPE_RC) {
+		dapl_os_lock(&cm_ptr->ep->header.lock);
+		dapls_modify_qp_state(cm_ptr->ep->qp_handle, IBV_QPS_ERR, 0,0,0);
+		dapl_os_unlock(&cm_ptr->ep->header.lock);
 		if (cm_ptr->ep->cr_ptr) {
 			dapls_cr_callback(cm_ptr,
 					  IB_CME_DISCONNECTED,
-- 
1.5.2.5



