[openib-general] rdma_cm callback event private data length == 0

Ira Weiny weiny2 at llnl.gov
Mon Jul 10 14:44:29 PDT 2006


We have run into a bug here using Lustre, which uses the rdma_cm interface.
When nodes crash and come back up, they try to connect to the "servers", and
the "servers" send back an IB_CM_REJ_CONSUMER_DEFINED message with a private
data structure of length 7.

However, the client sees a length of 0 for the private data.  At first
Eric and I thought that the core was sending the REJ message without private
data.  However, I have found that the message comes from the Lustre ULP and
does in fact have the 7 bytes of data in it.

The "problem" with private_data_len being 0 appears to be in the
cma_ib_handler function.  The following patch simply sets private_data_len to
IB_CM_REJ_PRIVATE_DATA_SIZE for the REJ message, so that the user is told a
nonzero length (it also adds a debug printk of that value).  Lustre, which
checks this length, then happily gets its data.  Is this a bug which needs to
be fixed for all the CM messages?  Or is it incorrect to look at this length
to determine if there is private data included in the message?  Since the
ib_cm_event structure does not have a length for this data, I don't know how
else to set this value.
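
To make the symptom concrete, here is a small user-space model of the
situation, not the kernel code itself.  Every demo_/DEMO_ name is hypothetical
and exists only for this sketch.  The value 148 is a stand-in for
IB_CM_REJ_PRIVATE_DATA_SIZE, and the consumer function is only a guess at how
a ULP such as Lustre might gate on the reported length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for IB_CM_REJ_PRIVATE_DATA_SIZE (assumed value for this sketch). */
#define DEMO_REJ_PRIVATE_DATA_SIZE 148

/* Hypothetical size of the consumer's reject payload (7 bytes, as above). */
#define DEMO_ULP_REJ_LEN 7

/*
 * Models the length the handler reports to the consumer for a REJ event.
 * patched == 0 mimics the behaviour described above (length reported as 0);
 * patched != 0 mimics the patch (fixed size of the REJ private data field).
 */
static size_t demo_rej_reported_len(int patched)
{
	return patched ? DEMO_REJ_PRIVATE_DATA_SIZE : 0;
}

/*
 * Models a consumer that trusts the reported length before reading its
 * payload.  Returns the number of bytes copied into out, or 0 if the
 * reported length says there is not enough private data.
 */
static size_t demo_consume_rej(const uint8_t *priv, size_t priv_len,
			       uint8_t *out, size_t out_len)
{
	assert(out_len >= DEMO_ULP_REJ_LEN);

	if (priv == NULL || priv_len < DEMO_ULP_REJ_LEN)
		return 0;	/* payload present but hidden by the length */

	memcpy(out, priv, DEMO_ULP_REJ_LEN);
	return DEMO_ULP_REJ_LEN;
}
```

With the unpatched length of 0 the consumer returns early even though the
buffer holds its 7 bytes.  With the patch it is told the full size of the
field rather than the 7 bytes the sender filled in, so a consumer can only
check that the length is at least what it expects.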

Thanks in advance,
Ira Weiny
LLNL


Index: openib/infiniband/core/cma.c
===================================================================
--- openib/infiniband/core/cma.c	(revision 2508)
+++ openib/infiniband/core/cma.c	(working copy)
@@ -814,6 +826,7 @@
 		cma_modify_qp_err(&id_priv->id);
 		status = ib_event->param.rej_rcvd.reason;
 		event = RDMA_CM_EVENT_REJECTED;
+		private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE;
 		break;
 	default:
 		printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d",
@@ -821,6 +834,11 @@
 		goto out;
 	}
 
+	if (ib_event->event == IB_CM_REJ_RECEIVED)
+	{
+		printk(KERN_CRIT "REJECT (private_data_len = %d)\n",
+			private_data_len);
+	}
 	ret = cma_notify_user(id_priv, event, status, ib_event->private_data,
 			      private_data_len);
 	if (ret) {




