[openib-general] Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device

Wed Sep 7 09:44:54 PDT 2005

Michael S. Tsirkin wrote:
> Could you please elaborate on why is an event driven model more reliable than
> the state driven one?
> It certainly seems to require more code: isnt cm_id->state set by CM to
> a valid value? It seems that cm needs to track connection state
> anyway as per "12.9.2 Invalid State Input Handling", to know which messages
> are legal. Is this done? If so why isnt it a good idea to reuse that state
> in SDP?

The state of the cm_id is controlled by the CM and can change at any time as a 
result of processing a received MAD.  It's only exposed for debug purposes.

>>+	result = ib_cm_listen(hca->listen_id,
>>+		              cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE),
>>+			      cpu_to_be64(SDP_MSG_SERVICE_ID_MASK));
>>+	if (result) {
>>+		sdp_warn("Error <%d> listening for SDP connections", result);
>>+		goto error;
>>+	}
>>+
>> 	ib_set_client_data(device, &sdp_client, hca);
>> 
>> 	return;
> 
> 
> Can a listen event arrive before we call ib_set_client_data?

Yes.  We may need to swap those two statements.

>> error:
>>+	if (!IS_ERR(hca->listen_id))
>>+		ib_destroy_cm_id(hca->listen_id);
>>+
>> 	list_for_each_entry_safe(port, tmp, &hca->port_list, list) {
>> 		list_del(&port->list);
>> 		kfree(port);
>>@@ -1838,6 +1856,9 @@ static void sdp_device_remove_one(struct
>> 		return;
>> 	}
>> 
>>+	if (!IS_ERR(hca->listen_id))
>>+		ib_destroy_cm_id(hca->listen_id);
>>+
>> 	list_for_each_entry_safe(port, tmp, &hca->port_list, list) {
>> 		list_del(&port->list);
>> 		kfree(port);
> 
> 
> I'd prefer separate labels for two errors, instead of testing IS_ERR twice.

So do I, but I was trying to match the coding style used throughout the SDP 
code.  Fixing the error handling seems like another set of changes to me.

>>+	if (event->event != IB_CM_REQ_RECEIVED) {
>>+		conn = sdp_conn_table_lookup(hashent);
>>+		if (conn)
>>+			sdp_conn_lock(conn);
>>+		else
>> 			return -EINVAL;
>>-		}
>>+		/* Can this fail?  Why not just set context = conn? */
>>+	}
> 
> 
> Regarding the question: the first thing we do in sdp_conn_put
> is sdp_conn_table_remove, so I think this lookup can fail.

I should have removed that comment.  I'm not overly familiar with the SDP code, 
but it seems wrong to need to verify that a requested callback can be executed. 
  By returning -EINVAL above, the cm_id associated with the callback will be 
destroyed.  This indicates to me that either that cm_id will be destroyed twice 
by SDP (once when the SDP conn object is destroyed and again here), or that SDP 
does not usually destroy cm_id's as part of its normal cleanup.  I'm also 
wondering about possible race conditions that could occur between calling 
table_lookup and conn_lock, but I don't know the code well enough to say if one 
exists.

>>-	switch (cm_id->state) {
>>-	case IB_CM_REQ_RCVD:
>>+	switch (event->event) {
>>+	case IB_CM_REQ_RECEIVED:
>> 		result = sdp_cm_req_handler(cm_id, event);
>> 		break;
>>-	case IB_CM_REP_RCVD:
>>+	case IB_CM_REP_RECEIVED:
>> 		result = sdp_cm_rep_handler(cm_id, event, conn);
>> 		break;
>>-	case IB_CM_IDLE:
>>+	case IB_CM_REQ_ERROR:
>>+	case IB_CM_REP_ERROR:
>>+	case IB_CM_REJ_RECEIVED:
>>+	case IB_CM_TIMEWAIT_EXIT:
>> 		result = sdp_cm_idle(cm_id, event, conn);
>> 		break;
>>-	case IB_CM_ESTABLISHED:
>>+	case IB_CM_RTU_RECEIVED:
>>+	case IB_CM_USER_ESTABLISHED:
>> 		result = sdp_cm_established(cm_id, event, conn);
>> 		break;
>>-	case IB_CM_DREQ_RCVD:
>>+	case IB_CM_DREQ_RECEIVED:
>> 		result = sdp_cm_dreq_rcvd(cm_id, event, conn);
>> 		if (result)
>> 			break;
>> 		/* fall through on success to handle state transition */
>>-	case IB_CM_TIMEWAIT:
>>+	case IB_CM_DREQ_ERROR:
>>+	case IB_CM_DREP_RECEIVED:
>> 		result = sdp_cm_timewait(cm_id, event, conn);
>> 		break;
>> 	default:
>>-		sdp_dbg_warn(conn, "Unexpected CM state <%d>", cm_id->state);
>>+		sdp_dbg_warn(conn, "Unhandled CM event <%d>", event->event);
>> 		result = -EINVAL;
>> 	}
>> 	/*
> 
> Seems more code. Could you please elaborate on why is this more correct?

See comment above.  Using the cm_id state is incorrect as it can change while a 
client's callback is running.

- Sean