[ofa-general] IB/cm: bug in stale connection detection logic?

Michael S. Tsirkin mst at dev.mellanox.co.il
Sun May 20 06:44:41 PDT 2007


Sean, Roland, pls comment on the following 2 questions:

1. I see this in cm_match_req:

        timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
        if (!timewait_info)
                timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);

        if (timewait_info) {
                cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
                                           timewait_info->work.remote_id);
                cm_cleanup_timewait(cm_id_priv->timewait_info);
                spin_unlock_irqrestore(&cm.lock, flags);
                if (cur_cm_id_priv) {
                        cm_dup_req_handler(work, cur_cm_id_priv);
                        cm_deref_id(cur_cm_id_priv);
                } else
                        cm_issue_rej(work->port, work->mad_recv_wc,
                                     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
                                     NULL, 0);

Note that cm_get_id is passed data from timewait_info and not from the request:
thus, if the QPN in request matches QPN in an existing connection, this is
mis-detected as a duplicate request even if the IDs do not match;
thus, the request is dropped or "duplicate" reject is sent instead of
a "stale connection" reject.

Am I missing something?

Suggestion:
Why is an extra call to cm_get_id required to detect a duplicate?
Shouldn't we just

        timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
	if (timewait_info) {
		/* handle duplicate */
		return;
	}

	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
	if (timewait_info) {
		/* handle stale */
		return;
	}

	not a duplicate and not a stale connection

2. Another question:

	cm_dup_req_handler does this:
       /* Quick state check to discard duplicate REQs. */
        if (cm_id_priv->id.state == IB_CM_REQ_RCVD)
                return;

Why is this code here? IB_CM_REQ_RCVD is an ephemeural state,
going to IB_CM_REP_SENT immediately.
Why are duplicate REQs discarded? Should not REP be re-sent?
See 12.9.6 COMMUNICATION ESTABLISHMENT - PASSIVE

The spec also says:
	The general rule for handling any input message that is received while in 
	an ephemeral state is to hold that message pending until the CM protocol 
	enters a non-ephemeral state.

So this code looks wrong.
What am I missing?

-- 
MST



More information about the general mailing list