[openib-general] Question about QP's in timewait state and CM stale conn rejects
Or Gerlitz
ogerlitz at voltaire.com
Thu Aug 17 01:18:56 PDT 2006
Sean Hefty wrote:
> Even if we pushed timewait handling under verbs, a user could always get a QP
> that the remote side thinks is connected. The original connection could fail to
> disconnect because of lost DREQs. So, locally, the QP could have exited
> timewait, while the remote side still thinks that it's connected.
Sean,
If you don't mind getting into this a bit deeper (it also relates to the
patch you sent Eric that randomizes the initial local cm id), can we do a
quick code review of the REQ matching logic here? I wrote down what I
understand below.
> static struct cm_id_private * cm_match_req(struct cm_work *work,
> + struct cm_id_private *cm_id_priv)
> +{
> + struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
> + struct cm_timewait_info *timewait_info;
> + struct cm_req_msg *req_msg;
> + unsigned long flags;
> +
> + req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
> +
> + /* Check for duplicate REQ and stale connections. */
> + spin_lock_irqsave(&cm.lock, flags);
> + timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
> + if (!timewait_info)
> + timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
The following if() holds when a <remote_id, remote_ca_guid> entry is
already present in remote_id_table OR a <remote_qpn, remote_ca_guid> entry
is already present in remote_qpn_table.
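To make the lookup semantics concrete, here is a minimal sketch of the insert-or-return-existing pattern as I read it from the two calls above. It is not the cm.c code: struct tw_info, struct tw_table, insert_into(), insert_remote_id(), insert_remote_qpn() and find_conflict() are hypothetical names, and small flat arrays stand in for the real lookup tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct cm_timewait_info. */
struct tw_info {
	uint64_t remote_ca_guid;
	uint32_t remote_id;
	uint32_t remote_qpn;
};

#define TABLE_MAX 16

/* Assumption: a flat array replaces the real table structure. */
struct tw_table {
	struct tw_info *slot[TABLE_MAX];
	size_t count;
};

static struct tw_table remote_id_table;
static struct tw_table remote_qpn_table;

/*
 * Insert 'tw' unless an entry with the same key exists.
 * Returns the existing entry on a key collision, NULL after inserting.
 */
static struct tw_info *insert_into(struct tw_table *t, struct tw_info *tw,
				   int by_qpn)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		struct tw_info *cur = t->slot[i];
		uint32_t cur_key = by_qpn ? cur->remote_qpn : cur->remote_id;
		uint32_t new_key = by_qpn ? tw->remote_qpn : tw->remote_id;

		if (cur_key == new_key &&
		    cur->remote_ca_guid == tw->remote_ca_guid)
			return cur;
	}
	assert(t->count < TABLE_MAX);	/* the sketch has no growth path */
	t->slot[t->count++] = tw;
	return NULL;
}

static struct tw_info *insert_remote_id(struct tw_info *tw)
{
	return insert_into(&remote_id_table, tw, 0);
}

static struct tw_info *insert_remote_qpn(struct tw_info *tw)
{
	return insert_into(&remote_qpn_table, tw, 1);
}

/* Non-NULL result: the <id, guid> OR the <qpn, guid> key was already known. */
static struct tw_info *find_conflict(struct tw_info *tw)
{
	struct tw_info *cur = insert_remote_id(tw);

	if (!cur)
		cur = insert_remote_qpn(tw);
	return cur;
}
```

Note that in this sketch a REQ whose id is new but whose qpn collides stays inserted in the id table; any cleanup of that entry is outside what the sketch covers.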
> + if (timewait_info) {
> + cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
> + timewait_info->work.remote_id);
> + spin_unlock_irqrestore(&cm.lock, flags);
> + if (cur_cm_id_priv) {
> + cm_dup_req_handler(work, cur_cm_id_priv);
> + cm_deref_id(cur_cm_id_priv);
Here a <local_id, remote_id> entry exists in local_id_table. Looking at
cm_dup_req_handler(), I see it sends a REP when the id is in the "MRA sent"
state and sends a STALE_CONN REJ when the id is in the timewait state;
otherwise it does nothing.
> + } else
> + cm_issue_rej(work->port, work->mad_recv_wc,
> + IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
> + NULL, 0);
What is this case? There is no <local_id, remote_id> entry, but there is
a remote <id, ca_guid> or <qpn, ca_guid> entry?
> + goto error;
> + }
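To summarize the branches as I read them, here is a small decision-table sketch. It only encodes the reading given in my comments above and is not checked against the real handler; enum conn_state, enum action, struct conn, dup_req_action() and match_req_action() are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of connection states, enough for the sketch. */
enum conn_state { ST_REQ_RCVD, ST_MRA_REQ_SENT, ST_ESTABLISHED, ST_TIMEWAIT };

enum action {
	ACT_CONTINUE,		/* no conflict: go on matching the REQ */
	ACT_NONE,		/* duplicate REQ is silently dropped */
	ACT_SEND_REPLY,		/* the message above reads this as a REP */
	ACT_SEND_STALE_REJ	/* REJ with the stale connection reason */
};

struct conn {
	enum conn_state state;
};

/* Duplicate-REQ handling as described in the comments above. */
static enum action dup_req_action(const struct conn *c)
{
	switch (c->state) {
	case ST_MRA_REQ_SENT:
		return ACT_SEND_REPLY;
	case ST_TIMEWAIT:
		return ACT_SEND_STALE_REJ;
	default:
		return ACT_NONE;
	}
}

/*
 * remote_entry_found: a remote <id, ca_guid> or <qpn, ca_guid> entry existed.
 * cur: the connection found by <local_id, remote_id>, or NULL if none.
 */
static enum action match_req_action(bool remote_entry_found,
				    const struct conn *cur)
{
	if (!remote_entry_found)
		return ACT_CONTINUE;
	if (cur)
		return dup_req_action(cur);
	/* The case asked about: remote entry present, no local id entry. */
	return ACT_SEND_STALE_REJ;
}
```

For example, match_req_action(true, NULL) lands in the last branch, which is the case I am asking about.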
Or.