[openib-general] Question about QP's in timewait state and CM stale conn rejects

Or Gerlitz ogerlitz at voltaire.com
Thu Aug 17 01:18:56 PDT 2006


Sean Hefty wrote:
> Even if we pushed timewait handling under verbs, a user could always get a QP 
> that the remote side thinks is connected.  The original connection could fail to 
> disconnect because of lost DREQs.  So, locally, the QP could have exited 
> timewait, while the remote side still thinks that it's connected.

Sean,

If you don't mind getting into this more deeply (it also relates to the 
patch you sent Eric that randomizes the initial local cm id), can we do 
a quick code review of the REQ matching logic here? I wrote what I 
understand below.

> static struct cm_id_private * cm_match_req(struct cm_work *work,
> +                                          struct cm_id_private *cm_id_priv)
> +{
> +       struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
> +       struct cm_timewait_info *timewait_info;
> +       struct cm_req_msg *req_msg;
> +       unsigned long flags;
> +
> +       req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
> +
> +       /* Check for duplicate REQ and stale connections. */
> +       spin_lock_irqsave(&cm.lock, flags);
> +       timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
> +       if (!timewait_info)
> +               timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);

The if (timewait_info) below holds when a <remote_id, remote_ca_guid> 
entry is already present in remote_id_table OR a <remote_qpn, 
remote_ca_guid> entry is already present in remote_qpn_table.
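
To spell out the convention I am assuming here: each insert helper returns the entry already stored under the same key, or stores the new one and returns NULL. The following is only a stand-alone sketch of that reading, not the cm.c code. All toy_* names are hypothetical, and small arrays stand in for the real tables purely to model the return values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the keys carried by a timewait_info. */
struct toy_tw {
    uint32_t remote_id;       /* remote comm id */
    uint32_t remote_qpn;      /* remote QP number */
    uint64_t remote_ca_guid;  /* remote CA GUID */
};

#define TOY_TABLE_MAX 16

/* Toy table: a flat array of pointers, used only to model lookups. */
struct toy_table {
    struct toy_tw *slot[TOY_TABLE_MAX];
    size_t count;
};

/* Return the existing entry for <remote_id, remote_ca_guid>, else insert
 * tw and return NULL (a full table also yields NULL in this sketch). */
static struct toy_tw *toy_insert_remote_id(struct toy_table *t, struct toy_tw *tw)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->slot[i]->remote_id == tw->remote_id &&
            t->slot[i]->remote_ca_guid == tw->remote_ca_guid)
            return t->slot[i];
    if (t->count < TOY_TABLE_MAX)
        t->slot[t->count++] = tw;
    return NULL;
}

/* Same convention, keyed by <remote_qpn, remote_ca_guid>. */
static struct toy_tw *toy_insert_remote_qpn(struct toy_table *t, struct toy_tw *tw)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->slot[i]->remote_qpn == tw->remote_qpn &&
            t->slot[i]->remote_ca_guid == tw->remote_ca_guid)
            return t->slot[i];
    if (t->count < TOY_TABLE_MAX)
        t->slot[t->count++] = tw;
    return NULL;
}

/* Two-step check: the qpn table is tried only when the id table had no
 * match. A non-NULL result means "duplicate REQ or stale connection". */
static struct toy_tw *toy_check_req(struct toy_table *ids,
                                    struct toy_table *qpns,
                                    struct toy_tw *tw)
{
    struct toy_tw *hit = toy_insert_remote_id(ids, tw);

    if (!hit)
        hit = toy_insert_remote_qpn(qpns, tw);
    return hit;
}
```

With this model, a second REQ that reuses either the remote id or the remote QPN from the same CA GUID gets a non-NULL result, while the same numbers from a different CA GUID do not.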

> +       if (timewait_info) {
> +               cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
> +                                          timewait_info->work.remote_id);
> +               spin_unlock_irqrestore(&cm.lock, flags);
> +               if (cur_cm_id_priv) {
> +                       cm_dup_req_handler(work, cur_cm_id_priv);
> +                       cm_deref_id(cur_cm_id_priv);

A <local_id, remote_id> entry exists in local_id_table. Looking at 
cm_dup_req_handler(), I see it sends a REP when the id is in the "MRA 
sent" state and sends a STALE_CONN REJ when the id is in timewait state; 
otherwise it does nothing.

> +               } else
> +                       cm_issue_rej(work->port, work->mad_recv_wc,
> +                                    IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
> +                                    NULL, 0);

What is this case? There is no <local_id, remote_id> entry, but there is 
a remote <id, ca_guid> or <qpn, ca_guid> entry?

> +               goto error;
> +       }
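
To summarize the branches quoted above as I read them, here is a small sketch that reduces them to two booleans. The enum and function names are made up for illustration and do not exist in cm.c. "Continue matching" stands for whatever the function does after this hunk, which is not quoted here.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum toy_outcome {
    TOY_CONTINUE_MATCHING,  /* no hit in either remote table */
    TOY_HANDLE_DUPLICATE,   /* table hit and cm_get_id() found the id */
    TOY_REJECT_STALE_CONN   /* table hit but cm_get_id() found nothing */
};

/* remote_key_in_tables: one of the two insert helpers returned an entry.
 * local_id_found: the <local_id, remote_id> lookup succeeded. */
static enum toy_outcome toy_classify_req(bool remote_key_in_tables,
                                         bool local_id_found)
{
    if (!remote_key_in_tables)
        return TOY_CONTINUE_MATCHING;
    return local_id_found ? TOY_HANDLE_DUPLICATE : TOY_REJECT_STALE_CONN;
}
```

The third outcome, a table hit without a matching local id, is the one I am asking about.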

Or.




