[openib-general] Question about QP's in timewait state and CM stale conn rejects
Or Gerlitz
ogerlitz at voltaire.com
Sun Aug 20 04:54:14 PDT 2006
This email appears in the archive but seems not to have been distributed
to the subscribers, so I am reposting it.
Or Gerlitz wrote:
> Sean Hefty wrote:
>> Even if we pushed timewait handling under verbs, a user could always
>> get a QP that the remote side thinks is connected. The original
>> connection could fail to disconnect because of lost DREQs. So,
>> locally, the QP could have exited timewait, while the remote side
>> still thinks that it's connected.
>
> Sean,
>
> If you don't mind getting into this more deeply (it is also related to the
> patch you sent Eric for randomizing the initial local cm id), can we do a
> quick code review of the REQ matching logic here? I wrote what I
> understand below.
>
>> static struct cm_id_private * cm_match_req(struct cm_work *work,
>> + struct cm_id_private *cm_id_priv)
>> +{
>> + struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>> + struct cm_timewait_info *timewait_info;
>> + struct cm_req_msg *req_msg;
>> + unsigned long flags;
>> +
>> + req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>> +
>> + /* Check for duplicate REQ and stale connections. */
>> + spin_lock_irqsave(&cm.lock, flags);
>> + timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
>> + if (!timewait_info)
>> + timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
>
> The following if() holds when a <remote_id, remote_ca_guid> entry is
> already present in remote_id_table OR a <remote_qpn, remote_ca_guid> entry
> is already present in remote_qpn_table.
>
>> + if (timewait_info) {
>> + cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
>> + timewait_info->work.remote_id);
>> + spin_unlock_irqrestore(&cm.lock, flags);
>> + if (cur_cm_id_priv) {
>> + cm_dup_req_handler(work, cur_cm_id_priv);
>> + cm_deref_id(cur_cm_id_priv);
>
> Here a <local_id, remote_id> entry exists in local_id_table. Looking at
> cm_dup_req_handler(), I see it sends a REP when the id is in the "MRA sent"
> state and sends a STALE_CONN REJ when the id is in the timewait state;
> otherwise it does nothing.
>
>> + } else
>> + cm_issue_rej(work->port, work->mad_recv_wc,
>> + IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
>> + NULL, 0);
>
> What is this case? There is no <local_id, remote_id> entry, but there is
> a remote <id, ca_guid> or <qpn, ca_guid> entry?
>
>> + goto error;
>> + }
>
> Or.
>
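
To make the three outcomes discussed above easier to compare, here is a minimal standalone sketch of the decision flow as I read it from the quoted patch. It is only a model under stated assumptions, not the ib_cm code: the names `remote_key`, `table`, `table_insert`, `match_req` and the `req_action` values are hypothetical, the two tables are plain arrays instead of the real remote_id_table and remote_qpn_table, and the cm_get_id() lookup is reduced to a boolean argument.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels for this sketch; not ib_cm constants. */
enum req_action {
    REQ_NEW,            /* no remote entry existed; matching continues */
    REQ_DUP_HANDLER,    /* remote entry and <local_id, remote_id> entry exist */
    REQ_REJ_STALE_CONN  /* remote entry exists, no <local_id, remote_id> entry */
};

/* Stand-in for a <remote_id, ca_guid> or <remote_qpn, ca_guid> key. */
struct remote_key {
    unsigned int id;
    unsigned long long ca_guid;
};

#define TABLE_MAX 16

/* Stand-in for remote_id_table / remote_qpn_table (a flat array here). */
struct table {
    struct remote_key keys[TABLE_MAX];
    size_t n;
};

/*
 * Insert-or-find: returns true when the key was already present (nothing
 * is inserted), otherwise stores the key and returns false. A full table
 * silently drops the key, which is good enough for this sketch.
 */
static bool table_insert(struct table *t, struct remote_key k)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->keys[i].id == k.id && t->keys[i].ca_guid == k.ca_guid)
            return true;
    if (t->n < TABLE_MAX)
        t->keys[t->n++] = k;
    return false;
}

/*
 * Mirrors the order of checks in the quoted code: try the remote id
 * first, try the remote qpn only if the id was not already present,
 * then branch on whether a <local_id, remote_id> entry can be found.
 */
static enum req_action match_req(struct table *remote_ids,
                                 struct table *remote_qpns,
                                 struct remote_key id_key,
                                 struct remote_key qpn_key,
                                 bool local_id_entry_exists)
{
    bool present = table_insert(remote_ids, id_key);

    if (!present)
        present = table_insert(remote_qpns, qpn_key);
    if (!present)
        return REQ_NEW;
    return local_id_entry_exists ? REQ_DUP_HANDLER : REQ_REJ_STALE_CONN;
}
```

In this model, a REQ that carries a remote id not seen before but a remote qpn that is still in the qpn table, with no matching <local_id, remote_id> entry, ends up in the REQ_REJ_STALE_CONN branch. Whether that is the situation the else branch is meant to cover in the real code is exactly what I am asking.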