[openib-general] Question about QP's in timewait state and CM stale conn rejects

Or Gerlitz ogerlitz at voltaire.com
Sun Aug 20 04:54:14 PDT 2006


This email appears in the archive but does not seem to have been distributed
to the subscribers, so I am reposting it.


Or Gerlitz wrote:
> Sean Hefty wrote:
>> Even if we pushed timewait handling under verbs, a user could always 
>> get a QP that the remote side thinks is connected.  The original 
>> connection could fail to disconnect because of lost DREQs.  So, 
>> locally, the QP could have exited timewait, while the remote side 
>> still thinks that it's connected.
> 
> Sean,
> 
> If you don't mind getting into this more deeply (it is also related to the
> patch you sent Eric that randomizes the initial local CM ID), can we do a
> quick code review of the REQ matching logic here? I wrote down what I
> understand below.
> 
>> +static struct cm_id_private * cm_match_req(struct cm_work *work,
>> +                                          struct cm_id_private *cm_id_priv)
>> +{
>> +       struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>> +       struct cm_timewait_info *timewait_info;
>> +       struct cm_req_msg *req_msg;
>> +       unsigned long flags;
>> +
>> +       req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>> +
>> +       /* Check for duplicate REQ and stale connections. */
>> +       spin_lock_irqsave(&cm.lock, flags);
>> +       timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
>> +       if (!timewait_info)
>> +               timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
> 
> The if (timewait_info) test below holds when a <remote_id, remote_ca_guid>
> entry is already present in remote_id_table OR a <remote_qpn, remote_ca_guid>
> entry is already present in remote_qpn_table.
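
To make the insert-or-find semantics concrete, here is a minimal user-space sketch, assuming each insert helper returns NULL when it adds a new entry and returns the entry already stored when the key is taken. The names tw_entry, tw_table, insert_remote_id, insert_remote_qpn and find_conflict are hypothetical, and a flat array stands in for whatever lookup structure the CM really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cm_timewait_info: lookup keys only. */
struct tw_entry {
    uint64_t remote_ca_guid;
    uint32_t remote_id;   /* remote communication ID */
    uint32_t remote_qpn;  /* remote QP number */
};

#define TABLE_CAP 16

/* Assumption: a flat array replaces the CM's real lookup table. */
struct tw_table {
    struct tw_entry *slot[TABLE_CAP];
    size_t count;
};

/*
 * Insert-or-find keyed on <remote_id, remote_ca_guid>.
 * Returns NULL if the entry was inserted, or the existing entry
 * when the key is already present.
 */
static struct tw_entry *insert_remote_id(struct tw_table *t, struct tw_entry *e)
{
    for (size_t i = 0; i < t->count; i++) {
        struct tw_entry *cur = t->slot[i];
        if (cur->remote_id == e->remote_id &&
            cur->remote_ca_guid == e->remote_ca_guid)
            return cur;
    }
    assert(t->count < TABLE_CAP); /* sketch only: the table never grows */
    t->slot[t->count++] = e;
    return NULL;
}

/* Same pattern, keyed on <remote_qpn, remote_ca_guid>. */
static struct tw_entry *insert_remote_qpn(struct tw_table *t, struct tw_entry *e)
{
    for (size_t i = 0; i < t->count; i++) {
        struct tw_entry *cur = t->slot[i];
        if (cur->remote_qpn == e->remote_qpn &&
            cur->remote_ca_guid == e->remote_ca_guid)
            return cur;
    }
    assert(t->count < TABLE_CAP);
    t->slot[t->count++] = e;
    return NULL;
}

/* Mirrors the two-step check at the top of the quoted cm_match_req(). */
static struct tw_entry *find_conflict(struct tw_table *ids,
                                      struct tw_table *qpns,
                                      struct tw_entry *e)
{
    struct tw_entry *hit = insert_remote_id(ids, e);
    if (!hit)
        hit = insert_remote_qpn(qpns, e);
    return hit;
}
```

In this sketch, a REQ that carries a fresh remote ID but reuses a <remote_qpn, remote_ca_guid> pair that is already stored still reports a conflict through the second table.
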
> 
>> +       if (timewait_info) {
>> +               cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
>> +                                          timewait_info->work.remote_id);
>> +               spin_unlock_irqrestore(&cm.lock, flags);
>> +               if (cur_cm_id_priv) {
>> +                       cm_dup_req_handler(work, cur_cm_id_priv);
>> +                       cm_deref_id(cur_cm_id_priv);
> 
> Here a <local_id, remote_id> entry exists in local_id_table. Looking at
> cm_dup_req_handler(), I see that it sends a REP when the ID is in the
> "MRA sent" state, sends a STALE_CONN REJ when the ID is in the timewait
> state, and otherwise does nothing.
> 
>> +               } else
>> +                       cm_issue_rej(work->port, work->mad_recv_wc,
>> +                                    IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
>> +                                    NULL, 0);
> 
> What is this case? There is no <local_id, remote_id> entry, but there is a
> remote <id, ca_guid> or <qpn, ca_guid> entry?
> 
>> +               goto error;
>> +       }
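
The three outcomes of the quoted block can be summarized in a small sketch. It only restates the branch structure of the patch as quoted above; the enum and the function classify_req are hypothetical names, not part of the CM code.

```c
#include <stdbool.h>

/* Hypothetical labels for the three paths through the quoted block. */
enum req_outcome {
    REQ_CONTINUE,     /* no conflicting entry: go on with REQ matching */
    REQ_DUPLICATE,    /* hand the REQ to cm_dup_req_handler() */
    REQ_REJECT_STALE  /* cm_issue_rej(..., IB_CM_REJ_STALE_CONN, ...) */
};

/*
 * conflict:    a remote-ID or remote-QPN entry was already present
 * local_found: the cm_get_id() lookup on <local_id, remote_id> succeeded
 */
static enum req_outcome classify_req(bool conflict, bool local_found)
{
    if (!conflict)
        return REQ_CONTINUE;
    if (local_found)
        return REQ_DUPLICATE;
    return REQ_REJECT_STALE;
}
```

The case asked about above is the last branch: a conflicting remote entry was found, but the local ID lookup came back empty.
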
> 
> Or.
> 





