[openib-general] Question about QP's in timewait state and CM stale conn rejects
Sean Hefty
mshefty at ichips.intel.com
Thu Aug 17 10:18:20 PDT 2006
Or Gerlitz wrote:
> If you don't mind (also related to the patch you have sent Eric of
> randomizing the initial local cm id) to get into this deeper, can we do
There's an issue with trying to randomize the initial local CM ID. Because of
the way the IDR works, if you start at a high value, the IDR grows to cover
that first value, which can result in memory allocation failures. In my tests,
using a random value would frequently result in connection failures because of
low memory.
My conclusion is that the local ID assignment in the IB CM needs to be reworked,
or we will run into a condition that after X number of connections have been
established, we will be unable to create any new connections, even if the
previous connections have all been destroyed.
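To illustrate why a high starting ID is costly, here is a minimal sketch of a
layered index whose depth depends on the largest ID it has to hold. This is a
simplified model, not the kernel IDR code: the 32-slot fan-out
(MODEL_LAYER_BITS) and the function name model_layers_needed() are assumptions
made up for this example.

```c
#include <assert.h>

/* Hypothetical fan-out: 2^5 = 32 slots per layer (an assumption of this model). */
#define MODEL_LAYER_BITS 5

/*
 * Number of layers a radix-tree-like index of this shape needs before it
 * can hold `id`. Each extra layer means more nodes to allocate up front.
 */
int model_layers_needed(unsigned int id)
{
	int layers = 1;

	assert(MODEL_LAYER_BITS > 0);

	/* Every MODEL_LAYER_BITS bits of the ID beyond the first group adds a layer. */
	while (id >> MODEL_LAYER_BITS) {
		id >>= MODEL_LAYER_BITS;
		layers++;
	}
	return layers;
}
```

In this model a first ID of 31 fits in a single layer, while a first ID of
0x7fffffff needs seven layers before the first entry can be stored, which is
the kind of growth that can fail under low memory.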
>> static struct cm_id_private * cm_match_req(struct cm_work *work,
>> + struct cm_id_private *cm_id_priv)
>> +{
>> + struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>> + struct cm_timewait_info *timewait_info;
>> + struct cm_req_msg *req_msg;
>> + unsigned long flags;
>> +
>> + req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>> +
>> + /* Check for duplicate REQ and stale connections. */
>> + spin_lock_irqsave(&cm.lock, flags);
>> + timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
>> + if (!timewait_info)
>> + timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
>
>
> This if() holds when <remote_id, remote_ca_guid> entry is present in
> remote_id_table OR <remote_qpn,remote_ca_guid> entry is present in
> remote_qpn_table
correct
>
>> + if (timewait_info) {
>> + cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
>> + timewait_info->work.remote_id);
>> + spin_unlock_irqrestore(&cm.lock, flags);
>
>> + if (cur_cm_id_priv) {
>> + cm_dup_req_handler(work, cur_cm_id_priv);
>> + cm_deref_id(cur_cm_id_priv);
>
>
> <local_id, remote_id> entry exists in local_id_table. Looking at
> dup_req_handler(), I see it sends a REP when the id is in "MRA sent" and
> sends a STALE_CONN REJ when the id is in timewait state; otherwise it does
> nothing.
It sends an MRA if in the MRA sent state, or a reject as indicated.
>> + } else
>> + cm_issue_rej(work->port, work->mad_recv_wc,
>> + IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
>> + NULL, 0);
>
>
> what is this case? there is no <local_id,remote_id> entry but there is
> remote <id,ca_guid> or <qpn,ca_guid> entries???
If we get here, this means that the REQ was a new REQ and not a duplicate, but
the remote_id or remote_qpn is already in use. We need to reject the new REQ as
containing stale data.
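The three outcomes discussed in this thread can be summarized in a small
user-space sketch. It only models the decision, not the locking or the table
lookups, and every name in it (model_req_action, model_classify_req() and its
parameters) is hypothetical. The "new" outcome is my reading of what happens
when neither insert finds an existing entry, since that path is not part of
the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for an incoming REQ; not identifiers from the patch. */
enum model_req_action {
	MODEL_REQ_NEW,       /* no conflicting entry: process as a new REQ */
	MODEL_REQ_DUPLICATE, /* existing <local_id, remote_id>: duplicate REQ handling */
	MODEL_REQ_STALE      /* remote id or QPN in use, no matching id: reject as stale */
};

/*
 * remote_id_in_use  - <remote_id, remote_ca_guid> already in the remote ID table
 * remote_qpn_in_use - <remote_qpn, remote_ca_guid> already in the remote QPN table
 * local_entry_found - a <local_id, remote_id> entry exists for the conflicting info
 */
enum model_req_action model_classify_req(bool remote_id_in_use,
					 bool remote_qpn_in_use,
					 bool local_entry_found)
{
	if (!remote_id_in_use && !remote_qpn_in_use)
		return MODEL_REQ_NEW;

	return local_entry_found ? MODEL_REQ_DUPLICATE : MODEL_REQ_STALE;
}

/* Small self-check of the case asked about above. */
void model_classify_req_check(void)
{
	/* QPN already in use but no <local_id, remote_id> entry: stale. */
	assert(model_classify_req(false, true, false) == MODEL_REQ_STALE);
}
```

The stale case is the one asked about: the remote ID or QPN is still recorded,
but there is no matching <local_id, remote_id> entry to treat the REQ as a
duplicate of.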
- Sean