[ofa-general] IB/cm: bug in stale connection detection logic?

Michael S. Tsirkin mst at dev.mellanox.co.il
Mon May 21 13:40:41 PDT 2007


> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] IB/cm: bug in stale connection detection logic?
> 
> >Why is an extra call to cm_get_id required to detect a duplicate?
> >Shouldn't we just
> >
> >	timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
> >	if (timewait_info) {
> >		/* handle duplicate */
> >		return;
> >	}
> >
> >	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
> >	if (timewait_info) {
> >		/* handle stale */
> >		return;
> >	}
> >
> >	/* not a duplicate and not a stale connection */
> 
> After looking at this more, I think we want something structured closer 
> to what's listed above, with the duplicate handling enhanced to check 
> that the QPN in the potential duplicate REQ matches what's already 
> associated with the remote ID.

Yes, that's what I thought too.
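
To make the proposed flow concrete, here is a rough userspace sketch in C.
It is not the ib_cm code: the struct, the linear table, and every function
name below are hypothetical stand-ins for the timewait_info lookups quoted
above. It also assumes that a REQ whose remote ID matches a known
connection but whose QPN differs is treated as stale, which the thread
does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for an incoming REQ. */
enum req_verdict { REQ_NEW, REQ_DUPLICATE, REQ_STALE };

/* Hypothetical stand-in for the fields a REQ is matched on. */
struct timewait_entry {
	uint64_t remote_ca_guid;
	uint32_t remote_id;
	uint32_t remote_qpn;
};

#define MAX_ENTRIES 16

/* A flat array keeps the sketch short; it is only a toy lookup table. */
static struct timewait_entry table[MAX_ENTRIES];
static size_t n_entries;

/* Return the recorded entry with the same remote CA and remote ID, if any. */
static struct timewait_entry *find_by_remote_id(const struct timewait_entry *req)
{
	for (size_t i = 0; i < n_entries; i++)
		if (table[i].remote_ca_guid == req->remote_ca_guid &&
		    table[i].remote_id == req->remote_id)
			return &table[i];
	return NULL;
}

/* Return the recorded entry with the same remote CA and remote QPN, if any. */
static struct timewait_entry *find_by_remote_qpn(const struct timewait_entry *req)
{
	for (size_t i = 0; i < n_entries; i++)
		if (table[i].remote_ca_guid == req->remote_ca_guid &&
		    table[i].remote_qpn == req->remote_qpn)
			return &table[i];
	return NULL;
}

/* Classify a REQ: remote ID check first, then remote QPN check. */
enum req_verdict classify_req(const struct timewait_entry *req)
{
	struct timewait_entry *old;

	old = find_by_remote_id(req);
	if (old) {
		/* Duplicate only if the QPN also matches the recorded one. */
		if (old->remote_qpn == req->remote_qpn)
			return REQ_DUPLICATE;
		/* Assumption: same ID with a different QPN counts as stale. */
		return REQ_STALE;
	}

	old = find_by_remote_qpn(req);
	if (old)
		return REQ_STALE;	/* QPN reused under a new remote ID */

	/* Not a duplicate and not stale: record it if there is room. */
	if (n_entries < MAX_ENTRIES)
		table[n_entries++] = *req;
	return REQ_NEW;
}
```

With this shape a stale REQ is identified on arrival, so the caller could
reject it immediately instead of leaving the sender to time out.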

> Did you hit an actual problem with the current code?  It seems like 
> the only issue is that a possible stale request would time out, rather 
> than be immediately rejected.  If so, I will queue up a patch for 2.6.23.

Exactly. This is a serious problem for IPoIB CM since packet drop rates
and recovery times go up radically: sockets get closed, etc.
With a reject we would just retry connecting on the next packet.

Could you please post a patch? Let's discuss whether it's appropriate
for 2.6.22 separately.

-- 
MST