[openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

Mon Aug 28 13:41:18 PDT 2006

Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH ] RFC IB/cm do not track remote QPN in timewait state
> 
> Michael S. Tsirkin wrote:
> >>The CM tracks the remote QP, not the local.
> > 
> > 
> > I might not have been clear.
> > For connection in timewait state, spec explicitly says local QP
> > must be in reset, error or init.
> > Only after it goes out of timewait can you destroy the QP.
> > That's the tracking I think spec means CM needs to do.
> 
> I believe that this tracking is done, and is reported to the user by the 
> timewait exit event.  QP transitions are the responsibility of the user.
> 
> This is related to a problem that Arlin and I have been discussing.  There's 
> nothing that the CM does to prevent the QP from being destroyed, especially for 
> a usermode application.  The CM invokes a callback once a connection exits 
> timewait, indicating to the user that the QP may now be destroyed.  But if an 
> application crashes, uverbs automatically destroys the QP.
> 
> We may need better coordination between the CM and verbs wrt timewait to handle 
> userspace QPs, but this depends on this change.
> 
> >>The spec (12.4) seems to state
> >>that this is required by the CM.
> > 
> > 
> > Tracking, yes. But not rejecting connections.
> 
> Section 12.4 indicates that the CM shall put both the local and remote QPNs into 
> timewait.  I was assuming that the remote QPN was tracked, in part, for 
> rejecting a stale connection.  I can see where it would only be needed to 
> validate repeated DREQs, which carry the remote QPN.

I believe the communication ID should be checked to detect duplicates. Right?
The remote QPN stale connection rule is only there to avoid the case where we
keep a connection in the established state forever after the remote side has
rebooted.
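
A rough sketch of duplicate detection keyed on the communication ID rather
than the remote QPN. Everything here is hypothetical: struct conn and
find_duplicate() are illustrative names, not the ib_cm data structures, and
pairing the communication ID with the remote CA GUID is an assumption made
so that IDs from different nodes do not collide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection record; not the ib_cm data structures. */
struct conn {
    uint64_t remote_ca_guid;  /* assumed to identify the remote node */
    uint32_t remote_comm_id;  /* communication ID chosen by the remote CM */
    uint32_t remote_qpn;      /* remote QP number, not used for this check */
};

/*
 * Return the tracked connection that an incoming REQ duplicates, or NULL
 * if no tracked connection has the same (CA GUID, communication ID) pair.
 * The remote QPN is deliberately not consulted.
 */
static const struct conn *find_duplicate(const struct conn *table, size_t n,
                                         uint64_t ca_guid, uint32_t comm_id)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].remote_ca_guid == ca_guid &&
            table[i].remote_comm_id == comm_id)
            return &table[i];
    }
    return NULL;
}
```

With this lookup, a REQ that reuses a remote QPN but carries a new
communication ID is not treated as a duplicate.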

> >> Do you disagree with my interpretation of
> >>12.4, or why do you think that this is obviously not what the spec intends?
> > 
> > 
> > It seems I disagree with your interpretation of the spec.
> > I think what the spec intends is CM must track 2 kinds of QPs:
> > 1. remote QPN used in Connected QPs - to detect stale connections
> > 2. Local QPs in timewait state - QP must not be destroyed immediately, but must
> >    stay in reset, error or init so that hardware discards timewait packets
> > 
> > These 2 are mutually exclusive.
> > In case 1 a new REQ with same remote QPN means connection got stale.
> > In case 2 we exchanged DREQ/DREP so there's no issue.
> > 
> 
> From 12.9.7.1 and 12.9.7.2, there's no action indicated that the CM should take 
> when receiving a REQ when in timewait.

But if the ID that the REQ uses is not in timewait, the usual rules apply.

> A stale connection check is explicitly 
> listed under the established state.  This may help clarify stale connections.

So we agree that the stale connection rule only applies if the connection is
in the established state?
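
If so, the rule could be sketched roughly as below. This is only an
illustration under that assumption: enum conn_state, struct tracked_conn and
req_marks_stale() are made-up names, not ib_cm code, and only the two states
discussed in this thread are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states; only the two discussed in this thread. */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT };

/* Hypothetical record of a tracked connection. */
struct tracked_conn {
    enum conn_state state;
    uint32_t remote_qpn;
};

/*
 * Return true if a new REQ carrying req_qpn means that tracked connection
 * c has gone stale. The remote QPN is compared only while the connection
 * is established; a connection in timewait never matches.
 */
static bool req_marks_stale(const struct tracked_conn *c, uint32_t req_qpn)
{
    assert(c != NULL);
    if (c->state != CONN_ESTABLISHED)
        return false;
    return c->remote_qpn == req_qpn;
}
```

Under this sketch, a REQ that reuses the remote QPN of a connection already
in timewait is not rejected as stale.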

-- 
MST