[openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

Sean Hefty mshefty at ichips.intel.com
Mon Aug 28 13:29:32 PDT 2006


Michael S. Tsirkin wrote:
>>The CM tracks the remote QP, not the local.
> 
> 
> I might not have been clear.
> For connection in timewait state, spec explicitly says local QP
> must be in reset, error or init.
> Only after it goes out of timewait can you destroy the QP.
> That's the tracking I think spec means CM needs to do.

I believe that this tracking is done, and is reported to the user by the 
timewait exit event.  QP transitions are the responsibility of the user.

This is related to a problem that Arlin and I have been discussing.  There's 
nothing that the CM does to prevent the QP from being destroyed, especially for 
a usermode application.  The CM invokes a callback once a connection exits 
timewait, indicating to the user that the QP may now be destroyed.  But if an 
application crashes, uverbs automatically destroys the QP.

We may need better coordination between the CM and verbs with respect to 
timewait to handle userspace QPs, but that work depends on this change.

>>The spec (12.4) seems to state
>>that this is required by the CM.
> 
> 
> Tracking, yes. But not rejecting connections.

Section 12.4 indicates that the CM shall put both the local and remote QPNs into 
timewait.  I was assuming that the remote QPN was tracked, in part, for 
rejecting a stale connection.  I can also see an argument that it is only needed 
to validate repeated DREQs, which carry the remote QPN.

>> Do you disagree with my interpretation of
>>12.4, or why do you think that this is obviously not what the spec intends?
> 
> 
> It seems I disagree with your interpretation of the spec.
> I think what the spec intends is CM must track 2 kinds of QPs:
> 1. remote QPN used in Connected QPs - to detect stale connections
> 2. Local QPs in timewait state - QP must not be destroyed immediately, but must
>    stay in reset, error or init so that hardware discards timewait packets
> 
> These 2 are mutually exclusive.
> In case 1 a new REQ with same remote QPN means connection got stale.
> In case 2 we exchanged DREQ/DREP so there's no issue.
> 

From 12.9.7.1 and 12.9.7.2, there's no action indicated that the CM should take 
when receiving a REQ when in timewait.  A stale connection check is explicitly 
listed under the established state.  This may help clarify how the spec intends 
stale connections to be detected.
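To make the distinction concrete, here is a minimal C sketch of the REQ lookup 
this RFC proposes: a stale connection check that only consults established 
connections and skips those in timewait.  It assumes the CM keys the check on 
the peer's (CA GUID, QPN) pair, taken from the Local CA GUID and Local QPN 
fields of the incoming REQ.  All names (struct conn, classify_req, the enums) 
are hypothetical and are not the ib_cm API; what the CM does once it detects a 
stale connection is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified connection states (not the ib_cm enum). */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT };

/* Hypothetical per-connection record kept by the CM. */
struct conn {
    uint64_t remote_ca_guid; /* CA GUID of the peer */
    uint32_t remote_qpn;     /* QPN of the peer's QP */
    enum conn_state state;
};

/* Hypothetical outcome of checking an incoming REQ. */
enum req_action {
    REQ_NEW_CONNECTION,  /* no conflict: process the REQ normally */
    REQ_STALE_CONNECTION /* peer reused a QPN we still consider connected */
};

/*
 * req_ca_guid/req_qpn are the Local CA GUID and Local QPN carried in the
 * REQ, i.e. the remote side as seen by this CM.  Per the reading of
 * 12.9.7.1/12.9.7.2 above, only established connections are checked;
 * timewait entries are skipped (the behaviour this RFC proposes).
 */
enum req_action classify_req(const struct conn *conns, size_t n,
                             uint64_t req_ca_guid, uint32_t req_qpn)
{
    assert(conns != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (conns[i].state != CONN_ESTABLISHED)
            continue; /* remote QPN in timewait is not consulted */
        if (conns[i].remote_ca_guid == req_ca_guid &&
            conns[i].remote_qpn == req_qpn)
            return REQ_STALE_CONNECTION;
    }
    return REQ_NEW_CONNECTION;
}
```

With one established connection to QPN 0x40 and one timewait connection to 
QPN 0x41 on the same peer, a REQ from QPN 0x40 is classified as stale, while a 
REQ from QPN 0x41 is treated as a new connection because the timewait entry is 
ignored.  Under the current behaviour the timewait entry would also be checked.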

- Sean



