[openib-general] Question about QP's in timewait state and CM stale conn rejects

Or Gerlitz ogerlitz at voltaire.com
Thu Aug 17 00:59:10 PDT 2006


Arlin Davis wrote:
> We are running into connection reject issues (IB_CM_REJ_STALE_CONN) with 
> our application under heavy load and lots of connections.
> 
> We occasionally get a reject based on the QP being in timewait state 
> left over from a prior connection. It appears that the CM keeps track of 
> the QPs in timewait state on both sides of the connection, 

How did you verify that? The CM generates a REJ with IB_CM_REJ_STALE_CONN 
in two flows on the passive side (i.e., when rejecting a REQ) and in one 
flow on the active side (i.e., when rejecting a REP).
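To make the three reject flows concrete, here is a toy model of one node's stale-connection checks. It is only a sketch, loosely based on my reading of the Linux ib_cm logic. `ToyCM`, its methods, the string results and the GUID/QPN values are all hypothetical. The two tables are keyed by the *peer's* CA GUID, and a real CM keeps far more state than this.

```python
from dataclasses import dataclass, field

STALE = "REJ_STALE_CONN"  # stands in for IB_CM_REJ_STALE_CONN


@dataclass
class ToyCM:
    """Hypothetical, simplified stale-connection checks for one node's CM.

    Entries are keyed by the *peer's* CA GUID; real CMs keep much more state.
    """
    # (peer CA GUID, peer comm ID) -> "established" or "timewait"
    peer_ids: dict = field(default_factory=dict)
    # (peer CA GUID, peer QPN) still bound to a live or timewait connection
    peer_qpns: set = field(default_factory=set)

    def handle_req(self, guid: int, comm_id: int, qpn: int) -> str:
        """Passive side: answer an incoming REQ."""
        state = self.peer_ids.get((guid, comm_id))
        if state == "timewait":
            return STALE          # passive flow 1: duplicate REQ hits timewait
        if state is not None:
            return "IGNORE_DUP"   # duplicate REQ for a live connection (simplified)
        if (guid, qpn) in self.peer_qpns:
            return STALE          # passive flow 2: peer QPN still in use
        self._bind(guid, comm_id, qpn)
        return "REP"

    def handle_rep(self, guid: int, comm_id: int, qpn: int) -> str:
        """Active side: answer an incoming REP."""
        if (guid, qpn) in self.peer_qpns:
            return STALE          # active flow: peer QPN still in use
        self._bind(guid, comm_id, qpn)
        return "RTU"

    def disconnect(self, guid: int, comm_id: int) -> None:
        # The peer QPN stays in peer_qpns until timewait expires.
        self.peer_ids[(guid, comm_id)] = "timewait"

    def timewait_expired(self, guid: int, comm_id: int, qpn: int) -> None:
        del self.peer_ids[(guid, comm_id)]
        self.peer_qpns.discard((guid, qpn))

    def _bind(self, guid: int, comm_id: int, qpn: int) -> None:
        self.peer_ids[(guid, comm_id)] = "established"
        self.peer_qpns.add((guid, qpn))


if __name__ == "__main__":
    cm = ToyCM()
    print(cm.handle_req(0xA, comm_id=1, qpn=0x40))  # REP
    cm.disconnect(0xA, comm_id=1)
    print(cm.handle_req(0xA, comm_id=1, qpn=0x40))  # REJ_STALE_CONN (flow 1)
    print(cm.handle_req(0xA, comm_id=2, qpn=0x40))  # REJ_STALE_CONN (flow 2)
    print(cm.handle_rep(0xA, comm_id=3, qpn=0x40))  # REJ_STALE_CONN (active flow)
    cm.timewait_expired(0xA, comm_id=1, qpn=0x40)
    print(cm.handle_req(0xA, comm_id=2, qpn=0x40))  # REP
```

The key point is that a peer QPN stays "poisoned" in this node's table until this node's own timewait for the old connection expires. It does not matter what the peer has done with that QPN locally in the meantime.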

> How can a consumer know for sure that the new QP will not be in a 
> timewait state according to the CM? Does it make sense to push the 
> timewait functionality down into verbs? If not, is there a way for the 
> CM to hold a reference to the QP until the timewait expires?

Just to emphasize what Sean has pointed out: you are asking how a CM 
consumer can know that a **local** QPN is not in the timewait state 
according to the **remote** CM. Since the issue lies with the remote CM, 
it seems to me that pushing timewait down into verbs is not the right 
direction to look in.
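Here is a toy two-node illustration of that point. It is a sketch only: `NodeCM`, the GUIDs and the QPN values are hypothetical. Each CM records the *peer's* (CA GUID, QPN) when it enters timewait. After node A reuses QPN 0x40, nothing in A's own CM state mentions 0x40, while B's table still does. A timewait kept in A's verbs layer could therefore only guess at when B's entry goes away.

```python
from dataclasses import dataclass, field


@dataclass
class NodeCM:
    """Hypothetical per-node CM state: only what this node itself knows."""
    guid: int
    # (peer CA GUID, peer QPN) pairs this node is holding in timewait
    peer_qpns_in_timewait: set = field(default_factory=set)

    def enter_timewait(self, peer_guid: int, peer_qpn: int) -> None:
        self.peer_qpns_in_timewait.add((peer_guid, peer_qpn))

    def would_reject(self, peer_guid: int, peer_qpn: int) -> bool:
        # Stale-connection check against this node's own timewait entries.
        return (peer_guid, peer_qpn) in self.peer_qpns_in_timewait


a = NodeCM(guid=0xA)
b = NodeCM(guid=0xB)

# A used QPN 0x40 and B used QPN 0x81; the connection is torn down and
# both CMs enter timewait, each recording the *other* side's QPN.
a.enter_timewait(peer_guid=b.guid, peer_qpn=0x81)
b.enter_timewait(peer_guid=a.guid, peer_qpn=0x40)

# A destroys its QP and a new QP happens to get QPN 0x40 again.
reused_qpn = 0x40
local_view = any(qpn == reused_qpn for _, qpn in a.peer_qpns_in_timewait)
remote_view = b.would_reject(a.guid, reused_qpn)
print(local_view, remote_view)  # False True
```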

Or.
