[openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state

Michael S. Tsirkin mst at mellanox.co.il
Wed Aug 30 10:52:16 PDT 2006


Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH ] RFC IB/cm do not track remote QPN in timewait state
> 
> Michael S. Tsirkin wrote:
> > And so can RTU, in which case again QP will be in RTR. So it seems
> > lost CM packets aren't protected by timewait.
> 
> Maybe we just try to deal with this the best that we can and make the HCA
> driver responsible for not re-allocating QPs for a duration of
> local_ack_timeout once they've entered RTS.
> 
> If connections are made through the IB CM, it seems unlikely that stale
> packets will float around the subnet longer than it takes to establish a new
> connection.  We just need to be able to detect stale connections, which
> requires that users use the CM when connecting.

Fair enough.

> > At least in case of mthca, I think we do have the last PSN.
> > So I guess we could have a special wildcard PSN value that lets low
> > level driver select it. What would a good value for the PSN be?
> 
> I'm not sure about this idea now.  A random starting PSN is easy.  How much 
> better off are we trying to be clever?

Dunno. In setups such as boot over IB, random values might be hard to get.
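
For reference, the wildcard idea could look roughly like the sketch below.
This is only an illustration under stated assumptions: PSN_WILDCARD and
choose_start_psn() are hypothetical names, not existing mthca or CM
interfaces, and the wildcard is assumed to be a value outside the 24-bit
PSN range so that it can never collide with a real PSN. Whether continuing
from the last PSN is actually a good choice is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

#define PSN_MASK     0xFFFFFFu    /* IB PSNs are 24 bits wide */
#define PSN_WILDCARD 0xFFFFFFFFu  /* hypothetical: outside the 24-bit range */

/*
 * Hypothetical helper, not part of mthca or the CM.
 * If the caller passes the wildcard, continue from the last PSN the
 * low-level driver used on this QP; otherwise use the caller's value.
 */
static uint32_t choose_start_psn(uint32_t requested, uint32_t last_psn)
{
	uint32_t psn;

	if (requested == PSN_WILDCARD)
		psn = (last_psn + 1) & PSN_MASK;  /* wraps at 2^24 */
	else
		psn = requested & PSN_MASK;

	assert(psn <= PSN_MASK);
	return psn;
}
```

With something like this no random source is needed at connect time, at
the cost of the driver having to remember the last PSN per QP number.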

> Btw, have you been able to determine why your patch leads to a crash under 
> stress?  It looks to me like it would work.

It exposed a race in SDP. The patch itself does not lead to crashes -
I re-attach it here for reference.
As we discussed, this needs to be extended to handle DREQ retries
properly.

---

IB/cm: do not track remote QPN in TimeWait, since QP is not connected.
This avoids spurious "stale connection" rejects.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f85c97f..e270311 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -679,6 +679,8 @@ static void cm_enter_timewait(struct cm_
 {
 	int wait_time;
 
+	cm_cleanup_timewait(cm_id_priv->timewait_info);
+
 	/*
 	 * The cm_id could be destroyed by the user before we exit timewait.
 	 * To protect against this, we search for the cm_id after exiting

-- 
MST



