[openib-general] [PATCH ] RFC IB/cm do not track remote QPN in timewait state
Michael S. Tsirkin
mst at mellanox.co.il
Wed Aug 30 10:52:16 PDT 2006
Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH ] RFC IB/cm do not track remote QPN in timewait state
>
> Michael S. Tsirkin wrote:
> > And so can RTU, in which case again QP will be in RTR. So it seems
> > lost CM packets aren't protected by timewait.
>
> Maybe we just try to deal with this the best that we can and make the HCA
> driver responsible for not re-allocating QPs for a duration of
> local_ack_timeout once they've entered RTS.
>
> If connections are made through the IB CM, it seems unlikely that stale
> packets will float around the subnet longer than it takes to establish a new
> connection. We just need to be able to detect stale connections, which
> requires that users use the CM when connecting.
Fair enough.
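
For reference, a rough sketch of what such a hold-off could look like. It assumes the usual IB encoding of Local ACK Timeout as 4.096 usec * 2^value. The helper names (local_ack_timeout_ns, qpn_reusable) are made up for illustration and are not taken from mthca or any other driver.

```c
#include <stdint.h>

/*
 * Local ACK Timeout is a 5-bit exponent: the interval is
 * 4.096 usec * 2^timeout. A value of 0 means "no timeout", so there is
 * no finite interval to wait for; this sketch returns 0 in that case
 * and leaves the policy to the caller (assumption).
 */
static uint64_t local_ack_timeout_ns(unsigned int timeout)
{
	if (timeout == 0 || timeout > 31)
		return 0;
	return 4096ULL << timeout;	/* 4096 ns * 2^timeout */
}

/*
 * Hypothetical check a driver could make before handing a QPN out
 * again: has at least one local_ack_timeout interval passed since the
 * QP entered RTS? Timestamps are in nanoseconds.
 */
static int qpn_reusable(uint64_t now_ns, uint64_t entered_rts_ns,
			unsigned int timeout)
{
	return now_ns - entered_rts_ns >= local_ack_timeout_ns(timeout);
}
```

With timeout = 14 the interval comes out at about 67 msec, so the delay before a QPN can be reused would be short for typical settings.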
> > At least in case of mthca, I think we do have the last PSN.
> > So I guess we could have a special wildcard PSN value that lets low
> > level driver select it. What would a good value for the PSN be?
>
> I'm not sure about this idea now. A random starting PSN is easy. How much
> better off are we trying to be clever?
Dunno. In setups such as boot over IB, random values might be hard to get.
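
To make the two options concrete, here is a minimal sketch of both. It only assumes that PSNs are 24 bits wide. The function names and the "gap" parameter are hypothetical, and where the random word comes from is left to the caller.

```c
#include <stdint.h>

#define PSN_MASK 0xffffffu	/* PSNs are 24 bits wide */

/* Option 1: random starting PSN, derived from any 32-bit random word. */
static uint32_t psn_from_random(uint32_t rnd)
{
	return rnd & PSN_MASK;
}

/*
 * Option 2: no random source needed. Continue from the last PSN the QP
 * used, skipping ahead by some gap and wrapping at 2^24. How large the
 * gap should be is an open question (assumption, not a tested value).
 */
static uint32_t psn_after_last(uint32_t last_psn, uint32_t gap)
{
	return (last_psn + gap) & PSN_MASK;
}
```

The second variant is what a low level driver could do behind a wildcard PSN value, since it knows the last PSN.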
> Btw, have you been able to determine why your patch leads to a crash under
> stress? It looks to me like it would work.
It exposed a race in SDP. The patch itself does not lead to crashes -
I re-attach it here for reference.
As we discussed, this needs to be extended to handle DREQ retries
properly.
---
IB/cm: do not track remote QPN in TimeWait, since QP is not connected.
This avoids spurious "stale connection" rejects.
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f85c97f..e270311 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -679,6 +679,8 @@ static void cm_enter_timewait(struct cm_
 {
 	int wait_time;
 
+	cm_cleanup_timewait(cm_id_priv->timewait_info);
+
 	/*
 	 * The cm_id could be destroyed by the user before we exit timewait.
 	 * To protect against this, we search for the cm_id after exiting
--
MST