[openib-general] RFC: mthca: implement timewait by tracking QPNs
Michael S. Tsirkin
mst at mellanox.co.il
Mon Sep 11 16:54:46 PDT 2006
Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: RFC: mthca: implement timewait by tracking QPNs
>
> My gut reaction is that it seems pretty ugly.
Hmm. All of it or just some bits?
> I guess we'll also need
> similar patches for ipath and ehca too -- which makes me want to have
> this in common code somehow.
Could be a library function in core so that ipath etc can reuse it.
But note how there's no dependency between drivers here - no
reason to block change in mthca until ipath/ehca implement this functionality,
too.
> Also timewait is really only part of the CM spec
Not entirely correct. Please look at 9.7.1 - search for "stale packets":
In addition to duplicate packets and invalid packets, there is a third
condition, called a Stale Packet ("TIME WAIT packet"). If a connection to a
responder is torn down and a new connection is established while packets are in
flight, a packet from the old (stale) connection may arrive at the responder.
The responder, in turn, may interpret this stale incoming packet
as a valid packet, when in fact it is a remnant of a previous connection.
There are no transport layer mechanisms to guard against this condition;
it is the responsibility of connection management to avoid re-using QPs
until there is no possibility that a stale packet could arrive at the responder.
This is done by placing the requester and responder QPs in a "Time Wait"
state long enough to ensure that any stale packets left in the fabric have
expired before re-using those QPs.
So the spec suggests that timewait be implemented in CM, but timewait
is needed to solve a problem that affects the transport layer and that
is described in Chapter 9.
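
To make the idea concrete, here is a minimal sketch of tracking QPNs in
timewait, in Python rather than driver C so it can be run standalone. It is
not the mthca patch: the class name, method names and the explicit `now`
argument (used instead of real timers so the behaviour is deterministic) are
all assumptions made for illustration. A freed QPN is held back until its
timewait period expires, and the pending entries can be flushed at once, as
a module unload would need.

```python
import heapq

QPN_BITS = 24                      # the QPN space is 2^24
QPN_SPACE = 1 << QPN_BITS


class TimewaitTracker:
    """Hypothetical QPN allocator that quarantines freed QPNs (not mthca code)."""

    def __init__(self, qpn_space=QPN_SPACE):
        self.qpn_space = qpn_space
        self.next_qpn = 0
        self.in_use = set()
        self.timewait = {}         # qpn -> expiry time in seconds
        self.expiry_heap = []      # (expiry, qpn) pairs, earliest expiry first

    def _reap(self, now):
        # Release every QPN whose timewait period has elapsed.
        while self.expiry_heap and self.expiry_heap[0][0] <= now:
            _, qpn = heapq.heappop(self.expiry_heap)
            self.timewait.pop(qpn, None)

    def alloc(self, now):
        # Hand out the next QPN that is neither in use nor in timewait.
        self._reap(now)
        for _ in range(self.qpn_space):
            qpn = self.next_qpn
            self.next_qpn = (self.next_qpn + 1) % self.qpn_space
            if qpn not in self.in_use and qpn not in self.timewait:
                self.in_use.add(qpn)
                return qpn
        raise RuntimeError("no free QPN: all are in use or in timewait")

    def free(self, qpn, now, timewait_secs):
        # Destroying a QP moves its QPN into timewait instead of freeing it.
        self.in_use.remove(qpn)
        expiry = now + timewait_secs
        self.timewait[qpn] = expiry
        heapq.heappush(self.expiry_heap, (expiry, qpn))

    def flush(self):
        # Unload path: drop all pending entries without waiting for expiry.
        self.timewait.clear()
        self.expiry_heap.clear()
```

With a two-entry QPN space, freeing one QPN at time 1 with a 10 second
timewait makes alloc() fail at time 5 and succeed again at time 11.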
> -- do we want to
> limit the rate of RC QP creation in general for potential non-CM users
> that know what they're doing?
I don't see how this limits the rate of QP creation. Could you explain?
Second, there's no way I can see that a verbs user can check there are no
stale packets (AKA TimeWait packets). Is there? So the user only *thinks* he
knows what he's doing, meanwhile getting silent data corruption. Correct?
> I'm not sure the following is a real concern (since a hostile user can
> currently just create a ton of QPs and hold onto them forever), but
> this also allows someone to create a bunch of QPs with a super-long
> timeout and prevent any other QPs from being created for a few hours
> (until the timewait expires).
Another reason why this might not be an issue is that the QPN space
is reasonably big - 2^24. I guess when we start looking at limiting
# of QPs per user, we'll need to limit the max legal packet lifetime too.
Might be a good idea anyway.
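
A rough back-of-the-envelope calculation, as a sketch only. It assumes the
usual IB timeout encoding of 4.096 usec * 2^n with a 5-bit exponent, and it
simplifies by treating the timewait period as exactly one such timeout (the
real CM timewait is computed differently and is generally longer). The
function names are made up for this example.

```python
QPN_SPACE = 1 << 24            # 2^24 QPNs
BASE_NS = 4096                 # assumed IB timeout unit: 4.096 usec, in ns


def lifetime_secs(exponent):
    """Timeout of 4.096 usec * 2^exponent, in seconds (assumed encoding)."""
    return (BASE_NS << exponent) / 1e9


def max_sustained_rate(timewait_secs, qpn_space=QPN_SPACE):
    """QP create/destroy cycles per second before every QPN is in timewait."""
    return qpn_space / timewait_secs


for exp in (14, 20, 31):
    t = lifetime_secs(exp)
    print(f"exponent {exp:2d}: timewait {t:10.3f} s, "
          f"max rate {max_sustained_rate(t):12.1f} QPs/s")
```

Under these assumptions the largest exponent, 31, gives a timewait of about
8796 seconds (roughly 2.4 hours), and the whole QPN space is only exhausted
by sustaining about 1900 QP create/destroy cycles per second for that long.
Capping the legal exponent shrinks the window accordingly.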
> Finally one implementation comment: I think you'll want a list in
> addition to QPN + timer, to allow the ib_mthca module to be unloaded
> without having to wait an hour for all timers to expire. This allows
> timewait to be bypassed by unloading + reloading but that's no
> different than rebooting really.
Sure, that's obvious.
> Another good prophylactic measure would probably be to randomize initial
> PSNs for RC connections. SRP currently does this.
I agree this also helps.
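
For illustration, picking a random initial PSN could look like the sketch
below. It assumes the 24-bit PSN field and is not taken from the SRP code;
the function name is hypothetical.

```python
import secrets

PSN_BITS = 24                  # PSNs are assumed to be 24 bits wide


def random_initial_psn():
    """Return a uniformly random starting PSN in [0, 2^24)."""
    return secrets.randbits(PSN_BITS)
```

A stale packet from an old connection then only looks valid if its PSN
happens to land in the window the new connection currently expects.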
--
MST