[openib-general] RFC: mthca: implement timewait by tracking QPNs

Michael S. Tsirkin mst at mellanox.co.il
Mon Sep 11 16:54:46 PDT 2006


Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: RFC: mthca: implement timewait by tracking QPNs
> 
> My gut reaction is that it seems pretty ugly.

Hmm. All of it or just some bits?


> I guess we'll also need
> similar patches for ipath and ehca too -- which makes me want to have
> this in common code somehow.

Could be a library function in core so that ipath etc. can reuse it.
But note that there's no dependency between drivers here: there's no
reason to block the change in mthca until ipath/ehca implement this
functionality, too.

> Also timewait is really only part of the CM spec

Not entirely correct. Please look at 9.7.1 - search for "stale packets":

    In addition to duplicate packets and invalid packets, there is a third
    condition, called a Stale Packet ("TIME WAIT packet"). If a connection to a
    responder is torn down and a new connection is established while packets are in
    flight, a packet from the old (stale) connection may arrive at the responder.
    
    The responder, in turn, may interpret this stale incoming packet
    as a valid packet, when in fact it is a remnant of a previous connection.
    
    There are no transport layer mechanisms to guard against this condition;
    it is the responsibility of connection management to avoid re-using QPs
    until there is no possibility that a stale packet could arrive at the responder.
    This is done by placing the requester and responder QPs in a "Time Wait"
    state long enough to ensure that any stale packets left in the fabric have
    expired before re-using those QPs.

So the spec suggests that timewait be implemented in CM, but timewait
is needed to solve a problem that affects the transport layer and that
is described in Chapter 9.
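
To make the idea concrete, here is a rough userspace sketch of tracking QPNs
in timewait. It is not the mthca patch: the names (tw_table, tw_add, tw_busy),
the fixed table size and the caller-supplied clock are all assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TW_SLOTS 64                 /* hypothetical table size */

struct tw_entry {
    uint32_t qpn;                   /* 24-bit QP number */
    uint64_t expires;               /* time at which the QPN may be reused */
    bool     used;
};

struct tw_table {
    struct tw_entry slot[TW_SLOTS];
};

/* Record that qpn was destroyed at time 'now' and must rest for 'lifetime'.
 * Returns false if every slot still holds an unexpired entry. */
static bool tw_add(struct tw_table *t, uint32_t qpn,
                   uint64_t now, uint64_t lifetime)
{
    assert(t != NULL);
    for (size_t i = 0; i < TW_SLOTS; i++) {
        struct tw_entry *e = &t->slot[i];
        if (!e->used || e->expires <= now) {    /* free or expired slot */
            e->qpn = qpn & 0xffffff;
            e->expires = now + lifetime;
            e->used = true;
            return true;
        }
    }
    return false;
}

/* True while qpn is still in timewait, i.e. must not be handed out again. */
static bool tw_busy(const struct tw_table *t, uint32_t qpn, uint64_t now)
{
    assert(t != NULL);
    for (size_t i = 0; i < TW_SLOTS; i++) {
        const struct tw_entry *e = &t->slot[i];
        if (e->used && e->qpn == (qpn & 0xffffff) && e->expires > now)
            return true;
    }
    return false;
}
```

A QPN allocator would call tw_busy() before handing out a number and skip
any QPN for which it returns true.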

> -- do we want to
> limit the rate of RC QP creation in general for potential non-CM users
> that know what they're doing?

I don't see how this limits the rate of QP creation. Could you explain?

Second, I can see no way for a verbs user to check that there are no stale
packets (AKA TimeWait packets). Is there? So the user only *thinks* he knows
what he's doing, while actually getting silent data corruption. Correct?

> I'm not sure the following is a real concern (since a hostile user can
> currently just create a ton of QPs and hold onto them forever), but
> this also allows someone to create a bunch of QPs with a super-long
> timeout and prevent any other QPs from being created for a few hours
> (until the timewait expires).

Another reason why this might not be an issue is that the QPN space
is reasonably big - 2^24. I guess when we start looking at limiting the
number of QPs per user, we'll need to limit the max legal packet lifetime too.
Might be a good idea anyway.
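
A sketch of what such a limit could look like, assuming the timeout is given
in the usual IB exponent form (4.096 us * 2^n, with n a 5-bit value). The cap
TW_MAX_EXP and the function names are hypothetical, not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

#define IB_TIMEOUT_MAX_EXP 31u      /* largest value of a 5-bit exponent */
#define TW_MAX_EXP         20u      /* hypothetical policy cap */

/* Convert an exponent-encoded timeout to nanoseconds: 4096 ns * 2^exp. */
static uint64_t ib_timeout_ns(unsigned exp)
{
    assert(exp <= IB_TIMEOUT_MAX_EXP);
    return 4096ULL << exp;
}

/* Clamp a user-supplied exponent to the policy cap. */
static unsigned tw_clamp_exp(unsigned exp)
{
    return exp > TW_MAX_EXP ? TW_MAX_EXP : exp;
}
```

With these numbers an uncapped exponent of 31 gives about 8796 seconds (some
2.4 hours), while the capped value of 20 gives about 4.3 seconds.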

> Finally one implementation comment: I think you'll want a list in
> addition to QPN + timer, to allow the ib_mthca module to be unloaded
> without having to wait an hour for all timers to expire.  This allows
> timewait to be bypassed by unloading + reloading but that's no
> different than rebooting really.

Sure, that's obvious.
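
Roughly, in userspace terms, and with made-up names (tw_node, tw_push,
tw_flush): keep every pending entry on a list so that unload can walk it and
drop everything without waiting for expiry. A real driver would cancel the
per-entry timer where this sketch just frees the node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct tw_node {
    uint32_t qpn;                   /* QP number held in timewait */
    uint64_t expires;               /* when the timer would have fired */
    struct tw_node *next;
};

/* Add an entry at the head of the list and return the new head. */
static struct tw_node *tw_push(struct tw_node *head,
                               uint32_t qpn, uint64_t expires)
{
    struct tw_node *n = malloc(sizeof(*n));
    assert(n != NULL);              /* sketch: no real out-of-memory handling */
    n->qpn = qpn;
    n->expires = expires;
    n->next = head;
    return n;
}

/* On unload: drop every pending entry regardless of its expiry time.
 * Returns the number of entries released. */
static unsigned tw_flush(struct tw_node **head)
{
    unsigned count = 0;
    while (*head) {
        struct tw_node *e = *head;
        *head = e->next;
        free(e);                    /* a driver would cancel the timer here */
        count++;
    }
    return count;
}
```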

> Another good prophylactic measure would probably to randomize initial
> PSNs for RC connections.  SRP currently does this.

I agree this also helps.
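
For illustration, picking a random 24-bit starting PSN could look like the
sketch below. This is not what SRP does internally; rand() merely stands in
for whatever random source is available (kernel code would use the kernel's
own RNG), and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PSN_MASK 0xffffffu          /* PSNs are 24 bits wide */

/* Reduce arbitrary random bits to a valid PSN. */
static uint32_t psn_from_bits(uint32_t r)
{
    return r & PSN_MASK;
}

/* Pick a starting PSN. Two rand() calls are mixed because RAND_MAX may be
 * as small as 32767, which alone would not cover 24 bits. */
static uint32_t random_initial_psn(void)
{
    uint32_t r = ((uint32_t)rand() << 12) ^ (uint32_t)rand();
    uint32_t psn = psn_from_bits(r);
    assert(psn <= PSN_MASK);
    return psn;
}
```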

-- 
MST



