[Fwd: [ofa-general] More responder_resources problems]

Tue Apr 22 17:47:39 PDT 2008

On Tue, Apr 22, 2008 at 03:23:30PM -0700, Sean Hefty wrote:
> >Yes, but the actual programming of the values into the QP is done by
> >cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which
> >takes the values from the rep/req directly, without modification.
> 
> The values exchanged in the REP are saved to cm_id_priv.  Those values are used.
> The passive side ULP is responsible for using the correct value.  Either by
> returning what was sent in the REQ, or by adjusting the values down.  Note that
> the active side will see the values in the REP and can reject the connection if
> they are set too large.

Ok.. Well, if the ULP is responsible, I have yet to see a ULP example,
or in-kernel ULP that does it right. Every one ignores the REQ and/or does
not limit the REQ's values to the devices capabilities.

The other view is that the CM should just handle this and the ULP
should only have the option to further reduce the value. It is not a
parameter that affects the operation of the ULP, so having it be
lowered is not significant. The actual value can always be queried
with ibv_query_qp.

I guess that is really what it comes down to, which do you think
should be primarily responsible for this, and what should the API
be. I can't disagree with you that the ULP should be responsible given
the CM API, but that doesn't make it less awkward and annoying....

> >There is a bug here, it just isn't really obvious to me where the
> >fixes should go to match the CM design. I was imagining that cm.c
> >would adjust the REQ after reception, but there may be some downsides
> >to that?
> 
> The CM does adjust the value in the cm_id_priv structure based on the REP.

Right, but I'm talking about when the passive side generates the
REP. The contents of the REP should exactly match what the passive
side QP is set to (ie lower than the device capabilities), and always
be lower than the values in the REQ.

> >All that I see in here is switching REQ's responder_resources value
> >into the REQ's initiator_depth value (and vice versa) it does not
> >limit it.
> 
> The limits are left up to the ULP.  Maybe the problem is that the ULPs are not
> validating the limits?

That is definately true.

> This is a sanity check only, intended to help catch errors sooner.  Since it is
> also used on the active side before sending a REQ, it can only check against the
> local device capabilities.  The sanity check could be expanded, but I don't see
> a strong reason to add it.  The modify QP operations will fail later if the
> specified values are too large.

But the whole point of this process is to get a working connection -
the responder resources are not a ULP visible item, they are just
something that must be negotiated and configured into the QP. In
truth, I can think of no reason for a ULP to use any value other than
the device maximum or 0 for these resources. Saying that if the
passive side messes up it will just die when the QP is modified is,
IMHO, not good enough.

> Yes, the client can specify values that were greater than those in
> the REQ, but those values may technically still work.

I don't see how? The active side may be unable to program the QP to
those values, and using an initiator_depth larger than the peers
responder_resources will cause operational problems. The way the spec
is written it is pretty much mandatory to limit to the values in the
REQ when generating the REP.

It would be perfectly conformant (and a good idea) for the active side
to refuse to use a REP with larger values than its REQ.

> >ie, consider the REQ values as reported through rdma_init_qp_attr,
> >and limit the user's requested values on the passive side to be no
> >greater than what the remote can do/
> 
> I don't like the idea of reducing the limits without the user's knowledge.  I
> would rather fail the connection, which is what happens today (either through
> the ucma_valid_param() checks or when modifying the QP).

That is not entirely true, since the passive side's change overrides
the values in the REQ from the active side, which can reduce the value
without the users knowledge. The question really is if you expect the
CM to control this for you, or if you expect the ULP do do everything
manually. Right now there seems to be a bit of both going on.

> >Also support user passive side control over initiator depth.
> 
> This is there today.

Where? cma.c never programs max_rd_atomic in the qp.

> I think I'm missing seeing whatever problem you're seeing.

Well, what I have been interested in (Hal - what is your interest
here?) is to use the device maximum and get rid of the hard coded
values for responder resources and initiator depth in the ULPs. This
would be to allow some devices to have higher responder resources,
based on hardware capabilitity. Limited responder resources cause huge
performance problems on high latency connections.

In the process I have observed that the spec is not being followed and
there are cases where things go wrong if the two sides are not
requesting identical things. I've also observed that the examples
examples of how to use CM and RDMACM do not include the correct
behavior.

-- 
Jason Gunthorpe <jgunthorpe at obsidianresearch.com>        (780)4406067x832
Chief Technology Officer, Obsidian Research Corp         Edmonton, Canada