[Fwd: [ofa-general] More responder_resources problems]

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Tue Apr 22 14:00:49 PDT 2008


On Tue, Apr 22, 2008 at 12:17:13PM -0700, Sean Hefty wrote:

> >I can't find any of this in any of the cm libraries - and this is the
> >sort of thing I was expecting to find in kernel cm.c, since other than
> >letting the client on the passive side specify lower limits there
> >really isn't much latitude here.
 
> The initiator_depth and responder_resources are control by the CM
> ULP, and are specified when calling send_cm_req / send_cm_rep.  The
> exchanged values are reported through the req_event/rep_event
> parameters.

Yes, but the actual programming of the values into the QP is done by
cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which
takes the values from the rep/req directly, without modification.

Look at for instance the entire stack, none of SRP, ISER or IPOIB
touch max_*_rd_atomic, they all rely on cm_init_*_attr to set them
properly. I guess these are not entierly good examples since they are
generally not acting as the passive side (I don't have the target
patches for SRP/ISER handy..)

There is a bug here, it just isn't really obvious to me where the
fixes should go to match the CM design. I was imagining that cm.c
would adjust the REQ after reception, but there may be some downsides
to that?

> The behavior that you're describing is done by the kernel cm.  Look in
> ib_send_cm_req / ib_send_cm_rep / cm_req_handler / cm_rep_handler.

All that I see in here is switching REQ's responder_resources value
into the REQ's initiator_depth value (and vice versa) it does not
limit it.

> The active side initiator_depth and responder_resources are set by
> the active side when calling ib_send_cm_req.  The passive side
> initializes its values to the data carried in the REQ.  When the
> passive sides sends a REP, it is allowed to reduce the values.  The
> CM adjusts both the passive and active side values based on the data
> in the REP.

Well, I see how the override gets into the REP, but how does the REQ
get factored into the override? For instance, the rping example does
this:

        memset(&conn_param, 0, sizeof conn_param);
        conn_param.responder_resources = 1;
        conn_param.initiator_depth = 1;
        ret = rdma_accept(cb->child_cm_id, &conn_param);

And rdma_accept does:

   ret = ucma_valid_param(id_priv, conn_param);
            [^^ Only checks local device capabilities]
   ret = ucma_modify_qp_rtr(id, conn_param);
[.. then on to ucma_modify_qp_rtr .. ]
        if (conn_param)
                qp_attr.max_dest_rd_atomic = conn_param->responder_resources;
        return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);

Which just can't be entirely right. The client can specify values that
are greater than those specified in the REQ. Since the client doesn't
seem to have access to the REQ prior to calling rdma_accept the
responsibility to limit the values must fall on librdmacm.

Maybe something more like this in ucma_modify_qp_rtr:

if (conn_param) {
   /* Note: at this point qp_attr.max_dest_rd_atomic is
      REQ.initiator_depth. */
   conn_param->responder_resouces = min(conn_param->responder_resouces,
                                        qp_attr.max_dest_rd_atomic,
                                        id_priv->cma_dev->max_responder_resources);
   qp_attr.max_dest_rd_atomic = conn_param->responder_resouces;

   /* Note: at this point qp_attr.max_rd_atomic is
      REQ.responder_resources. */
   conn_param->initiator_depth = min(conn_param->initiator_depth,
                                        qp_attr.max_rd_atomic,
                                        id_priv->cma_dev->max_initiator_depth);
   qp_attr.max_rd_atomic = conn_param->initiator_depth;
}

ie, consider the REQ values as reported through rdma_init_qp_attr,
and limit the user's requested values on the passive side to be no
greater than what the remote can do/

Also support user passive side control over initiator depth.

A similar kind of problem exists in the normal CM.

Thanks,
Jason



More information about the general mailing list