[ofw] IBAL CM ignores responder resources of the passive side

Fab Tillier ftillier at microsoft.com
Mon Jun 27 09:06:42 PDT 2011


Leonid Keller wrote on Mon, 27 Jun 2011 at 00:54:40

> We did the below patch and got instantly the assert.

Running which ULP in WinOFED?  Looking at the code, the ND provider does the right thing.  WSD seems to not set resp_res and I'd say that was a bug.  Since the WSD provider gets updated with any driver update, there's little chance of users running into this if both get fixed concurrently.

> Which brought me to understanding, that assert should be removed, but 'if'
> should be kept.

I think it's important to allow the CEP Manager to support 0 as a valid value, indicating that no RDMA reads are supported.  I would keep the assert and remove the if.  If you had a different value to indicate "auto-negotiate" you could have the proxy trap for zero and convert before passing down to the CEP manager.

> Why?
> Because the active side do not look at ReponderResources and InitDepth of
> the passive side (sent in REP).
> It allows CM to set all the parameters.
> The old code gave to the passive side a possibility not to negotiate
> parameters, but to leave it to CM.
> And I think, that we should keep this possibility.

Both the active and passive sides should support auto-negotiating, but it shouldn't be done using the value 0.  

I think clients should be able to specify whatever values they desire, and the CM should then cap to the abilities of the hardware/negotiate on behalf of the client.  The actual values can always be examined during connection establishment (when retrieving the QP attributes for QP transitions) or queried from the QP.

If an application knows that it could have up to 8 RDMA reads in flight at a time, it could always use 8 as the value.  If the hardware doesn't support 8, it would cap it to whatever it does support.  The client would only get an error if the underlying hardware had a limit of zero (unable to do RDMA reads), though I'm not sure that's even allowed by the IB standard.

-Fab

>> -----Original Message-----
>> From: Leonid Keller
>> Sent: Thursday, June 23, 2011 3:08 PM
>> To: 'Fab Tillier'; Hefty, Sean
>> Cc: ofw_list
>> Subject: RE: IBAL CM ignores responder resources of the passive side
>> 
>> I'd say, it's not "broken" applications, but applications, that do not
>> care what Responder Resources should be. And if an application in a
>> third company works with previous drivers on 1000 computers and now
>> they get new drivers and install them on 1000 computers and suddenly
>> the application fails ...
>> 
>> I'm going to add in our code the following lines:
>> 
>>  ASSERT(p_cm_rep->resp_res);
>>  if (p_cm_rep->resp_res)
>>  	p_cep->resp_res = p_cm_rep->resp_res;
>> You may think of leaving in the OFED code only the last line.
>> 
>> 
>>> -----Original Message-----
>>> From: Fab Tillier [mailto:ftillier at microsoft.com]
>>> Sent: Thursday, June 23, 2011 2:53 AM
>>> To: Leonid Keller; Hefty, Sean
>>> Cc: ofw_list
>>> Subject: RE: IBAL CM ignores responder resources of the passive side
>>> 
>>> I'd just let existing broken apps get fixed and fix it in the CM
>>> properly.  That is, do the straight assignment, not the conditional
>>> assignment.
>>> 
>>> Cheers,
>>> -Fab
>>> 
>>> -----Original Message-----
>>> From: Leonid Keller [mailto:leonid at mellanox.co.il]
>>> Sent: Wednesday, June 22, 2011 10:03 AM
>>> To: Hefty, Sean; Fab Tillier
>>> Cc: ofw_list
>>> Subject: RE: IBAL CM ignores responder resources of the passive side
>>> 
>>> I'm afraid that just
>>> 
>>> p_cep->resp_res = p_cm_rep->resp_res;
>>> 
>>> can break some existing applications, which do not fill p_cm_rep-
>>>> resp_res and still work.
>>> Maybe
>>> 
>>> if (p_cm_rep->resp_res)
>>> 	p_cep->resp_res = p_cm_rep->resp_res;
>>> 
>>>> -----Original Message-----
>>>> From: Hefty, Sean [mailto:sean.hefty at intel.com]
>>>> Sent: Wednesday, June 22, 2011 7:04 PM
>>>> To: Leonid Keller; Fab Tillier
>>>> Cc: ofw_list
>>>> Subject: RE: IBAL CM ignores responder resources of the passive side
>>>> 
>>>> without looking at more of the code, it looks like a bug
>>>> 
>>>> 
>>>>> __save_user_rep(
>>>>> 
>>>>>                 IN
>>>>> cep_agent_t* const p_port_cep,
>>>>> 
>>>>>                 IN
>>>>> kcep_t* const p_cep,
>>>>> 
>>>>>                 IN                           const     iba_cm_rep*
>>>> const
>>>>> p_cm_rep,
>>>>> 
>>>>>                 IN
>>>>> uint8_t
>>>>> rnr_nak_timeout )
>>>>> 
>>>>> {
>>>>> 
>>>>>                 AL_ENTER( AL_DBG_CM );
>>>>> 
>>>>> 
>>>>>                 p_cep->local_qpn = p_cm_rep->qpn;
>>>>>                 
>>>>>                 p_cep->rq_psn = p_cm_rep->starting_psn;
>>>>>                 
>>>>>                 p_cep->init_depth = p_cm_rep->init_depth;
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> I think we can just always do the assignment:
>>>> 
>>>> 	p_cep->resp_res = p_cm_rep->resp_res;
>>>> 
>>>> here.  Is the p_cep->resp_res the value that actually goes into the
>>>> REP?
>>>> 
>>>> 
>>>>>                 ci_ca_lock_attr( p_port_cep->h_ca->obj.p_ci_ca );
>>>>>                 
>>>>>                 /* Check the CA's responder resource max and trim
>>> if
>>>>> necessary. */
>>>>> 
>>>>>                 if( p_port_cep->h_ca->obj.p_ci_ca->p_pnp_attr-
>>>>>> max_qp_resp_res <
>>>> 
>>>> 
>>>> whine: The max_qp_resp_res isn't going to dynamically change.  We
>>>> should use a saved value somewhere that doesn't require serializing
>>>> all CM users for this check.
>>>> 
>>>> - Sean
>



More information about the ofw mailing list