[ofw] RE: When one can release CA interface

Smith, Stan stan.smith at intel.com
Wed Sep 2 14:57:17 PDT 2009


>> Leonid Keller wrote:
>>> I don't quite understand that.
>>> If the CA has been deregistered, i.e. ib_deregister_ca() has been
>>> called, there is no more CI_CA object. All resources (PDs, CQs, QPs)
>>> are released.
>>> How can IBAL send to CA MADs ?
>>
>> The CI_CA object still exists due to the reference still held
>> on the HCA.
> Did you see that for real ?

Infrequently during IBAL cleanup, an IBAL MAD thread would attempt to forward a MAD and blow up because the HCA had been shut down underneath it (*dev == NULL in the HCA driver, and the ensuing dereference crashes the system).

A similar situation exists in the shutdown power path: the Power IRP is passed down from the bus driver to the HCA without informing IBAL that the HCA is shutting down. Occasionally an IBAL MAD thread will be processing a MAD and attempt to forward/post-send it. By then the HCA driver has shut down, so *dev == NULL, and the driver eventually dereferences *dev and crashes.

>
> Technically, CI_CA is created as a synchronous object (see
> create_ci_ca)
>
>       init_al_obj( &p_ci_ca->obj, p_ci_ca, FALSE,
>               destroying_ci_ca, cleanup_ci_ca, free_ci_ca );
>
> and when ib_deregister_ca() exits, CI_CA should be fully destroyed.
>
> From the design point of view, deregister_ca is a function of IBAL low
> interface.
> It is to be called by HCA low-level driver upon card ejecting.
> In other words, there is no HCA card after returning from this
> function !
>
> Am I missing something ?

Even though ib_deregister_ca() has returned successfully, the HCA may or may not be fully removed: the object unwind/destroy routines are driven by the object reference count and may not have run to completion yet.

The CA object routines destroying_ci_ca(), cleanup_ci_ca() and free_ci_ca() are not called until the CA object reference count goes to zero. By holding the last reference until after IBAL cleanup returns, the bus driver forces the HCA object and HCA interfaces to remain functional until IBAL cleanup has finished, even though ib_deregister_ca() has already returned.
This is my understanding.
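
As a rough illustration of the reference-count behavior described above, here is a simplified, single-threaded sketch. It is not the actual IBAL al_obj code: every sketch_* name is hypothetical, and real code would need atomic reference counting and the full destroying/cleanup/free callback sequence. It only shows how deregistration can return while the object stays alive until the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ref-counted AL object (not al_obj_t). */
typedef struct sketch_obj {
    int  ref_cnt;            /* outstanding references */
    bool destroy_requested;  /* set once deregistration was requested */
    bool freed;              /* set by the final free callback */
    void (*pfn_free)(struct sketch_obj *p_obj);
} sketch_obj_t;

/* Final callback; stands in for the free_ci_ca() role in this sketch. */
static void sketch_free(sketch_obj_t *p_obj)
{
    p_obj->freed = true;
}

static void sketch_init(sketch_obj_t *p_obj)
{
    p_obj->ref_cnt = 1;              /* creation reference */
    p_obj->destroy_requested = false;
    p_obj->freed = false;
    p_obj->pfn_free = sketch_free;
}

/* Take an extra reference, e.g. the one the bus driver keeps. */
static void sketch_ref(sketch_obj_t *p_obj)
{
    p_obj->ref_cnt++;
}

/* Drop a reference; the free callback runs only when the count
 * reaches zero after destruction has been requested. */
static void sketch_deref(sketch_obj_t *p_obj)
{
    assert(p_obj->ref_cnt > 0);
    if (--p_obj->ref_cnt == 0 && p_obj->destroy_requested)
        p_obj->pfn_free(p_obj);
}

/* Models deregistration: request destruction and drop the creation
 * reference. It returns even if other references are still held. */
static void sketch_deregister(sketch_obj_t *p_obj)
{
    p_obj->destroy_requested = true;
    sketch_deref(p_obj);
}
```

In this model, a caller that did sketch_ref() before sketch_deregister() keeps the object usable until its own sketch_deref(), which is the role the bus driver's last HCA reference plays during IBAL cleanup.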

>
>
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Smith, Stan [mailto:stan.smith at intel.com]
>>>> Sent: Tuesday, September 01, 2009 7:29 PM
>>>> To: Leonid Keller; Tzachi Dar
>>>> Cc: ofw_list
>>>> Subject: RE: When one can release CA interface
>>>>
>>>> Leonid Keller wrote:
>>>>> fdo_release_resources() in bus_pnp.c releases the last interface
>>>>> with low-level driver only after IBAL cleanup with the following
>>>>> explanation:
>>>>>
>>>>> /* AL needs the HCA to stick around until AL cleanup has
>>>>> completed. ... */
>>>>>
>>>>> My question is - why ?
>>>>> How may/can IBAL proceed to work with HCA after CA has been
>>>>> deregistered ? What kind of works could be still pending ?
>>>>
>>>>
>>>> IBAL MAD processing threads may be in process of forwarding a MAD
>>>> and/or MAD processing with the outcome of eventually accessing the
>>>> HCA device. If the HCA is removed before AL shutdown.....boom!



