[ofw] RE: HCA Soft Reset mechanism

Sean Hefty sean.hefty at intel.com
Mon Aug 4 11:44:43 PDT 2008


>> From the perspective of a client, is the behavior any
>> different than a remove device followed by an add device?
>
>No.

Maybe we can leverage the work in this area then...

>> How does this synchronize with a user unloading the driver or
>> the entire stack?
>
>A good point.
>There is no special synchronization.
>But there is also no synchronization between data transfer and PnP
>events like driver unload.

Synchronization between data transfers and driver unload is handled by
the stack from the top down.  By the time the driver unload reaches the
driver at the bottom of the stack, everything above it will have stopped
initiating data transfers and released all resources.
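
A toy sketch of that top-down ordering, in Python purely for illustration.
This is not driver code: ToyDriver, unload_stack and the driver names are
hypothetical, and the real ordering is enforced by the PnP machinery rather
than by a loop like this.

```python
# Toy model of top-down unload ordering. All names are hypothetical.
class ToyDriver:
    def __init__(self, name):
        self.name = name
        self.transfers_active = True   # still initiating data transfers
        self.resources_held = True     # still holding resources from below

    def unload(self):
        # Stop initiating transfers, then release everything held.
        self.transfers_active = False
        self.resources_held = False


def unload_stack(stack_top_down):
    """Unload drivers from the top of the stack to the bottom."""
    order = []
    for i, drv in enumerate(stack_top_down):
        # By the time a driver is unloaded, everything above it must
        # already be quiet and must have released its resources.
        for above in stack_top_down[:i]:
            if above.transfers_active or above.resources_held:
                raise RuntimeError(above.name + " is still active")
        drv.unload()
        order.append(drv.name)
    return order


stack = [ToyDriver("upper"), ToyDriver("middle"), ToyDriver("bottom")]
order = unload_stack(stack)
print(order)  # ['upper', 'middle', 'bottom']
```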

The only special handling that ends up being needed is for drivers that
export userspace interfaces.  Since userspace can't be trusted to stop
using the hardware, those drivers need the functionality that you're
describing.  They need to block additional userspace access and clean up
any hardware resources.  This is non-trivial to handle, and we want to
avoid putting this sort of complexity in every driver in the stack.
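
To make the "block additional access, then clean up" requirement concrete,
here is a minimal sketch in Python.  It is a toy model under assumed names
(UserAccessGate, enter, leave, close_and_cleanup are all hypothetical), not
how any existing driver does it: new calls are refused once the gate is
closed, in-flight calls are allowed to drain, and only then are the
resources released.

```python
import threading


# Toy model of gating userspace access. All names are hypothetical.
class UserAccessGate:
    def __init__(self):
        self._cond = threading.Condition()
        self._closed = False
        self._in_flight = 0
        self.resources = set()   # stand-in for hardware resources

    def enter(self):
        """Called at the start of each userspace request."""
        with self._cond:
            if self._closed:
                return False     # access is blocked, fail the request
            self._in_flight += 1
            return True

    def leave(self):
        """Called when a userspace request finishes."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def close_and_cleanup(self):
        """Block new access, wait for in-flight calls, release resources."""
        with self._cond:
            self._closed = True
            while self._in_flight:
                self._cond.wait()
            released = sorted(self.resources)
            self.resources.clear()
            return released


gate = UserAccessGate()
gate.resources.update({"qp1", "cq1"})   # made-up resource names
if gate.enter():                        # a request that completes normally
    gate.leave()
released = gate.close_and_cleanup()
print(released, gate.enter())
```

Even in this reduced form the driver has to track every in-flight request,
which is the kind of bookkeeping it would be nice to keep out of most of
the stack.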

Is there any way to issue a standard PnP add/remove device, or maybe power
off/on, or a custom PnP event?  My concern is that the upper level
drivers should have something easier to key off of than callbacks, which
could occur simultaneously with other PnP events.

>> Why do clients need to get new interfaces or re-register
>> event handlers?
>
>Because ib_device was re-created and it is the main parameter of the
>interface.

I think I'm missing something.  How do the drivers know when to
re-register, or when the hardware can be used again?

>It is really a usual way to handle serious HW problems.
>Say a network adapter detects that the card doesn't work in the expected
>manner.
>It notifies NDIS via the CheckForHang function that it wants to be reset.
>NDIS calls the network adapter's Reset function to initialize the HW and
>solve the problem this way.

I agree with the functionality.  I'm just trying to figure out the
impact on the stack.  For mlx4, it seems like the bus driver could
remove the PDO, wait for the device removal to complete, reset the
adapter, then add back the PDO.  Maybe there's more to it than this, but
at least on the surface, this seems easy and works without changes to
the rest of the stack.  I'm not sure how you'd handle mthca though, but
maybe getting this to work for mthca isn't as important...?
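
The sequence suggested above for mlx4 can be written down as a toy model
(Python, for illustration only).  ToyBus and its methods are hypothetical
and do not correspond to actual mlx4 bus driver functions; the point is
just the ordering, with the reset refused until removal has completed.

```python
# Toy model of "remove PDO, wait, reset, add PDO". Names are hypothetical.
class ToyBus:
    def __init__(self, clients):
        self.pdo_present = True
        self.clients_attached = clients  # upper drivers using the device
        self.steps = []

    def remove_pdo(self):
        self.pdo_present = False
        self.steps.append("remove PDO")

    def wait_for_removal(self):
        # Stand-in for waiting until the device removal has completed,
        # i.e. everything above has detached.
        self.clients_attached = 0
        self.steps.append("removal complete")

    def reset_adapter(self):
        # Only safe once nothing above is using the hardware.
        if self.pdo_present or self.clients_attached:
            raise RuntimeError("device still in use, cannot reset")
        self.steps.append("reset adapter")

    def add_pdo(self):
        self.pdo_present = True
        self.steps.append("add PDO")

    def soft_reset(self):
        self.remove_pdo()
        self.wait_for_removal()
        self.reset_adapter()
        self.add_pdo()
        return self.steps


bus = ToyBus(clients=2)
steps = bus.soft_reset()
print(steps)
```

From the point of view of the drivers above, this looks like an ordinary
device removal followed by a device arrival.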

- Sean



