[ofw] RE: HCA Soft Reset mechanism

Leonid Keller leonid at mellanox.co.il
Mon Aug 4 11:26:16 PDT 2008


See inline.

> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com] 
> Sent: Monday, August 04, 2008 9:08 PM
> To: Leonid Keller; ofw at lists.openfabrics.org
> Subject: RE: HCA Soft Reset mechanism
> 
> >Soft Reset here is HCA re-initialization without bus driver reloading.
> >A reset can be initiated by clients (mlx4_eth, mlx4_hca) and/or driver
> >(mlx4_bus).
> >Driver issues reset upon card fatal error, which prevents the following
> >work with the card.
> >Clients may request the reset at any moment upon their will.
> 
> From the perspective of a client, is the behavior any 
> different than a remove device followed by an add device?

No.

> 
> >Clients have to register event callback after getting bus interface.
> >
> >When a reset event comes, the bus driver will:
> >   - bar the following work with card, returning -EFAULT to all, but
> >destroy_xx, commands;
> >   - reset the card to stop incoming traffic (only in case of
> >client-initiated reset);
> >   - notify all registered clients about pending reset.
> 
> 
> >Getting this notification clients have to:
> >   - wait for all issued commands to end;
> >   - reset its own clients, if any, and bar their work;
> >   - release all the device resources, they were using till now;
> >   - send "I'm reset-ready" notification to the bus driver;
> 
> How does this synchronize with a user unloading the driver or 
> the entire stack?

Good point.
There is no special synchronization.
But there is also no synchronization between data transfer and PnP events
such as driver unload.

> 
> Also, this is a fairly difficult operation to pull off in an 
> efficient manner.
> 
> >The driver starts to perform device reset only after receiving the "I'm
> >reset-ready" notifications from all the registered clients. It
> >re-initializes the device and notifies all the clients.
> >
> >Having received this notification, clients have to:
> >   - dereference the old bus interface;
> >   - get the new interface from bus driver;
> >   - register new event handler;
> >   - resume/restart itself;
> >   - wake up its own clients, if any;
> 
> Why do clients need to get new interfaces or re-register 
> event handlers?

Because ib_device is re-created by the reset, and it is the main parameter
of the interface.
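
To make the sequence quoted above concrete, here is a rough Python model of
the handshake. It is only a sketch: Bus, Client, BusInterface and all method
names are hypothetical and do not come from the mlx4 sources, the
"generation" counter merely stands in for a re-created ib_device, and the
hardware reset and the wait for outstanding commands are not modelled.

```python
# Toy model of the reset handshake described in this thread.
# All class and method names are hypothetical; this is not mlx4 driver code.
import errno


class BusInterface:
    """Stands in for the bus interface; tied to one ib_device instance."""

    def __init__(self, generation):
        self.generation = generation  # bumped each time the device is re-created


class Bus:
    def __init__(self):
        self.iface = BusInterface(generation=1)
        self.clients = []
        self.ready = set()
        self.resetting = False

    def register(self, client):
        # Registering an event handler and getting the interface in one step.
        self.clients.append(client)
        return self.iface

    def command(self, name):
        # During a reset everything except destroy_xx fails with -EFAULT.
        if self.resetting and not name.startswith("destroy_"):
            return -errno.EFAULT
        return 0

    def start_reset(self):
        self.resetting = True
        self.ready.clear()
        for client in list(self.clients):
            client.on_reset_pending(self)

    def reset_ready(self, client):
        # "I'm reset-ready" notification from one client.
        self.ready.add(client)
        # Re-initialize only once every registered client has reported in.
        if self.ready == set(self.clients):
            self._reinitialize()

    def _reinitialize(self):
        self.iface = BusInterface(self.iface.generation + 1)
        self.resetting = False
        old_clients, self.clients = self.clients, []
        for client in old_clients:
            client.on_reset_done(self)


class Client:
    def __init__(self, name, bus):
        self.name = name
        self.resources = []
        self.iface = bus.register(self)

    def on_reset_pending(self, bus):
        # Release device resources; destroy_xx is still allowed at this point.
        for res in self.resources:
            assert bus.command("destroy_" + res) == 0
        self.resources.clear()
        bus.reset_ready(self)

    def on_reset_done(self, bus):
        self.iface = None                # dereference the old interface
        self.iface = bus.register(self)  # get the new one, re-register handler
```

In this model a client that kept using the old interface object would still
see generation 1, which is the reason for fetching the interface again after
the reset.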

> 
> We should tie this sort of operation in with standard Windows 
> PnP operations.  It really sounds like out of band PnP 
> add/remove device to me.

This is really the usual way to handle serious HW problems.
Say a network adapter driver finds that the card is not working as
expected.
It notifies NDIS via its CheckForHang function that it wants to be reset.
NDIS then calls the network adapter's Reset function to re-initialize the
HW and so resolve the problem.
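
The pattern can be sketched as follows. This is a toy Python model, not NDIS
code: Adapter, check_for_hang, reset and watchdog_tick are hypothetical names
that only loosely mirror the miniport callbacks, and watchdog_tick stands in
for the periodic polling that NDIS does.

```python
# Toy model of the check-for-hang / reset pattern.
# Names are hypothetical and only loosely mirror the NDIS miniport callbacks.
class Adapter:
    def __init__(self):
        self.healthy = True
        self.resets = 0

    def check_for_hang(self):
        # Return True to ask the framework for a reset.
        return not self.healthy

    def reset(self):
        # Re-initialize the hardware; here that just clears the fault flag.
        self.healthy = True
        self.resets += 1


def watchdog_tick(adapter):
    """What the framework does on each poll: ask, then reset on request."""
    if adapter.check_for_hang():
        adapter.reset()
        return True
    return False
```

The driver only reports the problem; the decision to call the reset routine
stays with the framework.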

> 
> - Sean
> 
> 


