[Ofvwg] OFVWG meeting notes - 9/8/2015

Liran Liss liranl at mellanox.com
Tue Sep 8 13:11:00 PDT 2015


The underlying model of the recent RDMA device reset patches was presented.

Reset flows are important for various use-cases, but their implementation for RDMA devices is challenging due to their stateful nature, and the fact that both kernel ULPs and user-space applications hold direct references to HW objects.

Support for kernel ULPs is provided by a new "error" state, in which devices neither process new WQEs nor permit the creation of new resources or the modification of existing ones, but do flush any inflight WQEs and process verbs calls for destroying resources.
When a reset is required, the kernel provider issues a fatal event, resets the device, and places the device in error state.
It may then unregister the device, allowing all ULPs to process their standard shut-down flows.
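
As an illustration of the "error" state described above, here is a minimal toy model in Python. It is not kernel code and not the patch set's API: the class ToyDevice and all of its method names are hypothetical, chosen only to show which operations are refused and which are still processed.

```python
from enum import Enum


class DeviceState(Enum):
    ACTIVE = "active"
    ERROR = "error"


class DeviceInErrorState(Exception):
    """Raised when an operation is refused because the device is in error state."""


class ToyDevice:
    """Hypothetical model of a device with an "error" state (not a real API)."""

    def __init__(self):
        self.state = DeviceState.ACTIVE
        self.resources = set()   # stand-ins for QPs, CQs, MRs held by ULPs
        self.inflight = []       # WQEs posted but not yet completed
        self.completions = []    # (wqe, status) pairs reported to the ULP
        self.events = []         # asynchronous events delivered to ULPs
        self._next_id = 0

    def create_resource(self):
        if self.state is DeviceState.ERROR:
            raise DeviceInErrorState("cannot create resources in error state")
        self._next_id += 1
        self.resources.add(self._next_id)
        return self._next_id

    def post_wqe(self, wqe):
        if self.state is DeviceState.ERROR:
            raise DeviceInErrorState("new WQEs are not processed in error state")
        self.inflight.append(wqe)

    def destroy_resource(self, resource_id):
        # Destroy calls are processed in every state, so ULPs can shut down.
        self.resources.discard(resource_id)

    def reset(self):
        # Order taken from the notes: fatal event, reset, then error state.
        self.events.append("fatal")
        # (a provider-specific hardware reset would happen here)
        self.state = DeviceState.ERROR
        # Inflight WQEs are flushed rather than silently dropped.
        for wqe in self.inflight:
            self.completions.append((wqe, "flushed"))
        self.inflight.clear()
```

A ULP that receives the fatal event can therefore still run its usual tear-down path: every destroy call succeeds, while any attempt to post or create fails immediately.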

Device resets are provider specific, and depend on whether the device is reachable.
If the device is unreachable, it is assumed that system FW will reset the device by other means (e.g., via PCI) before re-enabling access.

Support for user-space applications is provided by "zombifying" existing device instances.
Zombie devices maintain user-space references, but are not associated with HW. They persist as long as there are active handles to them, and do not permit any operation other than closing resources.
Some applications may hold references to zombie devices, while other applications hold valid handles to the same device after it has recovered.
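
The coexistence of zombie and recovered instances can be sketched with another toy model. Again, this is only an illustration under stated assumptions: Registry, DeviceInstance, Handle and the reap() step are hypothetical names, and the real bookkeeping in the kernel may look quite different.

```python
class DeviceGone(Exception):
    """Raised when a handle refers to a zombie (detached) device instance."""


class DeviceInstance:
    """One incarnation of a device; becomes a zombie when detached from HW."""

    def __init__(self, name):
        self.name = name
        self.attached = True    # False once the instance is zombified
        self.open_handles = 0


class Handle:
    """A user-space reference to a specific device instance."""

    def __init__(self, instance):
        self.instance = instance
        self.closed = False
        instance.open_handles += 1

    def post_wqe(self, wqe):
        if self.closed:
            raise ValueError("handle is closed")
        if not self.instance.attached:
            raise DeviceGone("zombie device: only close is permitted")
        return f"{wqe} posted on {self.instance.name}"

    def close(self):
        # Closing is the one operation a zombie still permits.
        if not self.closed:
            self.closed = True
            self.instance.open_handles -= 1


class Registry:
    """Tracks the current instance of a device and any lingering zombies."""

    def __init__(self, name):
        self.name = name
        self.current = DeviceInstance(name)
        self.zombies = []

    def open(self):
        return Handle(self.current)

    def reset_and_recover(self):
        old = self.current
        old.attached = False              # detach from HW immediately
        if old.open_handles > 0:
            self.zombies.append(old)      # persists while handles remain
        self.current = DeviceInstance(self.name)

    def reap(self):
        # A zombie disappears once its last handle has been closed.
        self.zombies = [z for z in self.zombies if z.open_handles > 0]
```

In this model an application that opened the device before the reset keeps a usable handle object, but every call other than close() fails, while an application that opens the device after recovery gets a fully working handle.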

We considered allowing applications a grace period to close their resources before being detached from the device by the kernel.
However, rapid recovery time was preferred.

Providers can add support for new HW by supporting the "error" state, and implementing the disassociate_ucontext() callback.
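
The two provider requirements can be pictured as a capability check. The sketch below is a hypothetical Python model, not the kernel interface: only the callback name disassociate_ucontext() comes from the notes, while Provider, supports_error_state, can_reset_with_users() and detach_all() are invented for illustration. What a real provider does inside the callback is hardware specific; the toy provider merely records the context.

```python
class Provider:
    """Hypothetical base class standing in for a kernel provider."""

    supports_error_state = False

    def disassociate_ucontext(self, ucontext):
        raise NotImplementedError("provider cannot detach user contexts")


class ToyProvider(Provider):
    """A provider that meets both requirements mentioned in the notes."""

    supports_error_state = True

    def __init__(self):
        self.detached = []

    def disassociate_ucontext(self, ucontext):
        # Real work is hardware specific; this toy only records the call.
        self.detached.append(ucontext)


def can_reset_with_users(provider):
    """True if the provider has an error state and overrides the callback."""
    overrides = (
        type(provider).disassociate_ucontext
        is not Provider.disassociate_ucontext
    )
    return provider.supports_error_state and overrides


def detach_all(provider, ucontexts):
    """Detach every user context, refusing if the provider lacks support."""
    if not can_reset_with_users(provider):
        raise RuntimeError("provider does not support reset with open users")
    for ucontext in ucontexts:
        provider.disassociate_ucontext(ucontext)
```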

Finally, the notion of resetting devices while maintaining the ULP SW state was discussed.
It was deemed beneficial to maintain the ULP device representation (e.g., a SCSI adapter) within the OS while transient device resets are performed.

--Liran

