[openib-general] Re: RFD: uverbs and hotplug

Wed May 4 11:14:25 PDT 2005

On 5/4/05, Michael Krause <krause at cup.hp.com> wrote:
> At 04:44 AM 5/4/2005, Michael S. Tsirkin wrote:
>
> > The problem is not that hard. The driver must provide sufficient
> > information to discover that the device went down and which requests
> > were serviced before the device went down.
> >
> > Libraries such as MPI will be able to use that to resend these requests on
> > a different HCA or after an HCA is plugged back in.
> >
> > Applications will simply use this library and be happy.
>
> The operating assumption here is that applications or middleware can simply
> restart.  That is a nice theory but not always practical especially when
> taking the service usage model.  Some applications seeing failure may deem a
> cluster partitioning event has occurred and initiate membership quorum, etc.
> algorithms.  Others might invoke DR algorithms and fail over to alternate
> systems or sites.  This problem has been worked for many years by various
> vendors and it still isn't fully solved.  Do not think that simple resource
> recovery and restart is an acceptable answer to customers.  It is the
> minimum any subsystem or device should support but that does not translate
> automatically into a viable solution.
>
> Mike

And even if the application properly diagnoses the problems and does not
falsely take any of the more draconian solutions outlined, we still need
to recognize that "just restarting" is a much bigger problem for an
RDMA application than it is for a socket application.

If the HCA/RNIC closes, then the application has to do more than 
"just" reopen it and re-establish the connection. It has to recreate
the PDs, CQs, SRQs, MRs, MWs and QPs. And *after* the
connections are re-established any persistent buffer advertisements
must be re-exchanged.
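
To make the size of that job concrete, here is a rough sketch in C. It
is only a model, not verbs code: the names (res_kind, res_needs,
rdma_inventory, order_is_valid, restart_steps) are all hypothetical, and
the dependency table is a simplified assumption (for example, it ignores
that a QP may also refer to an SRQ). It checks that a recreation order
never builds an object before something it depends on, and counts how
many separate steps a full restart costs.

```c
#include <assert.h>
#include <stddef.h>

/* Object kinds named in the text, in the order they are listed there. */
enum res_kind { RES_PD, RES_CQ, RES_SRQ, RES_MR, RES_MW, RES_QP, RES_COUNT };

#define BIT(k) (1u << (k))

/* Simplified assumption: which kinds must already exist before a kind
 * can be recreated (e.g. a QP refers to a PD and to CQs). */
static const unsigned res_needs[RES_COUNT] = {
    [RES_PD]  = 0,
    [RES_CQ]  = 0,
    [RES_SRQ] = BIT(RES_PD),
    [RES_MR]  = BIT(RES_PD),
    [RES_MW]  = BIT(RES_PD),
    [RES_QP]  = BIT(RES_PD) | BIT(RES_CQ),
};

/* Hypothetical inventory of what the application lost with the device. */
struct rdma_inventory {
    size_t objects[RES_COUNT];   /* how many of each kind the app held */
    size_t connections;          /* established connections */
    size_t advertised_buffers;   /* persistent buffer advertisements */
};

/* Returns 1 if recreating kinds in the given order never creates an
 * object before something it depends on, 0 otherwise. */
int order_is_valid(const enum res_kind *order, size_t n)
{
    unsigned done = 0;
    for (size_t i = 0; i < n; i++) {
        assert((unsigned)order[i] < RES_COUNT);
        if ((res_needs[order[i]] & done) != res_needs[order[i]])
            return 0;
        done |= BIT(order[i]);
    }
    return 1;
}

/* Counts the steps of a full restart: one per object recreated, one per
 * connection re-established, one per advertisement re-exchanged. */
size_t restart_steps(const struct rdma_inventory *inv)
{
    size_t steps = 0;
    for (int k = 0; k < RES_COUNT; k++)
        steps += inv->objects[k];
    return steps + inv->connections + inv->advertised_buffers;
}
```

Under this model the order given above (PD, CQ, SRQ, MR, MW, QP) passes
order_is_valid(), while an order that starts with the QP does not. The
count from restart_steps() grows with every connection and every
advertised buffer, which is the part a socket application never has to
think about.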

Avoiding so heavy a burden wherever possible strikes me as a
worthwhile objective. At a minimum, we need to be clear about the
burden being created if suspend/resume support is not included.
It is not "just" a matter of reopening and reconnecting.