[openib-general] RFD: uverbs and hotplug

Michael Krause krause at cup.hp.com
Wed May 4 08:42:45 PDT 2005


This thread is somewhat missing the point:

If an HCA is being used for a vital application service, then one needs an 
infrastructure that:

(1) Enables the admin to identify whether a service will be impacted and to 
what extent
         - This is a very hard problem if the device acts as a consolidated 
service point
(2) Enables the admin to migrate services to alternative devices if available
(3) Enables the admin to prevent migration if no alternative is present
(4) Translates an attention-button press into a notification
(5) Automates much of the above since actual people are expensive
(6) Enables remote service interaction points to recognize the migration 
and either adapt or clean up
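
As a rough illustration of points (2) and (3), the decision taken when a 
removal is requested could be sketched as below. This is only a sketch under 
stated assumptions: the Device type, the plan_removal function, and the 
"least-loaded alternative" placement rule are all hypothetical names and 
choices made for illustration, not an existing OS or OpenIB interface.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical record of a device and the services bound to it."""
    name: str
    services: list = field(default_factory=list)


def plan_removal(target, alternatives):
    """Decide whether `target` may be removed.

    Returns (allowed, moves), where `moves` maps each service on `target`
    to the name of the alternative device it would migrate to.
    """
    if not target.services:
        return True, {}          # nothing is impacted, removal is safe
    if not alternatives:
        return False, {}         # point (3): no alternative, refuse removal

    # Point (2): place each service on the currently least-loaded
    # alternative. This placement rule is an assumption for illustration.
    load = {d.name: len(d.services) for d in alternatives}
    moves = {}
    for service in target.services:
        dest = min(load, key=load.get)
        moves[service] = dest
        load[dest] += 1
    return True, moves


if __name__ == "__main__":
    hca0 = Device("hca0", ["nfs", "db"])
    hca1 = Device("hca1", ["backup"])
    print(plan_removal(hca0, [hca1]))
    print(plan_removal(hca0, []))
```

A real policy would also have to cover point (1), i.e. working out which 
services are actually impacted, which is the hard part when the device is a 
consolidated service point.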


BTW, whether an application is written to deal with surprise removal isn't 
really the issue.  All applications and the kernel must deal with 
failures.  How graceful and how transparent to the end user is what matters.

Hence, the above is not unique to IB or really any device.  It is an OS 
problem and what is required is an OS-driven strategy that can support 
multiple device types and usage paradigms.

Mike




At 08:53 PM 5/3/2005, Caitlin Bestler wrote:
>On 5/3/05, Tom Duffy <tduffy at sun.com> wrote:
> > On Tue, 2005-05-03 at 09:42 -0700, Caitlin Bestler wrote:
> > > Most applications are not written to be able to
> > > survive a sudden loss of virtually all of their connections, and then
> > > resume all of the suspended sessions with new connections using new
> > > memory regions.
> >
> > Most applications *should* be written this way.  Especially since more
> > and more people are using unreliable networks (wifi, 3G, etc.).  I can
> > see a day where we have RDMA-capable wireless adapters in which case
> > applications will have to be written with the possibility of a sudden
> > loss taken into account.
> >
> > -tduffy
> >
>And applications should be prepared for file systems to fail as well,
>but that doesn't mean that the file system shouldn't consider features
>such as journaling that minimize how often the application will be
>faced with that sort of problem.
>
>It would certainly be valid to decide that the benefit was not worth
>the effort required. But such a decision should be made with
>full awareness that the notice will enable very few applications to
>actually recover when an entire HCA is taken down. If the goal
>is to realistically allow applications to continue functioning after
>the loss of multiple ports, some more powerful hooks are required.