[openib-general] Re: RDMA Generic Connection Management
Michael Krause
krause at cup.hp.com
Wed Aug 31 11:57:28 PDT 2005
At 07:46 AM 8/31/2005, James Lentini wrote:
>On Tue, 30 Aug 2005, Roland Dreier wrote:
>
> > I just committed this SRP fix, which should make sure we don't use a
> > device after it's gone. And it actually simplifies the code a teeny bit...
>
>The device could still be used after it's gone. For example:
>
> - the user is configuring SRP via sysfs. The thread in
> srp_create_target() has just called ib_sa_path_rec_get()
> [srp.c line 1209] and is waiting for the path
> record query to complete in wait_for_completion()
> - the SA callback, srp_path_rec_completion(), is called. This
> callback thread will make several verb calls (ib_create_cq,
> ib_req_notify_cq, ib_create_qp, ...) without any coordination with
> the hotplug device removal callback, srp_remove_one
>
>Notice that if the SA client's hotplug removal function,
>ib_sa_remove_one(), ensured that all callbacks had completed before
>returning the problem would be fixed. This would protect all ULPs from
>having to deal with hotplug races in their SA callback function. The
>fix belongs in the SA client (the core stack), not in SRP.
>
>All the ULPs are deficient with respect to their hotplug
>synchronization. Given that there is a common problem, doesn't it make
>sense to try and solve it in a generic way instead of in each ULP?
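
To make the SA-client fix proposed above concrete, here is a minimal single-threaded sketch in C. It is not the actual ib_sa code: struct sa_device and every function name below are hypothetical, and the "drained" flag stands in for a kernel completion that a real removal routine would block on. The idea it models is in-flight reference counting with a removal gate: removal refuses new queries and may only finish once every outstanding callback has returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an SA client's per-device state (not real ib_sa code). */
struct sa_device {
    int  inflight;  /* queries whose callback has not yet returned */
    bool removing;  /* set once hot-unplug removal has started */
    bool drained;   /* stands in for a completion that removal would wait on */
};

/* Start a query. Returns 0 on success, -1 once removal has begun. */
int sa_query_begin(struct sa_device *dev)
{
    if (dev->removing)
        return -1;
    dev->inflight++;
    return 0;
}

/* Called after the ULP's callback for one query has returned. */
void sa_callback_done(struct sa_device *dev)
{
    assert(dev->inflight > 0);
    if (--dev->inflight == 0 && dev->removing)
        dev->drained = true;  /* last callback out: removal may finish */
}

/* Begin removal: gate new queries. Returns true if removal can finish now. */
bool sa_remove_begin(struct sa_device *dev)
{
    dev->removing = true;
    if (dev->inflight == 0)
        dev->drained = true;
    return dev->drained;
}

/* Walk through the race described above. Returns 0 if every step behaves. */
int sa_drain_demo(void)
{
    struct sa_device dev = { 0, false, false };

    if (sa_query_begin(&dev) != 0) return 1;  /* path record query outstanding */
    if (sa_remove_begin(&dev))     return 2;  /* removal must not finish yet */
    if (sa_query_begin(&dev) == 0) return 3;  /* new queries are refused */
    sa_callback_done(&dev);                   /* ULP callback has returned */
    if (!dev.drained)              return 4;  /* only now may removal finish */
    return 0;
}
```

With the gate in the core stack, a ULP callback such as srp_path_rec_completion() would never run concurrently with the tail end of device removal, which is the property the quoted message asks for.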

There are two approaches to device removal to consider, and a credible
solution requires both:

(1) Inform all entities that a planned device removal is to occur and allow
them to close gracefully or migrate to alternatives. Ideally, the OS
understands whether the removal would cause the loss of any critical
resource, and neither informs anyone nor takes action unless it knows the
system can survive the removal. Doing this requires each ULP to register
its interest in a particular hardware resource with the OS. Registration
also lets the OS build a resource analysis tool that determines whether
removing a device is a good idea. This is really outside the scope of an
RDMA infrastructure and should be done by the OS through an OS-defined API
that applies to all types of hardware resources and subsystems.
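
As a rough illustration of the registration and analysis step in (1), the following C sketch keeps a table of registered interests and answers whether removing a device is survivable. Everything here is an assumption made for illustration: no such OS API is named in this thread, and the types, field names, and the rule "a critical user with no alternate blocks removal" are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INTERESTS 16

/* One ULP's registered dependency on a device (hypothetical layout). */
struct interest {
    int  device_id;     /* device the ULP depends on */
    int  alternate_id;  /* device it could migrate to, or -1 if none */
    bool critical;      /* losing this user is not survivable */
};

struct interest_table {
    struct interest entries[MAX_INTERESTS];
    size_t count;
};

/* Record a ULP's interest. Returns 0 on success, -1 if the table is full. */
int register_interest(struct interest_table *t, int device_id,
                      int alternate_id, bool critical)
{
    assert(t != NULL);
    if (t->count == MAX_INTERESTS)
        return -1;
    t->entries[t->count++] =
        (struct interest){ device_id, alternate_id, critical };
    return 0;
}

/* Resource analysis: removal is survivable unless some critical user of
 * this device has nowhere to migrate. */
bool removal_is_survivable(const struct interest_table *t, int device_id)
{
    for (size_t i = 0; i < t->count; i++) {
        const struct interest *e = &t->entries[i];
        if (e->device_id == device_id && e->critical && e->alternate_id < 0)
            return false;
    }
    return true;
}
```

An OS-level removal request would consult removal_is_survivable() first and only then notify the registered ULPs so they can close or migrate.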

(2) Design all ULPs to handle surprise removal, e.g. device failure, from
the start and allow them to close gracefully or migrate to
alternatives. The OS would inform the device driver of the failure if the
device driver has not already discovered the problem. The OS would also
inform interested parties of the device failure. The device driver would
simply error out all users of the device instance; error codes are already
defined for IB and iWARP for this purpose. The associated verbs resources
should be released as the ULP closes out its resources through the verbs
API (we did define the verbs to clean up resources that the infrastructure
may allocate on behalf of the ULP). State such as listen entries would be
released just as is done for Sockets, etc. today.
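
The surprise-removal flow in (2) can be sketched as a small C model. This is not the verbs API: struct model_qp, the status codes, and the function names are hypothetical stand-ins. The driver side moves the QP to an error state and completes every outstanding work request with a flush error; the ULP side sees the error completion and releases its resources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical completion status codes for this model. */
enum wc_status { WC_SUCCESS, WC_FLUSH_ERR };

struct model_qp {
    bool   in_error;                  /* device failed, QP unusable */
    bool   destroyed;                 /* ULP has released the QP */
    size_t pending;                   /* posted, not yet completed */
    enum wc_status done[MAX_PENDING]; /* completions awaiting the ULP */
    size_t ndone;
};

/* ULP posts one work request. Returns 0, or -1 if the QP cannot accept it. */
int ulp_post(struct model_qp *qp)
{
    if (qp->in_error || qp->pending + qp->ndone >= MAX_PENDING)
        return -1;
    qp->pending++;
    return 0;
}

/* Driver side: on device failure, error out the QP and flush all
 * outstanding work so the ULP learns about the failure. */
void driver_fail_qp(struct model_qp *qp)
{
    qp->in_error = true;
    while (qp->pending > 0) {
        assert(qp->ndone < MAX_PENDING);  /* guaranteed by ulp_post() */
        qp->done[qp->ndone++] = WC_FLUSH_ERR;
        qp->pending--;
    }
}

/* ULP side: consume completions; on the first error, release the QP
 * (stands in for destroying QP/CQ through the verbs API).
 * Returns true if the QP has been released. */
bool ulp_handle_completions(struct model_qp *qp)
{
    for (size_t i = 0; i < qp->ndone; i++) {
        if (qp->done[i] != WC_SUCCESS) {
            qp->destroyed = true;
            break;
        }
    }
    qp->ndone = 0;
    return qp->destroyed;
}
```

The point of the model is that the ULP needs no special hotplug path: the same error handling it already needs for a failed connection also covers a vanished device.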

Device addition is simply a matter of informing the policy or service
management component within the OS that determines which services should
be available on a given device. The device driver really does not need to
do anything special. One area to consider is whether a planned migration of
a service needs to be supported. This is generally best handled by the ULP,
with only a small set of services required of the infrastructure, e.g. get /
set of QP / LLP context, and then coordinating any other aspects with the
appropriate SM or network services, such as updating address vectors or
fabric management / configuration.

In general, ULPs should already be designed to handle the error condition;
whether they support a managed / planned removal or migration is perhaps
the only potential area of deficiency.

Mike