[openib-general] Re: RDMA Generic Connection Management
Roland Dreier
rolandd at cisco.com
Wed Aug 31 09:17:17 PDT 2005
James> The device could still be used after it's gone. For
James> example:
James> - the user is configuring SRP via sysfs. The thread in
James>   srp_create_target() has just called ib_sa_path_rec_get()
James>   [srp.c line 1209] and is waiting for the path record query
James>   to complete in wait_for_completion()
James> - the SA callback, srp_path_rec_completion(), is called.
James>   This callback thread will make several verb calls
James>   (ib_create_cq, ib_req_notify_cq, ib_create_qp, ...) without
James>   any coordination with the hotplug device removal callback,
James>   srp_remove_one().
I don't think this can happen. How could srp_remove_one get past

    wait_for_completion(&host->released);

if the sysfs file is still in use?
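To make the argument concrete, here is a minimal user-space sketch of the completion pattern being relied on. It is not the SRP code: struct completion_sketch, complete_sketch(), wait_for_completion_sketch(), struct host_sketch and run_release_demo() are hypothetical names built on pthreads that only stand in for the kernel's completion API. The point is just that the remover cannot get past the wait until the last user has signalled release.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel completion. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

static void init_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Mark the completion as done and wake every waiter. */
static void complete_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Block until complete_sketch() has been called. */
static void wait_for_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical host object; "released" mirrors host->released. */
struct host_sketch {
    struct completion_sketch released;
    int user_finished;  /* written by the user before it releases */
};

/* Plays the role of the sysfs user that still holds the host. */
static void *user_thread(void *arg)
{
    struct host_sketch *h = arg;

    h->user_finished = 1;           /* last access to the host */
    complete_sketch(&h->released);  /* analogous to the release callback */
    return NULL;
}

/* Returns the value of user_finished observed by the remover after its wait. */
static int run_release_demo(void)
{
    struct host_sketch h;
    pthread_t t;
    int seen;

    h.user_finished = 0;
    init_completion_sketch(&h.released);
    pthread_create(&t, NULL, user_thread, &h);

    wait_for_completion_sketch(&h.released);  /* remover blocks here */
    seen = h.user_finished;                   /* user is already done */

    pthread_join(t, NULL);
    pthread_cond_destroy(&h.released.cond);
    pthread_mutex_destroy(&h.released.lock);
    return seen;
}
```

Calling run_release_demo() from a main() and checking that it returns 1 shows the ordering: the remover only proceeds once the user thread has finished with the host.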
James> Notice that if the SA client's hotplug removal function,
James> ib_sa_remove_one(), ensured that all callbacks had
James> completed before returning, the problem would be fixed. This
James> would protect all ULPs from having to deal with hotplug
James> races in their SA callback function. The fix belongs in the
James> SA client (the core stack), not in SRP.
All SA client callbacks are driven by the MAD layer. And
ib_sa_remove_one() does ib_unregister_mad_agent(), which should wait
for all callbacks to finish. So I think we already do the best we can
here. Unfortunately the SA client code must clean up after all the
ULPs that depend on it, because ULPs can use the SA up until they know
the device is gone. But I don't see a way around that.
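As a rough illustration of what "unregister waits for all callbacks to finish" means, here is a user-space sketch under stated assumptions. It is not the MAD layer implementation: struct agent_sketch, dispatch_sketch(), unregister_sketch() and run_unregister_demo() are hypothetical names, and the in-flight counter with a condition variable is just one possible way to get that guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical agent that tracks callbacks currently running. */
struct agent_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  idle;
    bool            registered;
    int             in_flight;  /* callbacks that started but have not finished */
    int             completed;  /* callbacks that ran to completion */
};

/* Run one callback if the agent is still registered; return false otherwise. */
static bool dispatch_sketch(struct agent_sketch *a)
{
    pthread_mutex_lock(&a->lock);
    if (!a->registered) {
        pthread_mutex_unlock(&a->lock);
        return false;  /* agent is going away: refuse new callbacks */
    }
    a->in_flight++;
    pthread_mutex_unlock(&a->lock);

    /* The client's callback body would run here, without the lock held. */

    pthread_mutex_lock(&a->lock);
    a->completed++;
    if (--a->in_flight == 0)
        pthread_cond_broadcast(&a->idle);
    pthread_mutex_unlock(&a->lock);
    return true;
}

/* Stop accepting callbacks, then wait for running ones to drain. */
static void unregister_sketch(struct agent_sketch *a)
{
    pthread_mutex_lock(&a->lock);
    a->registered = false;
    while (a->in_flight > 0)
        pthread_cond_wait(&a->idle, &a->lock);
    pthread_mutex_unlock(&a->lock);
}

static void *callback_thread(void *arg)
{
    dispatch_sketch(arg);
    return NULL;
}

/* Returns 1 if nothing is in flight after unregister and later dispatches are refused. */
static int run_unregister_demo(void)
{
    struct agent_sketch a = { .registered = true };
    pthread_t t[4];
    int i, ok;

    pthread_mutex_init(&a.lock, NULL);
    pthread_cond_init(&a.idle, NULL);

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, callback_thread, &a);

    unregister_sketch(&a);  /* analogous to the removal path */

    pthread_mutex_lock(&a.lock);
    ok = (a.in_flight == 0);
    pthread_mutex_unlock(&a.lock);
    ok = ok && !dispatch_sketch(&a);  /* late callback must be rejected */

    for (i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    pthread_cond_destroy(&a.idle);
    pthread_mutex_destroy(&a.lock);
    return ok;
}
```

Note what the sketch does not cover: a ULP can still submit a new query right up until its own removal callback runs, which is the cleanup burden described above.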
James> All the ULPs are deficient with respect to their hotplug
James> synchronization. Given that there is a common problem,
James> doesn't it make sense to try and solve it in a generic way
James> instead of in each ULP?
Yes, but what is the generic way?
- R.