[openib-general] Re: RDMA Generic Connection Management

James Lentini jlentini at netapp.com
Fri Sep 2 14:02:28 PDT 2005



On Wed, 31 Aug 2005, Roland Dreier wrote:

>     James> The device could still be used after it's gone. For
>     James> example:
> 
>     James>  - the user is configuring SRP via sysfs. The thread in
>     James>    srp_create_target() has just called ib_sa_path_rec_get()
>     James>    [srp.c line 1209] and is waiting for the path record
>     James>    query to complete in wait_for_completion()
>     James>  - the SA callback, srp_path_rec_completion(), is called.
>     James>    This callback thread will make several verb calls
>     James>    (ib_create_cq, ib_req_notify_cq, ib_create_qp, ...)
>     James>    without any coordination with the hotplug device removal
>     James>    callback, srp_remove_one
> 
> I don't think this can happen.  How could srp_remove_one get past
> 
> 		wait_for_completion(&host->released);
> 
> if the sysfs file is still in use?

You're right. srp_remove_one will wait for the sysfs file to close. 

What about SRP's interactions with the SCSI layer? 

When scsi_remove_host() returns, are you guaranteed that no SCSI calls 
into your code are still in progress (e.g. in srp_queuecommand)? 

>     James> Notice that if the SA client's hotplug removal function,
>     James> ib_sa_remove_one(), ensured that all callbacks had
>     James> completed before returning the problem would be fixed. This
>     James> would protect all ULPs from having to deal with hotplug
>     James> races in their SA callback function. The fix belongs in the
>     James> SA client (the core stack), not in SRP.
> 
> All SA client callbacks are driven by the MAD layer.  And
> ib_sa_remove_one() does ib_unregister_mad_agent(), which should wait
> for all callbacks to finish.  So I think we already do the best we can
> here.  Unfortunately the SA client code must clean up after all the
> ULPs that depend on it, because ULPs can use the SA up until they know
> the device is gone.  But I don't see a way around that.
> 
>     James> All the ULPs are deficient with respect to their hotplug
>     James> synchronization. Given that there is a common problem,
>     James> doesn't it make sense to try and solve it in a generic way
>     James> instead of in each ULP?
> 
> Yes, but what is the generic way?

The generic way would be to handle this in a common layer. For the IB 
verbs + RDMA connection API to be as easy to use as the sockets API, 
it needs to make this issue transparent to the ULPs.

Take the current RPC code in net/sunrpc as an example. It uses 
sock_create_kern(), kernel_sendmsg(), kernel_recvmsg(), etc. without 
ever needing to worry about hotplug events. The layers between it and 
the low-level drivers (Ethernet, IPoIB, etc.) take care of that.
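
To make "handle this in a common layer" a little more concrete, here is
a minimal userspace sketch of the general pattern, using pthreads
rather than kernel primitives. It is only an illustration under stated
assumptions: struct dev_guard and the guard_*() functions are
hypothetical names that do not exist in the IB stack, and the sketch
does not describe how the SA client, the MAD layer, or the sockets
code is actually implemented. The idea is that every call into the
device goes through an enter/exit pair, and the removal path refuses
new callers and then waits for the in-flight ones to drain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical common-layer guard; not part of any real IB API. */
struct dev_guard {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when in_flight drops to 0 */
    int             in_flight; /* callers currently using the device */
    bool            removed;   /* set once removal has started */
};

void guard_init(struct dev_guard *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->idle, NULL);
    g->in_flight = 0;
    g->removed = false;
}

/* Returns false if the device is going away; the caller must then
 * not touch the device and must not call guard_exit(). */
bool guard_enter(struct dev_guard *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = !g->removed;
    if (ok)
        g->in_flight++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Pairs with a successful guard_enter(). */
void guard_exit(struct dev_guard *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->in_flight > 0);
    if (--g->in_flight == 0)
        pthread_cond_broadcast(&g->idle);
    pthread_mutex_unlock(&g->lock);
}

/* Removal path: block new callers, then wait for in-flight ones. */
void guard_remove(struct dev_guard *g)
{
    pthread_mutex_lock(&g->lock);
    g->removed = true;
    while (g->in_flight > 0)
        pthread_cond_wait(&g->idle, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```

With something along these lines in the common layer, a ULP callback
would wrap its verb calls in guard_enter()/guard_exit() (or the layer
would do so on its behalf), and the removal function would call
guard_remove() before tearing anything down. A callback must never
call guard_remove() while holding the guard itself, or it would wait
on its own reference.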


