[openib-general] [SRP] [RFC] Needed changes to support fail-over drivers

Vu Pham vuhuong at mellanox.com
Wed Jul 26 15:51:46 PDT 2006


Roland Dreier wrote:
>  > > Why does userspace need to be able to disconnect a connection?
> 
>  > There are two options for who initiates the disconnection: the userspace
>  > daemon or the ib_srp module. I considered both options and was not sure
>  > which one is better. I chose to do it in userspace because it gives a nice
>  > symmetry: both disconnection and reconnection are initiated in the
>  > same place. I will accept your comment and move it to the kernel.
> 
> I'm not telling you what to do -- I'm just asking.
> 
> But it does seem to me that the kernel knows better when to disconnect
> a connection -- eg I don't think an error completion will be signaled
> to userspace.  Conversely if a target goes away and comes back with no
> IOs submitted in between, then the connection should survive and
> there's no reason to disconnect/reconnect.
>

Yes; however, userspace can still signal the kernel about these events,
while the kernel decides whether to disconnect/reconnect. In your
example with no I/O, the kernel can check active_q/pending_q, find them
empty, and decide to keep the connection intact.
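A minimal C sketch of that check, assuming the queues can be modeled by their lengths alone. The struct and function names are hypothetical, and `active_q`/`pending_q` refer to the queues named above, not necessarily to fields in the actual ib_srp code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a target: only queue lengths matter here. */
struct srp_target_sketch {
    size_t active_q_len;   /* commands posted to the target, awaiting a response */
    size_t pending_q_len;  /* commands queued but not yet posted */
};

/*
 * Called when userspace reports "target offline". If nothing is in flight,
 * keep the connection: the target may come back before any I/O is issued.
 */
static bool should_disconnect_on_offline(const struct srp_target_sketch *t)
{
    return t->active_q_len > 0 || t->pending_q_len > 0;
}
```

With this rule, a target that goes away and returns while idle never sees a disconnect/reconnect cycle.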

While the target is offline and some applications issue I/Os, or in
case of error completions/IB errors, the kernel can actively disconnect
the connection, moving the target to the DISCONNECTED state if required.

It also seems to me that the kernel does not know a target is offline
until SCSI commands time out and SCSI error recovery kicks in, and that
will move the SCSI devices to the offline state. Some fail-over drivers
may not handle SCSI devices going offline well. So the kernel can rely
on the userspace signal to disconnect earlier.

In summary, I think we need userspace and the kernel working together:
userspace signals the kernel about offline/online events, and the kernel
decides whether or not to disconnect/reconnect.
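The proposed split can be sketched as a small event handler in C: userspace only reports offline/online hints, error completions arrive from the kernel side, and one function owns the disconnect/reconnect decision. All names here (states, events, struct) are hypothetical; the state change on `EV_USER_ONLINE` merely stands in for a real reconnect attempt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; DISCONNECTED mirrors the state named above. */
enum tgt_state { TGT_LIVE, TGT_DISCONNECTED };

/* Hypothetical events: two hints from the userspace daemon, one kernel-internal. */
enum tgt_event { EV_USER_OFFLINE, EV_USER_ONLINE, EV_ERROR_COMPLETION };

struct tgt_sketch {
    enum tgt_state state;
    size_t active_q_len;   /* commands in flight */
    size_t pending_q_len;  /* commands waiting to be posted */
};

static bool has_outstanding_io(const struct tgt_sketch *t)
{
    return t->active_q_len > 0 || t->pending_q_len > 0;
}

/* Userspace only reports events; the decision is made here, on the kernel side. */
static void tgt_handle_event(struct tgt_sketch *t, enum tgt_event ev)
{
    switch (ev) {
    case EV_USER_OFFLINE:
        /* Idle target: keep the connection, it may survive the outage. */
        if (t->state == TGT_LIVE && has_outstanding_io(t))
            t->state = TGT_DISCONNECTED;
        break;
    case EV_ERROR_COMPLETION:
        /* Error completion / IB error: the kernel sees this directly. */
        t->state = TGT_DISCONNECTED;
        break;
    case EV_USER_ONLINE:
        /* Reconnect only if the connection was actually torn down. */
        if (t->state == TGT_DISCONNECTED)
            t->state = TGT_LIVE; /* placeholder for a real reconnect */
        break;
    }
}
```

Keeping the decision in one place means an "offline" hint from userspace arrives before the SCSI timeouts fire, without forcing a disconnect on an idle target.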
