[openib-general] [SRP] [RFC] Needed changes to support fail-over drivers
Vu Pham
vuhuong at mellanox.com
Wed Jul 26 15:51:46 PDT 2006
Roland Dreier wrote:
> > > Why does userspace need to be able to disconnect a connection?
>
> > There are two options on who will initiate the disconnection: the userspace
> > daemon or the ib_srp module. I considered both options and I was not sure
> > which one is better. I choose to do it in userspace because it looks a good
> > symmetry that both the disconnection and reconnection will be initiate in the
> > same place. I will accept your comment and change it to the kernel.
>
> I'm not telling you what to do -- I'm just asking.
>
> But it does seem to me that the kernel knows better when to disconnect
> a connection -- eg I don't think an error completion will be signaled
> to userspace. Conversely if a target goes away and comes back with no
> IOs submitted in between, then the connection should survive and
> there's no reason to disconnect/reconnect.
>
Yes; however, usermode can still signal the kernel about these events,
and the kernel will decide whether to disconnect/reconnect. In
your example with no I/O, the kernel can check active_q/pending_q and
decide to keep the connection intact.
If the target is offline and some apps issue I/Os, or in case of error
completions/IB errors, the kernel can actively disconnect the connection,
moving the target to the DISCONNECTED state if required.
It also seems to me that the kernel does not know a target is off-line
until SCSI commands time out and SCSI error recovery kicks in, which
brings the SCSI devices to the off-line state. Some fail-over drivers may
not be happy about SCSI devices going off-line, so the kernel can rely on
usermode's signal to disconnect.
In summary, I think we need usermode and the kernel working together:
usermode signals the kernel about off-line/on-line events, and the kernel
decides whether or not to disconnect/reconnect.
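To make the split concrete, here is a rough sketch of the decision logic in plain C. All names in it (struct target, target_handle_event, target_submit_io, the EV_* events) are made up for illustration and do not exist in ib_srp; only the active_q/pending_q check and the DISCONNECTED state come from the discussion above. It assumes usermode delivers off-line/on-line hints and the kernel side owns the actual state change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; none of these are taken from the ib_srp source. */
enum target_state { TARGET_CONNECTED, TARGET_DISCONNECTED };
enum target_event { EV_TARGET_OFFLINE, EV_TARGET_ONLINE, EV_IB_ERROR };

struct target {
    enum target_state state;
    bool reachable;        /* last off-line/on-line hint from usermode */
    size_t active_q_len;   /* commands currently in flight */
    size_t pending_q_len;  /* commands queued but not yet sent */
};

/* True if any I/O is outstanding on either queue. */
static bool target_has_io(const struct target *t)
{
    return t->active_q_len > 0 || t->pending_q_len > 0;
}

/* Usermode (or the IB layer) reports an event; the kernel side decides. */
enum target_state target_handle_event(struct target *t, enum target_event ev)
{
    assert(t != NULL);
    switch (ev) {
    case EV_TARGET_OFFLINE:
        t->reachable = false;
        /* No outstanding I/O: keep the connection intact. */
        if (target_has_io(t))
            t->state = TARGET_DISCONNECTED;
        break;
    case EV_TARGET_ONLINE:
        t->reachable = true;
        /* Stand-in for a real reconnect, only if we had disconnected. */
        if (t->state == TARGET_DISCONNECTED)
            t->state = TARGET_CONNECTED;
        break;
    case EV_IB_ERROR:
        /* Error completions / IB errors always force a disconnect. */
        t->state = TARGET_DISCONNECTED;
        break;
    }
    return t->state;
}

/* An application issues I/O; disconnect if the target is known off-line. */
enum target_state target_submit_io(struct target *t)
{
    assert(t != NULL);
    t->pending_q_len++;
    if (!t->reachable)
        t->state = TARGET_DISCONNECTED;
    return t->state;
}
```

With this shape, a target that goes away and comes back with no I/O in between never leaves the connected state, while I/O issued during the off-line window triggers the disconnect before SCSI error recovery would have to.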