[Users] Linux kernel: Crash of IB peer in RC mode is not detected
Roland Dreier
roland at purestorage.com
Thu Oct 23 11:43:56 PDT 2014
On Thu, Oct 23, 2014 at 6:50 AM, Jack Wang <xjtuwjp at gmail.com> wrote:
>> I expected that RDMA-Write operations would fail if the other side crashes.
>> I also hoped that an event would be generated when a host crashes. The subnet
>> manager should notice it and notify every other device in the network.
>>
>> Are we missing something in our modules?
>> Is there a way to determine that an RC peer crashed without implementing a
>> ping-pong mechanism?
If the remote system crashes, any memory regions, QPs, etc. remain
valid on the remote HCA, and RDMA read/write operations will continue
to succeed (unless the system reboots and reinitializes the adapter,
or something like that).
There isn't a way to detect a remote crash unless that remote crash
disconnects your QP or otherwise affects the HCA on the crashed
system.