[Users] Linux kernel: Crash of IB peer in RC mode is not detected

Sagi Grimberg sagig at dev.mellanox.co.il
Fri Oct 24 04:23:42 PDT 2014


> Thanks, Roland, for clearing up our confusion.
> 
> So it looks like a ping-pong mechanism is the way to go.
> 

Not sure if it will work for your solution, but you can also register for SM traps.

> Regards,
> Jack
> 
> 2014-10-23 20:43 GMT+02:00 Roland Dreier <roland at purestorage.com>:
>> On Thu, Oct 23, 2014 at 6:50 AM, Jack Wang <xjtuwjp at gmail.com> wrote:
>>>> I expected that RDMA Write operations would fail if the other host crashed.
>>>> Also, I hoped that an event would be generated when a host crashes. The subnet
>>>> manager should notice it and notify every other device in the network.
>>>> 
>>>> Are we missing something in our modules?
>>>> Is there a way to determine that an RC peer has crashed without implementing a
>>>> ping-pong mechanism?
>> 
>> If the remote system crashes then any memory regions, QPs, etc. are
>> still valid with the remote HCA, and RDMA read/write operations will
>> continue to succeed (unless the system reboots and reinitializes the
>> adapter or something like that).
>> 
>> There isn't a way to detect a remote crash unless that remote crash
>> disconnects your QP or otherwise affects the HCA on the crashed
>> system.