[libfabric-users] Handling disconnects in verbs; ofi_rxm with pending fi_recv

Hefty, Sean sean.hefty at intel.com
Tue Mar 24 15:59:34 PDT 2020


> When enabling debug output at the client I can see that underlying provider is aware of
> the disconnect.
> 
> Is there some way to know that the server has disconnected or a way to timeout this?
> Otherwise I would need to put a huge effort in creating a scheduler which eventually
> calls fi_cancel when a certain time has passed.

There is currently no standard way defined for this.  In the past, we discussed having the provider report such errors to the EQ.  It shouldn't be hard to define that in the man pages, but it would still need to be implemented.  Such errors would be reported as best effort, though, with no guarantee that events would be written in all cases (e.g. remote system crashes, with no notifications sent).
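
Until something like that exists, the application-side timeout mentioned in the question does not need a full scheduler. The following is only a minimal sketch under stated assumptions, not a libfabric facility: the names timeout_table, timeout_track, timeout_complete and timeout_expire are hypothetical, the table size is arbitrary, and the caller is assumed to supply timestamps from its own monotonic clock. The helper only records a deadline per posted fi_recv context and hands back the contexts whose deadline has passed; the caller would then pass each one to fi_cancel(&ep->fid, context).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64  /* arbitrary size chosen for this sketch */

/* One posted operation that has not completed yet. */
struct pending_op {
    void    *context;      /* same context pointer that was passed to fi_recv */
    uint64_t deadline_ms;  /* absolute deadline on the caller's monotonic clock */
    int      in_use;       /* nonzero while the slot holds a live entry */
};

/* Hypothetical bookkeeping table; must be zero-initialized before use. */
struct timeout_table {
    struct pending_op ops[MAX_PENDING];
};

/* Record a newly posted operation. Returns 0 on success, -1 if the table is full. */
int timeout_track(struct timeout_table *t, void *context,
                  uint64_t now_ms, uint64_t timeout_ms)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->ops[i].in_use) {
            t->ops[i].context = context;
            t->ops[i].deadline_ms = now_ms + timeout_ms;
            t->ops[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Forget an operation whose completion was read from the CQ.
 * Returns 0 if the context was being tracked, -1 otherwise. */
int timeout_complete(struct timeout_table *t, void *context)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->ops[i].in_use && t->ops[i].context == context) {
            t->ops[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}

/* Collect up to max_expired contexts whose deadline is at or before now_ms
 * and stop tracking them. Returns the number of contexts written to expired[].
 * The caller is expected to call fi_cancel(&ep->fid, expired[i]) for each one
 * and to keep the receive buffer alive until the canceled completion shows up
 * on the CQ (assumed here to be an error entry with FI_ECANCELED). */
size_t timeout_expire(struct timeout_table *t, uint64_t now_ms,
                      void **expired, size_t max_expired)
{
    assert(t != NULL);
    size_t n = 0;
    for (size_t i = 0; i < MAX_PENDING && n < max_expired; i++) {
        if (t->ops[i].in_use && t->ops[i].deadline_ms <= now_ms) {
            expired[n++] = t->ops[i].context;
            t->ops[i].in_use = 0;
        }
    }
    return n;
}
```

Calling timeout_expire from the same loop that already polls the CQ keeps everything on one thread, so no separate timer or scheduler is involved. Like the EQ reporting discussed above, this is best effort: a timeout only says that nothing arrived in time, not that the peer has actually disconnected.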

- Sean

