[libfabric-users] Handling disconnects in verbs; ofi_rxm with pending fi_recv
Carsten Patzke
carsten.patzke at desy.de
Tue Mar 24 15:45:09 PDT 2020
Hello list,
I am currently using verbs;ofi_rxm to implement a kind of REST-API-like behavior.
The client adds a server to its address vector and first sends a handshake
request to the server, so that the server can store the client's address in its AV (because FI_SOURCE_ERR is not supported).
The problem arises when the client sends a request to the server,
but the server crashes and therefore never sends a response back to the client.
The client still has an fi_recv posted and never gets a completion for it.
When I enable debug output on the client, I can see that the underlying provider is aware of the disconnect.
Is there some way to find out that the server has disconnected, or a way to time out the pending receive?
Otherwise I would need to put a lot of effort into writing a scheduler that calls fi_cancel once a certain amount of time has passed.
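For reference, a minimal sketch of that fallback, assuming the application already runs its own CQ polling loop: each posted fi_recv() context is recorded with a deadline, and on every loop iteration the expired ones are collected so the caller can pass each of them to fi_cancel(). The struct recv_tracker and the tracker_* functions are hypothetical helpers, not libfabric API, and the caller supplies the current time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for outstanding receives; NOT part of libfabric. */
#define MAX_PENDING 64

struct pending_recv {
    void *context;        /* the context pointer that was passed to fi_recv() */
    uint64_t deadline_ms; /* absolute time after which we give up waiting */
    int active;           /* 1 while the receive is still outstanding */
};

struct recv_tracker {
    struct pending_recv slots[MAX_PENDING];
};

static void tracker_init(struct recv_tracker *t) {
    for (size_t i = 0; i < MAX_PENDING; i++)
        t->slots[i].active = 0;
}

/* Record a posted receive. Returns 0 on success, -1 if all slots are in use. */
static int tracker_add(struct recv_tracker *t, void *context,
                       uint64_t now_ms, uint64_t timeout_ms) {
    assert(context != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].active) {
            t->slots[i].context = context;
            t->slots[i].deadline_ms = now_ms + timeout_ms;
            t->slots[i].active = 1;
            return 0;
        }
    }
    return -1;
}

/* Forget a receive whose completion arrived. Returns 0 if it was tracked. */
static int tracker_complete(struct recv_tracker *t, void *context) {
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].active && t->slots[i].context == context) {
            t->slots[i].active = 0;
            return 0;
        }
    }
    return -1;
}

/* Move up to out_len expired contexts into out[] and stop tracking them.
 * The caller would then call fi_cancel(&ep->fid, out[i]) for each entry.
 * Returns the number of contexts written to out[]. */
static size_t tracker_collect_expired(struct recv_tracker *t, uint64_t now_ms,
                                      void **out, size_t out_len) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_PENDING && n < out_len; i++) {
        if (t->slots[i].active && now_ms >= t->slots[i].deadline_ms) {
            out[n++] = t->slots[i].context;
            t->slots[i].active = 0;
        }
    }
    return n;
}
```

The current time could come from clock_gettime(CLOCK_MONOTONIC, ...), and if the CQ was opened with a wait object, fi_cq_sread() with a millisecond timeout keeps the loop from blocking forever. A successfully cancelled receive is reported on the CQ as an error completion (FI_ECANCELED) that has to be read with fi_cq_readerr(). A linear scan over a fixed table is the simplest choice here; a min-heap ordered by deadline would scale better with many outstanding receives.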
Stay healthy && best regards,
Carsten Patzke