[libfabric-users] client fi_eq_sread busy (eternally) after server accepted request
sean.hefty at intel.com
Wed Jun 10 08:50:31 PDT 2020
Are you able to install a newer version of libfabric?
> As the title suggests, fi_eq_sread returns FI_EBUSY and continues to block, although
> in my understanding there should be an event message, namely FI_CONNECTED.
Can you clarify what you mean by the call returning EBUSY while also blocking? What call is returning, and what call is blocked?
> The eq is fresh; this is its first use apart from creating it and binding it to the ep. No
> wait_set is defined, but wait_obj is set to FI_WAIT_MUTEX_COND. The event buffer and event
> field are freshly defined as well. No flags are set, and the timeout is -1 (I know that means
> waiting indefinitely until an event occurs, but to my knowledge an event should have been
Correct. A timeout of -1 will block until an event occurs.
> Client successfully calls fi_connect and then goes into fi_eq_sread.
Is the client blocked?
> Server gets the request, does its part of establishing the required resources for the
> connection (ep, domain, etc) and calls fi_accept successfully afterwards.
What does the server do after fi_accept?
> The libfabric version used is 1.5.3, because the target systems run Ubuntu 18.04.
> Since it is in development, the program uses the sockets provider at the moment.
> If the eq is ignored, the program runs successfully locally, but this obviously leads to
> potential errors being ignored and is thus not desirable.
> Any ideas where my problem lies and what I did wrong?
> If any additional info is needed, I am happy to provide it.
I need a better idea of what the application is trying to do and where the hang is happening.