[libfabric-users] client fi_eq_sread busy (eternally) after server accepted request

Hefty, Sean sean.hefty at intel.com
Wed Jun 10 08:50:31 PDT 2020


Are you able to install a newer version of libfabric?

> As the title suggests, fi_eq_sread returns FI_EBUSY and continues to block, although,
> as I understand it, there should be an event available, namely FI_CONNECTED.

Can you clarify what you mean by the call returning EBUSY while also blocking?  What call is returning, and what call is blocked?

> The eq is fresh: this is its first use apart from creating it and binding it to the ep. No
> wait_set is defined, but wait_obj is set to FI_WAIT_MUTEX_COND. The event buffer and event
> field are freshly defined as well. No flags are set and the timeout is -1 (I know that means
> waiting indefinitely until an event occurs, but as far as I know an event should already
> have occurred).

Correct.  A timeout of -1 will block until an event occurs.

> Client successfully calls fi_connect and then goes into fi_eq_sread.

Is the client blocked?

> The server gets the request, does its part in setting up the resources required for the
> connection (ep, domain, etc.) and then calls fi_accept successfully.

What does the server do after fi_accept?

> The libfabric version in use is 1.5.3, because the target systems run Ubuntu 18.04.
>
> Since it is still in development, the program uses the sockets provider at the moment.
>
> If the eq is ignored, the program runs successfully locally, but this obviously means
> potential errors are ignored, which is not desirable.
>
> Any ideas where my problem lies and what I did wrong?
>
> If any additional info is needed, I am happy to provide it.

I need a better idea of what the application is trying to do and where the hang is happening.

- Sean

