[libfabric-users] verbs WAIT_FD signalling at all times

Carlo Alberto Gottardo carlo.gottardo at cern.ch
Wed Jun 30 13:48:30 PDT 2021

I apologise for the spam. I need to make a small correction to my previous email.

fi_trywait is not blocking anything.
My problem is that, after one or a few messages have been sent, fi_trywait no longer succeeds
and is called over and over again.

 fi_trywait doesn't return FI_SUCCESS because !slist_empty(&cq->saved_wc_list) at


because in the previous function call vrb_save_wc(cq, &wc) was called


Carlo A. Gottardo
Postdoc at Nikhef

On 30 Jun 2021, at 17:31, Carlo Alberto Gottardo <carlo.gottardo at cern.ch> wrote:

Dear Sean,

thank you for your previous message.

I followed your advice and created a structure that keeps track of all the completion queues and their fids.
Then I guarded epoll_wait with fi_trywait:

if (fi_trywait(evloop->pfids.fabric, evloop->pfids.fid_set, evloop->pfids.count) == FI_SUCCESS) {
    nevents = epoll_wait(evloop->epollfd, evloop->events, MAX_EPOLL_EVENTS, EPOLL_TIMEOUT);
    for (int i = 0; i < nevents; i++) { process_event(...

However, fi_trywait calls ibv_get_cq_event (via vrb_cq_trywait), and that call seems to block: when I interrupt the program in GDB, I see that it is stuck there.
Following [1], I tried to make it non-blocking by setting the O_NONBLOCK flag on the FD before adding it to epoll, but this had no effect.
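
For reference, setting O_NONBLOCK on a descriptor is usually done along the lines below. This is only a sketch: the helper name set_nonblocking is hypothetical, and fd stands for whatever descriptor FI_GETWAIT returned.

```c
#include <fcntl.h>

/* Hypothetical helper: switch an already-open descriptor to non-blocking
 * mode while preserving its other status flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

Reading the flags first avoids clearing anything that was already set on the descriptor.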

My application is based on a single event loop (the epoll) running on a single thread and executing non-blocking functions so I wouldn't expect polling to happen elsewhere.

Is there something wrong in what I am doing?

Thank you very much for your help,

[1] https://www.rdmamojo.com/2013/03/09/ibv_get_cq_event/

Carlo A. Gottardo
Postdoc at Nikhef

On 29 Jun 2021, at 18:53, Hefty, Sean <sean.hefty at intel.com> wrote:


With verbs devices there are 2 queues in play here.  The first is associated with the fd, where low-level events are in the kernel.  This event is generated in response to an interrupt from the device.  In order to limit how many interrupts the device generates, the device must be manually reset before it will generate another event.  The event indicates which CQ had entries added to it.

The second queue contains the completion entries themselves.  That is what fi_cq_read is accessing.

The reason the fd remains signaled is that the kernel event is never being read.
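
The same level-triggered behaviour can be reproduced without RDMA hardware. The sketch below is an analogy only: a Linux eventfd stands in for the verbs completion channel fd, and reading it stands in for consuming the kernel event (which ibv_get_cq_event does for a real channel). The function names are hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Count how many of `tries` zero-timeout epoll_wait() calls report the
 * registered descriptor as readable. */
static int count_wakeups(int epfd, int tries)
{
    struct epoll_event ev;
    int hits = 0;
    for (int i = 0; i < tries; i++)
        if (epoll_wait(epfd, &ev, 1, 0) == 1)
            hits++;
    return hits;
}

/* Post one event, poll without reading it, then read it and poll again.
 * Returns 0 on success, -1 on error (descriptors are not cleaned up on the
 * error paths; this is a demo, not production code). */
static int demo_level_triggered(int *before, int *after)
{
    int efd = eventfd(0, EFD_NONBLOCK);   /* stand-in for the channel fd */
    int epfd = epoll_create1(0);
    if (efd < 0 || epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
        return -1;

    uint64_t one = 1, val = 0;
    if (write(efd, &one, sizeof one) != (ssize_t)sizeof one)  /* event arrives */
        return -1;
    *before = count_wakeups(epfd, 5);     /* event unread: fd stays signalled */

    if (read(efd, &val, sizeof val) != (ssize_t)sizeof val)   /* consume it */
        return -1;
    *after = count_wakeups(epfd, 5);      /* event consumed: fd is quiet */

    close(epfd);
    close(efd);
    return 0;
}
```

As long as the pending event is not read, every epoll_wait call returns immediately; once it has been read, the descriptor stops signalling.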

If an application wants to wait directly on a wait object (fd) using OS specific calls (select/poll), it needs to call fi_trywait() prior to blocking.
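
The ordering this implies can be sketched as follows. The sketch deliberately avoids libfabric headers: the three function pointers are hypothetical stand-ins for a fi_cq_read loop, fi_trywait, and epoll_wait, and the fake_* helpers are a toy model used only to exercise the logic. It assumes, as I understand the fi_trywait documentation, that a non-success return means completions should be read again rather than blocking.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the three calls involved:
 *   drain   ~ fi_cq_read() repeated until it reports -FI_EAGAIN
 *   trywait ~ fi_trywait(); returns 0 when it is safe to block
 *   block   ~ epoll_wait() on the wait object's fd */
struct loop_ops {
    int  (*drain)(void *ctx);
    int  (*trywait)(void *ctx);
    void (*block)(void *ctx);
};

/* One loop iteration: process what is already queued, then block only if
 * trywait says it is safe. Returns 1 if it blocked, 0 if the caller should
 * iterate again without blocking. */
static int loop_once(const struct loop_ops *ops, void *ctx)
{
    ops->drain(ctx);
    if (ops->trywait(ctx) != 0)
        return 0;           /* completions still pending: do not block */
    ops->block(ctx);
    return 1;
}

/* Toy model: `pending` counts queued completions, `batch` is how many one
 * drain call can consume, `blocked` counts how often we blocked. */
struct fake_cq { int pending; int batch; int blocked; };

static int fake_drain(void *c)
{
    struct fake_cq *q = c;
    int n = q->pending < q->batch ? q->pending : q->batch;
    q->pending -= n;
    return n;
}

static int fake_trywait(void *c)
{
    return ((struct fake_cq *)c)->pending > 0 ? -1 : 0;
}

static void fake_block(void *c)
{
    ((struct fake_cq *)c)->blocked++;
}
```

With three completions queued and a batch size of two, the first call to loop_once returns without blocking and the second one blocks.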


- Sean

With the verbs provider, I use the FI_WAIT_FD wait object for the completion queue.
The resulting file descriptor (FD), which is associated with the libibverbs completion
channel, is added to an epoll instance that waits for EPOLLIN.
When the FD signals, it triggers a callback in which fi_cq_read reads the completions
in a non-blocking way.

The problem is that, as soon as a connection is established, the FD keeps signalling as
fast as the CPU allows, even if there is no data transfer.
In fact, after the first call, fi_cq_read keeps returning EAGAIN, a sign
that there is no CQ entry to read.

I would expect the FD not to signal if there is nothing to read in the CQ. Shouldn't
this be the case?

I read some libibverbs documentation and this point still remains unclear to me.

Below I post the CQ attributes and some function calls.

Thank you very much for your help,

Carlo A. Gottardo
Postdoc at Nikhef
Skype: carlogottardo

System: Libfabric 1.12.1 / Centos7 / Mellanox Connect-X5

CQ attributes

struct fi_cq_attr cq_attr;
cq_attr.size = MAX_CQ_ENTRIES;
cq_attr.flags = 0;
cq_attr.format = FI_CQ_FORMAT_DATA;
cq_attr.wait_obj= FI_WAIT_FD;
cq_attr.signaling_vector = 0;
cq_attr.wait_cond = FI_CQ_COND_NONE;
cq_attr.wait_set = NULL;

the queue is opened and bound with

fi_cq_open(rsocket->domain, &cq_attr, &rsocket->cq, NULL);
fi_ep_bind(rsocket->ep, &rsocket->cq->fid, FI_TRANSMIT | FI_RECV);

when the connection is established the wait object is retrieved with

fi_control(&socket->cq->fid, FI_GETWAIT, &socket->cqfd)

the file descriptor is assigned a callback
     socket->cq_ev_ctx.fd = socket->cqfd;
     socket->cq_ev_ctx.data = socket;
     socket->cq_ev_ctx.cb = on_recv_socket_cq_event;

and finally the file descriptor is added to the main and only epoll instance of the
application, which waits for EPOLLIN.
In the on_recv_socket_cq_event callback:
fi_cq_read(socket->cq, &completion_entries, N);
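
The registration and dispatch described above can be sketched roughly as follows. The struct mirrors the three cq_ev_ctx fields quoted earlier, but the callback signature and the names register_ctx, dispatch_once, and count_cb are hypothetical; count_cb only counts invocations, whereas the real callback would call fi_cq_read.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical event context modelled on the fd/data/cb fields above. */
struct ev_ctx {
    int   fd;
    void *data;
    void (*cb)(int fd, void *data);
};

/* Add the context's fd to the epoll instance, waiting for EPOLLIN.
 * The context pointer travels with the event so it can be recovered later. */
static int register_ctx(int epfd, struct ev_ctx *ctx)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = ctx };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, ctx->fd, &ev);
}

/* Wait once (timeout in milliseconds) and run the callback of every ready
 * descriptor. Returns the number of callbacks run, or -1 on error. */
static int dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event evs[16];
    int n = epoll_wait(epfd, evs, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct ev_ctx *ctx = evs[i].data.ptr;
        ctx->cb(ctx->fd, ctx->data);
    }
    return n;
}

/* Example callback: increments the int that `data` points to. */
static void count_cb(int fd, void *data)
{
    (void)fd;
    ++*(int *)data;
}
```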
