[libfabric-users] Verbs send message slow down
skamburugamuve at gmail.com
Tue Mar 21 10:38:58 PDT 2017
Thanks Sean for your quick response.
I found a workaround to the problem. Previously I was calling cq_read for
multiple completions at a time. If I call cq_read for a single completion,
the problem goes away. Is this expected behavior, a mistake in my program,
or a bug in libfabric?
The problem was on the receiver side. Once I started doing single cq_reads on
the receive side, the transmissions started to complete.
The slowdown was pretty significant in my case.
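
For reference, here is a minimal sketch of a batched completion loop. It is
not the code from my application, and it is only an assumption that a batched
loop goes wrong in this way. The real call is
fi_cq_read(struct fid_cq *cq, void *buf, size_t count), which returns the
number of completions actually read (possibly fewer than count) or a negative
error such as -FI_EAGAIN when the queue is empty. To keep the sketch runnable
without a fabric, mock_cq_read, struct mock_cq, struct mock_entry and
MOCK_EAGAIN below are hypothetical stand-ins that only mimic that return
convention.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-ins; none of these are libfabric definitions. */
#define MOCK_EAGAIN 11
#define BATCH 8

struct mock_entry { void *op_context; };

struct mock_cq {
    size_t pending;       /* completions waiting to be read */
    size_t max_per_read;  /* cap per call, to force short reads */
};

/* Mimics the fi_cq_read return convention: the number of entries
 * written to buf, or -MOCK_EAGAIN when nothing is pending. */
static ssize_t mock_cq_read(struct mock_cq *cq, struct mock_entry *buf,
                            size_t count)
{
    assert(cq->max_per_read > 0);
    if (cq->pending == 0)
        return -MOCK_EAGAIN;

    size_t n = count < cq->pending ? count : cq->pending;
    if (n > cq->max_per_read)
        n = cq->max_per_read;
    for (size_t i = 0; i < n; i++)
        buf[i].op_context = NULL;
    cq->pending -= n;
    return (ssize_t)n;
}

/* Drain the queue in batches. Only the first `ret` entries of the
 * batch are valid after each read, so exactly that many are handled.
 * `reposted` counts the receive buffers that would be re-posted.
 * Returns the total number of completions handled, or a negative
 * error code. */
static ssize_t drain_cq(struct mock_cq *cq, size_t *reposted)
{
    struct mock_entry batch[BATCH];
    ssize_t total = 0;

    for (;;) {
        ssize_t ret = mock_cq_read(cq, batch, BATCH);
        if (ret == -MOCK_EAGAIN)
            break;              /* queue is empty for now */
        if (ret < 0)
            return ret;         /* real error: let the caller decide */

        for (ssize_t i = 0; i < ret; i++) {
            (void)batch[i];     /* handle the completion here */
            (*reposted)++;      /* then re-post its receive buffer */
        }
        total += ret;
    }
    return total;
}
```

With pending = 21 and max_per_read = 5, drain_cq handles all 21 completions
even though no single read fills the 8-entry batch. Re-posting one buffer per
completion handled keeps the receive queue stocked, which matters for the
re-posting concern Sean raised.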
On Tue, Mar 21, 2017 at 12:50 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
> > It seems when the number of nodes increases, some of the QPs become
> > slow randomly. I noticed this with the CQs for transmitting. It seems
> > the CQs don't deliver the completion events in a reasonable time. Some
> > of them basically take an abnormally long time to complete. The
> > application uses flow control etc, and it seems those aspects are fine.
> How many nodes does it take before you see the slowdown? Eventually the
> number of active connections will swamp the caching capabilities of the
> NIC/HCA, which will result in QP state being swapped between the card and
> host memory. You can also see slowdowns if receive-side buffers are not
> being re-posted quickly enough.
> Other than those guesses, we'd need more information about the
> setup/application to know if this is something in the verbs provider or
> underlying hardware/software.
> - Sean