[libfabric-users] Verbs send message slow down

Hefty, Sean sean.hefty at intel.com
Tue Mar 21 09:50:22 PDT 2017


> It seems that when the number of nodes increases, some of the QPs
> randomly become slow. I noticed this with the transmit CQs: they don't
> deliver completion events in a timely manner, and some completions take
> an abnormally long time. The application uses flow control etc., and
> those aspects seem to be fine.

How many nodes does it take before you see the slowdown?  Eventually the number of active connections will overwhelm what the NIC/HCA can cache, so QP state ends up being swapped between host memory and the card.  You can also see slowdowns if receive-side buffers are not being reposted quickly enough.

Other than those guesses, we'd need more information about the setup/application to know if this is something in the verbs provider or underlying hardware/software.

- Sean 

