[libfabric-users] questions on handling CQ overrun

Hefty, Sean sean.hefty at intel.com
Tue Sep 3 08:23:35 PDT 2019

> The only method I see for preventing CQ overrun at a target is for the initiator to
> make sure that it never has more messages in flight to the target than the depth of the
> target’s CQ.  That works fine if the target has a separate endpoint for each initiator,
> but that seems like a scaling problem in very large configurations like the one above.
> Independent clients don’t know when other clients might be communicating with a
> particular server at the same time.   While each client might limit the number of
> outstanding messages to a server, the combination of many simultaneously communicating
> clients can still easily overrun the server.
> How do providers handle CQ overrun?   In the case of GNI, I see that it simply discards
> arriving CQEs if its CQ happens to be full.  Do other providers handle this
> differently?

How CQ overrun is handled is provider- and device-specific.  Software-based CQs can grow dynamically, but a hardware CQ that overruns loses events, and the overrun is fatal.  OFI defines FI_RM_ENABLED (resource management enabled) to push this problem down into the provider.  Beyond that, as a general rule, the following mechanisms usually work.

First, there needs to be some form of end-to-end flow control between EPs.
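One common form of end-to-end flow control is credit-based: the target hands out credits, each backed by a posted receive buffer and an Rx CQ slot, and an initiator sends only while it holds a credit. This also addresses the many-clients concern, because the target caps the total credits it grants at its Rx CQ depth instead of relying on each client's own limit. The sketch below is a minimal model under that assumption; credit-based flow control is a technique chosen for illustration (the post does not prescribe a specific scheme), and `credit_pool`, `peer_credits`, `pool_grant`, `pool_release`, `peer_try_send`, and `peer_add_credits` are hypothetical names, not libfabric APIs. How credits travel between peers (for example piggybacked on replies) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Target side (hypothetical): total credits handed out are capped at the
 * Rx CQ depth, so all initiators combined cannot overrun that CQ. */
struct credit_pool {
    uint32_t rx_cq_size; /* depth of the target's Rx CQ */
    uint32_t granted;    /* credits held by initiators or in flight */
};

/* Initiator side (hypothetical): credits held for one target. */
struct peer_credits {
    uint32_t available;
};

/* Grant up to 'want' credits, limited by the unused part of the pool. */
static uint32_t pool_grant(struct credit_pool *pool, uint32_t want)
{
    uint32_t free_slots = pool->rx_cq_size - pool->granted;
    uint32_t n = want < free_slots ? want : free_slots;
    pool->granted += n;
    return n;
}

/* Target: after reading an Rx CQ entry and reposting the buffer, it may
 * return the credit to the shared pool instead of to the same peer. */
static void pool_release(struct credit_pool *pool, uint32_t n)
{
    pool->granted -= n;
}

/* Initiator: consume one credit per send; refuse to send with none left. */
static bool peer_try_send(struct peer_credits *peer)
{
    if (peer->available == 0)
        return false; /* wait for the target to return credits */
    peer->available--;
    return true;
}

/* Initiator: record credits returned by the target. */
static void peer_add_credits(struct peer_credits *peer, uint32_t n)
{
    peer->available += n;
}
```

With an Rx CQ of 4 entries, granting 3 credits to one client leaves only 1 for the next, so the two clients together can never have more than 4 messages landing in the target's CQ.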

Size each Tx and Rx CQ to be at least the combined depth of the Tx/Rx queues of all endpoints bound to that CQ.  Multiple CQs may be needed.
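The sizing rule is simple arithmetic: a CQ must have room for one entry per operation that could be outstanding on any endpoint bound to it. The sketch below sums those depths and, assuming a hypothetical per-device limit `max_cq_depth`, uses next-fit packing to estimate how many Rx CQs are needed when a single CQ cannot hold them all. `ep_depths`, `min_cq_size`, and `rx_cqs_needed` are illustrative names, not libfabric structures or calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-endpoint queue depths. */
struct ep_depths {
    uint32_t tx_depth;
    uint32_t rx_depth;
};

/* Minimum CQ size for all endpoints bound to one CQ (rx selects Rx vs Tx). */
static uint64_t min_cq_size(const struct ep_depths *eps, size_t n, bool rx)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += rx ? eps[i].rx_depth : eps[i].tx_depth;
    return total;
}

/* Next-fit packing of endpoints' Rx queues onto CQs of at most
 * max_cq_depth entries. Returns the number of Rx CQs needed, or 0 if a
 * single endpoint's Rx queue is deeper than max_cq_depth. */
static size_t rx_cqs_needed(const struct ep_depths *eps, size_t n,
                            uint32_t max_cq_depth)
{
    size_t cqs = 0;
    uint64_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (eps[i].rx_depth > max_cq_depth)
            return 0; /* this endpoint's Rx queue must be shrunk */
        if (cqs == 0 || used + eps[i].rx_depth > max_cq_depth) {
            cqs++;    /* start a new CQ */
            used = 0;
        }
        used += eps[i].rx_depth;
    }
    return cqs;
}
```

Next-fit is not optimal packing, but it never places more entries on a CQ than the CQ can hold, which is the property that matters for avoiding overrun.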

Use counters on both the Tx queue and its Tx CQ to prevent overrun at the initiator.  Only post if space is available in both the Tx queue and the CQ associated with it.
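A minimal sketch of that initiator-side check, assuming the application keeps its own software counters and that every posted operation generates exactly one completion entry. Each post reserves a slot in both the endpoint's Tx queue and the (possibly shared) Tx CQ; reading a completion releases both. `tx_cq`, `tx_ep`, `tx_try_post`, and `tx_on_completion` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software counter for a Tx CQ shared by several endpoints. */
struct tx_cq {
    uint32_t size;     /* CQ depth */
    uint32_t reserved; /* entries that outstanding operations may generate */
};

/* Hypothetical per-endpoint Tx state. */
struct tx_ep {
    uint32_t depth;       /* Tx queue depth */
    uint32_t outstanding; /* posted operations not yet completed */
    struct tx_cq *cq;     /* Tx CQ this endpoint is bound to */
};

/* Post only if both the Tx queue and its CQ have room; reserve a slot in
 * each. A real implementation would issue the post here and undo the
 * reservation if the post fails. */
static bool tx_try_post(struct tx_ep *ep)
{
    if (ep->outstanding >= ep->depth || ep->cq->reserved >= ep->cq->size)
        return false;
    ep->outstanding++;
    ep->cq->reserved++;
    return true;
}

/* After reading one completion for this endpoint, release both slots. */
static void tx_on_completion(struct tx_ep *ep)
{
    ep->outstanding--;
    ep->cq->reserved--;
}
```

Note that an endpoint can be blocked by a full shared CQ even when its own Tx queue has room, which is exactly the case the two-counter rule guards against.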

Do not post more Rx buffers than the size of the Rx CQ.  Repost Rx buffers only after reading an entry from the Rx CQ.
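The receive-side rule can be modeled with two counts: buffers currently posted and completions waiting in the Rx CQ. A buffer moves from "posted" to "CQ entry" when a message lands in it, and is reposted only after that entry is read, so the sum never exceeds the Rx CQ depth. This is a toy model with hypothetical names (`rx_model`, `RX_CQ_SIZE`, `rx_fill`, `rx_arrive`, `rx_read_and_repost`). In the toy, a message that arrives with no buffer posted is simply rejected; real behavior in that case is provider-specific, which is why end-to-end flow control is still needed.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_CQ_SIZE 4u /* hypothetical Rx CQ depth */

struct rx_model {
    uint32_t posted;     /* receive buffers currently posted */
    uint32_t cq_entries; /* completions written but not yet read */
};

/* Initial fill: post buffers until posted + unread entries reach the CQ depth. */
static uint32_t rx_fill(struct rx_model *m)
{
    uint32_t n = 0;
    while (m->posted + m->cq_entries < RX_CQ_SIZE) {
        m->posted++;
        n++;
    }
    return n;
}

/* A message lands in a posted buffer and produces one Rx CQ entry. */
static bool rx_arrive(struct rx_model *m)
{
    if (m->posted == 0)
        return false; /* toy model rejects it; real behavior varies by provider */
    m->posted--;
    m->cq_entries++;
    return true;
}

/* Read one Rx CQ entry, then repost the buffer it completed. */
static bool rx_read_and_repost(struct rx_model *m)
{
    if (m->cq_entries == 0)
        return false;
    m->cq_entries--;
    m->posted++;
    return true;
}
```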

> Are there other methods available for preventing CQ overrun at a target besides
> requiring each target to have a separate endpoint for each initiator?   Or are there
> efficient, low-latency methods for recovering from CQ overrun in a configuration like
> the above?

I'm not aware of any simple mechanisms for recovering from CQ overrun.  Recovery usually requires heavy tracking of requests at the initiators so they can be replayed, and even then it may not work, particularly when atomic operations are involved, since an atomic the target has already applied cannot safely be applied a second time.
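To show why that tracking is heavy, here is a hypothetical sketch of the bookkeeping an initiator would need. A local completion is not enough evidence that the target saw an operation, because the lost event is on the target's CQ, so each operation is kept until an application-level acknowledgment from the target arrives. After an overrun, unacknowledged non-atomic operations become replay candidates, while atomics are set aside: if the target already applied a fetch-add, replaying it adds twice. Even the candidates may end up delivered twice, so the target would also need duplicate detection. `op_kind`, `tracked_op`, and `collect_replay` are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation kinds an initiator might track. */
enum op_kind { OP_SEND, OP_RMA_WRITE, OP_ATOMIC };

struct tracked_op {
    uint64_t id;
    enum op_kind kind;
    bool acked; /* target's application-level ack received */
};

/* Collect unacknowledged operations that may be replayed into 'out'.
 * Atomics are counted in *unsafe instead, because the target may already
 * have applied them. Returns the number of replay candidates. */
static size_t collect_replay(const struct tracked_op *ops, size_t n,
                             uint64_t *out, size_t *unsafe)
{
    size_t k = 0;
    *unsafe = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].acked)
            continue;
        if (ops[i].kind == OP_ATOMIC) {
            (*unsafe)++;
            continue;
        }
        out[k++] = ops[i].id;
    }
    return k;
}
```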

- Sean
