[libfabric-users] Max IOV length with verbs provider

Carlo Alberto Gottardo carlo.gottardo at cern.ch
Tue Jun 29 08:23:01 PDT 2021


Dear User community, Sean,

I would like to close this thread by reporting my solution for dealing with long IOVs with the verbs provider.

Libfabric sets the IOV length limit of the verbs provider to 4 by default.
This limit can be changed via the environment variables

FI_VERBS_TX_IOV_LIMIT
FI_VERBS_RX_IOV_LIMIT

These values are passed to the verbs provider, which compares them to the maximum values allowed by the device.
In fact, the actual limitation comes from the hardware and, in the case of Mellanox ConnectX NICs, it can be read with
ibv_devinfo -v under the max_sge field. For the ConnectX-5 max_sge is 30, for the ConnectX-4 it is 32.
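As a rough illustration of the check described above, here is a minimal Python sketch that reads the device limit from the text printed by ibv_devinfo -v and caps the two environment variables at that value. It assumes the field is printed as max_sge; the helper names (parse_max_sge, verbs_iov_env) and the sample text are hypothetical and only for illustration, not part of libfabric or the verbs tools.

```python
import os
import re

# Illustrative excerpt only: real `ibv_devinfo -v` output has many more fields.
# The value 30 is the ConnectX-5 figure quoted in this message.
SAMPLE_DEVINFO = """\
hca_id: mlx5_0
        max_qp_wr:                      32768
        max_sge:                        30
        max_sge_rd:                     30
"""


def parse_max_sge(devinfo_text):
    """Return the first max_sge value found in `ibv_devinfo -v` text, or None."""
    match = re.search(r"^\s*max_sge:\s*(\d+)\s*$", devinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def verbs_iov_env(requested, device_max, base_env=None):
    """Copy an environment and set both verbs IOV limits, capped at device_max."""
    limit = min(requested, device_max)
    env = dict(os.environ if base_env is None else base_env)
    env["FI_VERBS_TX_IOV_LIMIT"] = str(limit)
    env["FI_VERBS_RX_IOV_LIMIT"] = str(limit)
    return env


max_sge = parse_max_sge(SAMPLE_DEVINFO)
env = verbs_iov_env(requested=32, device_max=max_sge, base_env={})
print(max_sge, env)
```

On a real host the text would come from running ibv_devinfo -v (for example with subprocess.run and capture_output=True), and the resulting dictionary could be passed as the env argument when launching the application.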

In short, by setting the environment variables and respecting these limits in my application, I can reliably transfer IOVs with up to 30 items.
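One possible way of respecting the limit on the application side is to split a long buffer list into groups that never exceed it. The Python sketch below only shows the bookkeeping; split_iov is a hypothetical helper, and it assumes each group would be posted as a separate message, which changes message boundaries and is not something discussed in this thread.

```python
def split_iov(buffers, iov_limit):
    """Yield consecutive slices of `buffers`, each with at most iov_limit entries."""
    if iov_limit < 1:
        raise ValueError("iov_limit must be at least 1")
    for start in range(0, len(buffers), iov_limit):
        yield buffers[start:start + iov_limit]


# Hypothetical payload: 70 small buffers, split for a limit of 30 entries.
buffers = [bytes([i]) * 16 for i in range(70)]
chunks = list(split_iov(buffers, iov_limit=30))
print([len(chunk) for chunk in chunks])
```

The order of the buffers is preserved, so concatenating the groups gives back the original list.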

Kind regards,
Carlo


PS:  long IOVs seem not to be a good idea [1], but in my specific use case there's little choice.
[1] https://www.rdmamojo.com/2013/06/08/tips-and-tricks-to-optimize-your-rdma-code/
_____________________
Carlo A. Gottardo
Postdoc at Nikhef
Skype: carlogottardo




On 21 Jun 2021, at 18:14, Hefty, Sean <sean.hefty at intel.com> wrote:

>> using the verbs provider (libfabric v1.12.1), I use the fi_sendmsg to send messages.
>> The flag FI_INJECT_COMPLETE is used and message I/O vector has a variable length
>> (iov_count).
>>
>> Independently from the total size of the message, if iov_count > 20 the messages are
>> not sent and EAGAIN is returned.
>> This means that I can send, for example, 24 kB messages with 3 IOV entries, but not
>> with 24 entries.
>>
>> To be precise, EAGAIN is returned by ibv_post_send [1] and iov_count is the only
>> parameter that changes in the test.
>>
>> I understand that multiple causes can underlie an EAGAIN, but given the role of
>> iov_count is there something I can check to start with?
>
> Check fi_tx_attr/fi_rx_attr::iov_limit.  That's the max iov size that the provider can support.  It's possible if you run a debug version of the code, you might hit an assertion.
>
> - Sean
