[libfabric-users] no perf effect from FI_MORE with verbs;ofi_rxm provider?

Tue Jan 28 15:22:02 PST 2020

+ofiwg

> I'm trying out FI_MORE with the verbs;ofi_rxm provider, but I'm not seeing any
> performance benefit from it.  Specifically, I see the same performance doing remote
> writes with regular fi_write() calls and waiting for the CQ event after each one,
> versus doing batches of 64 fi_writemsg() calls with FI_MORE on all but the last one,
> and waiting for all 64 completions only after the last one in the batch has been
> initiated.  I see the same for remote reads, comparing fi_read() and fi_readmsg().
> Should I expect any performance benefit from FI_MORE with verbs;ofi_rxm?  If so, are
> there other things I have to do to get that benefit, beyond merely batching
> transactions and delaying waiting for completions?

The verbs provider ignores the FI_MORE flag.  If the flag were honored, you would lose the overlap between the transfer of the first write and the posting of the next, with the potential gain of fewer PCI transactions.  The GNI and EFA providers do support FI_MORE, and their maintainers might have details on the benefits they see.

I would expect to see a benefit from batching writes and waiting for completions at the end, compared with delaying each write until the previous one has completed.  If you're not seeing this, can you try decreasing the size of the writes and check whether the results change?

- Sean