[ofiwg] improved sock_stream support

Hefty, Sean sean.hefty at intel.com
Tue May 25 15:20:31 PDT 2021


There are two WIP pull requests to improve performance over TCP.  Rather than have in-depth discussions on the PRs between a handful of subscribers, I'm moving the discussion to the mailing list.  We are looking for comments and input on the general design of both.

The first PR adds support for MSG_ZEROCOPY:

https://github.com/ofiwg/libfabric/pull/6760

The second adds support for io_uring:

https://github.com/ofiwg/libfabric/pull/6761

Based on my research, io_uring comes with documentation suitable for making PowerPoint slides that describe how great and wonderful it is.  Unfortunately, it lacks enough documentation for someone to do something mundane, such as writing code that uses it.  (Whatever link you were thinking of sending, there's a 99% chance I've already been to that site.)

The zerocopy case is the easier of the two.  The proposed changes use zerocopy for larger transfers.  A zerocopy send can return a byte count showing that only some portion of the transfer was accepted (at least based on sample code).  I admit I don't understand why an async send would report that it only intends to send half the request at some undefined future time, rather than the entire operation.
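For reference, here is a minimal sketch of how a short zerocopy send might be tracked.  It is not taken from the PR.  The names zc_enable, send_progress_step, and struct send_progress, and the 16 KB ZC_THRESHOLD, are made up for illustration.  It assumes a non-blocking style send on a Linux stream socket, and that each successful send() that passes MSG_ZEROCOPY produces one completion notification later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60            /* Linux asm-generic value */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

#define ZC_THRESHOLD (16 * 1024)  /* hypothetical cutoff, not from the PR */

struct send_progress {
    size_t sent;        /* bytes the kernel has accepted so far */
    unsigned zc_calls;  /* successful sends that passed MSG_ZEROCOPY */
};

/* Opt the socket in.  Returns 0 or a negative errno. */
static int zc_enable(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) ? -errno : 0;
}

/*
 * Push as much of buf as the socket will take right now.
 * Returns 0 once everything has been accepted, -EAGAIN if the socket
 * filled up first (wait for POLLOUT, then call again with the same
 * progress struct), or another negative errno on failure.
 */
static int send_progress_step(int fd, const char *buf, size_t len,
                              struct send_progress *p)
{
    assert(p->sent <= len);

    while (p->sent < len) {
        size_t left = len - p->sent;
        int flags = MSG_DONTWAIT | MSG_NOSIGNAL;
        ssize_t n;

        /* Only larger remainders use zerocopy (assumed policy). */
        if (left >= ZC_THRESHOLD)
            flags |= MSG_ZEROCOPY;

        n = send(fd, buf + p->sent, left, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            /* On -ENOBUFS a caller could retry without MSG_ZEROCOPY. */
            return -errno;
        }

        /* A short count just means the rest has to be sent later. */
        p->sent += (size_t)n;
        if (flags & MSG_ZEROCOPY)
            p->zc_calls++;
    }
    return 0;
}
```

With this shape, a partial send is handled the same way as on a plain non-blocking socket.  The only extra state is zc_calls, which tells the caller how many notifications to wait for before the buffer can be reused.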

The (e)poll-related notification flow with zerocopy is essentially unchanged relative to the non-blocking socket flow.  It uses POLLIN/POLLOUT to know when the socket is ready; the only additional work is reading the async completion data.

For io_uring, a poll notification flow isn't the best option.  The next receive should be queued immediately, with checks against the CQ (completion queue) to see when it's done.  The receive may complete with less data than the buffer size, I think, maybe, possibly.  This seems desirable, but to be efficient it does limit each socket to 1 outstanding receive.

However, send completions may also indicate that a send has only partially completed.  If so, then queuing multiple sends is not useful, since the remainder of a partial send would need to go out ahead of anything queued behind it.  That makes me question my understanding.
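To make the concern concrete, here is a small sketch of the bookkeeping a partial send completion would force.  It assumes the completion result follows the send() convention, which is how io_uring reports cqe->res: a byte count on success, a negative errno on failure.  The names tx_op, tx_on_completion, and tx_next_slice are hypothetical, and the io_uring submission itself is left out (with liburing it would presumably be io_uring_prep_send() on the slice returned here).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One send on a stream socket; assumes len > 0. */
struct tx_op {
    const char *buf;
    size_t len;
    size_t done;  /* bytes already reported complete */
};

enum tx_next {
    TX_COMPLETE,  /* whole buffer accepted, next queued send may go */
    TX_RESUBMIT,  /* remainder must be submitted before any later send */
    TX_FAILED     /* hard error, or no progress */
};

/* Apply one completion result (the value of cqe->res) to the op. */
static enum tx_next tx_on_completion(struct tx_op *op, int res)
{
    if (res == -EINTR || res == -EAGAIN)
        return TX_RESUBMIT;
    if (res <= 0)
        return TX_FAILED;

    assert((size_t)res <= op->len - op->done);
    op->done += (size_t)res;
    return op->done == op->len ? TX_COMPLETE : TX_RESUBMIT;
}

/* The slice of the buffer that still has to be submitted. */
static const char *tx_next_slice(const struct tx_op *op, size_t *len)
{
    *len = op->len - op->done;
    return op->buf + op->done;
}
```

If TX_RESUBMIT can happen, a second send already sitting in the ring for the same socket could put its bytes on the wire ahead of the remainder.  That is why this sketch keeps one send in flight per socket, which is the limitation in question.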

If anyone has experience using io_uring with sockets, feel free to chime in.

- Sean

