[ofiwg] DS/DA discussion

Atchley, Scott atchleyes at ornl.gov
Tue Oct 6 10:18:17 PDT 2015

Hi all,

Not wanting to lose too much context, here are my thoughts. Please feel free to shoot down/modify/discard as desired.

As I mentioned on the call, I think there are two orthogonal issues:

1. OFI in the kernel

2. Additions to OFI to address storage concerns

Item 1 is to allow vendors to implement storage protocols that currently use Verbs (iSER, SRP, LND, etc.), as opposed to those that use sockets (iSCSI, AoE), without requiring the vendor to adopt the InfiniBand wire protocol. This allows innovation at the interconnect level while maintaining a common interface. This is exactly the same argument as for OFI in userspace, applied to the kernel. There is _not_ a requirement that both ends of the communication be in the kernel, which allows support for distributed storage systems with userspace daemons.

Item 1 is the argument for Linux kernel developers to adopt OFI in the kernel.

Item 2 addresses any possible shortcomings in OFI that arise from explicitly managing the memory hierarchy using local and remote resources, especially devices with persistence. The driving use cases for the OFI definition are process-to-process communication via message queues, tag matching, remote memory access (i.e., one-sided), atomics, etc. In the DS/DA case, the peer may be an active process or simply a passive device. This mode needs only a subset of the OFI interface, namely RMA and atomics, as well as connection management (CM) services to establish communication.

One vendor may want to support NVM on DIMMs that provide byte-addressable access and are physically accessed via the memory subsystem. Another may want to support NVMe devices with byte-addressable or block access, physically accessed over PCIe or via NVMe over Fabrics.

For these devices, what do they need that OFI does not yet have? Flags or operations to indicate that a memory region should be persisted (I think Intel gave an example of a new instruction to move data into a “persistence domain”)? Does OFI lack a “commit” or “sync” operation to make the remote device perform a storage-specific operation? Something else?

I think that these additions would be useful in both userspace and in the kernel.
