[ofiwg] Feedback about libfabric related to block storage

Rob Latham robl at mcs.anl.gov
Tue Oct 7 07:15:14 PDT 2014

On 10/07/2014 07:44 AM, Bart Van Assche wrote:

> * Several API's in the Linux kernel pass a scatterlist to a block driver
> or SCSI LLD (struct scatterlist). Such a scatterlist consists of one or
> more discontiguous memory regions. As an example, the data associated
> with a READ or WRITE request is passed as a scatterlist to block
> drivers. However, all RDMA memory registration primitives I am familiar
> with support registration of a single contiguous virtual memory region.
> It is not always possible to map a Linux scatterlist onto a single
> contiguous virtual memory region. Some RDMA API's, e.g. the recently
> added API's for T10-PI only accept a single memory key. Similarly, in
> the header of certain protocols, e.g. iSER, only one memory key can be
> stored. Hence the presence of code for copying discontiguous
> scatterlists into a contiguous buffer in block storage initiator drivers
> that use the RDMA API. I see this as a mismatch between the capabilities
> of the Linux kernel and the RDMA API. My proposal is to address this by
> modifying the RDMA API such that registration of discontiguous memory
> regions via a single memory key becomes possible. This will eliminate
> the need for data copying in RDMA block storage drivers.

I/O in the high-performance computing context would also benefit from
such a noncontiguous I/O approach.  Research in this space has been
somewhat quiet for the last few years, but PVFS list I/O, PVFS v2
datatype I/O, and the proposed POSIX extension routines readx/writex all
represent middleware-level interfaces for noncontiguous I/O that would
map well onto noncontiguous support in the storage transport layer.
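
To make the list I/O idea concrete, here is a rough sketch of what such a
middleware call looks like to the caller: one list of memory buffers, one
list of (offset, length) file regions, and a single call that moves the
data between them.  The names read_list, _advance and demo are made up for
illustration and are not the PVFS or readx/writex interfaces.  The sketch
emulates the call with one os.preadv per file region (Linux, Python 3.7 or
later); a transport with real noncontiguous support could in principle
service the whole list in one request.

```python
import os
import tempfile


def _advance(views, nbytes):
    """Drop the first nbytes from a list of memoryviews (hypothetical helper)."""
    remaining = []
    for v in views:
        if nbytes >= len(v):
            nbytes -= len(v)          # this view is fully consumed
        else:
            remaining.append(v[nbytes:])
            nbytes = 0
    return remaining


def read_list(fd, mem_buffers, file_regions):
    """Fill mem_buffers (bytearrays) from file_regions [(offset, length), ...].

    Bytes are taken from the file regions in order and placed into the
    memory buffers in order, so both lists must describe the same total size.
    """
    if sum(len(b) for b in mem_buffers) != sum(n for _, n in file_regions):
        raise ValueError("memory and file lists must describe the same byte count")

    views = [memoryview(b) for b in mem_buffers if len(b) > 0]
    total = 0
    for offset, length in file_regions:
        while length > 0:
            # Gather the memory pieces that cover the next `length` bytes.
            batch, need = [], length
            for v in views:
                if need == 0:
                    break
                piece = v[:need]
                batch.append(piece)
                need -= len(piece)
            got = os.preadv(fd, batch, offset)   # scatter into memory, no bounce buffer
            if got == 0:
                raise EOFError("file region extends past end of file")
            views = _advance(views, got)
            offset += got
            length -= got
            total += got
    return total


def demo():
    """Read three disjoint file regions into two separate memory buffers."""
    with tempfile.TemporaryFile() as f:
        f.write(bytes(range(64)))
        f.flush()
        a, b = bytearray(6), bytearray(10)
        n = read_list(f.fileno(), [a, b], [(0, 4), (16, 8), (40, 4)])
        return n, bytes(a), bytes(b)
```

Note that neither side needs to be contiguous and the two lists do not need
to line up piece for piece; that is the property a single memory key
covering discontiguous regions would preserve all the way down the stack.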


Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
