[ofiwg] Feedback about libfabric related to block storage

Rob Latham robl at mcs.anl.gov
Tue Oct 7 07:15:14 PDT 2014



On 10/07/2014 07:44 AM, Bart Van Assche wrote:

> * Several APIs in the Linux kernel pass a scatterlist to a block driver
> or SCSI LLD (struct scatterlist). Such a scatterlist consists of one or
> more discontiguous memory regions. As an example, the data associated
> with a READ or WRITE request is passed as a scatterlist to block
> drivers. However, all RDMA memory registration primitives I am familiar
> with support registration of a single contiguous virtual memory region.
> It is not always possible to map a Linux scatterlist onto a single
> contiguous virtual memory region. Some RDMA APIs, e.g. the recently
> added APIs for T10-PI, only accept a single memory key. Similarly, in
> the header of certain protocols, e.g. iSER, only one memory key can be
> stored. Hence the presence of code for copying discontiguous
> scatterlists into a contiguous buffer in block storage initiator drivers
> that use the RDMA API. I see this as a mismatch between the capabilities
> of the Linux kernel and the RDMA API. My proposal is to address this by
> modifying the RDMA API such that registration of discontiguous memory
> regions via a single memory key becomes possible. This will eliminate
> the need for data copying in RDMA block storage drivers.
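For readers who have not seen the copy described above, here is a minimal
userspace sketch of it in C. It is an illustration under stated
assumptions, not kernel code and not taken from any driver: struct iovec
stands in for the kernel's struct scatterlist, and the helper names
seg_total_len() and gather_to_bounce() are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: total number of bytes described by the segment list. */
size_t seg_total_len(const struct iovec *segs, int nsegs)
{
    size_t total = 0;

    for (int i = 0; i < nsegs; i++)
        total += segs[i].iov_len;
    return total;
}

/*
 * Hypothetical helper: gather discontiguous segments into one contiguous
 * "bounce" buffer, so that a single region (and hence a single memory key)
 * can describe the data.  Returns the number of bytes copied, or -1 if the
 * bounce buffer is too small.
 */
ssize_t gather_to_bounce(void *dst, size_t dst_len,
                         const struct iovec *segs, int nsegs)
{
    size_t off = 0;

    if (seg_total_len(segs, nsegs) > dst_len)
        return -1;

    for (int i = 0; i < nsegs; i++) {
        memcpy((char *)dst + off, segs[i].iov_base, segs[i].iov_len);
        off += segs[i].iov_len;
    }
    return (ssize_t)off;
}
```

Every byte is touched once more than the transfer itself requires. That
extra pass is the cost that registering a discontiguous region under a
single memory key would avoid.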

I/O in the high-performance computing context would also benefit from
such a noncontiguous I/O approach.  Research in this space has been
somewhat quiet for the last few years, but PVFS list-io, PVFS v2
datatype I/O, and the proposed POSIX extension routines readx/writex all
represent middleware-level interfaces for dealing with noncontiguous I/O
that would map well to noncontiguous support in the storage transport layer.
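
To make the shape of such an interface concrete, here is a rough sketch
of a list-style read in C: one call takes a list of memory segments and
a list of file extents.  The names listio_read() and struct file_extent
are hypothetical; this is not the PVFS API and not the proposed
readx/writex signature.  The sketch emulates the call with one pread()
per contiguous piece, and that per-piece loop is exactly the work a
transport with native noncontiguous support could take over.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical descriptor for one contiguous region of a file. */
struct file_extent {
    off_t  offset;
    size_t len;
};

/*
 * Hypothetical list-style read: fill the memory segments, in order, with
 * the bytes of the file extents, in order.  Stops when either list is
 * exhausted or at end of file.  Returns total bytes read, or -1 on error.
 */
ssize_t listio_read(int fd, const struct iovec *mem, int n_mem,
                    const struct file_extent *ext, int n_ext)
{
    int mi = 0, ei = 0;          /* current memory segment / file extent */
    size_t moff = 0, eoff = 0;   /* progress within each of them */
    ssize_t total = 0;

    while (mi < n_mem && ei < n_ext) {
        size_t mleft = mem[mi].iov_len - moff;
        size_t eleft = ext[ei].len - eoff;

        if (mleft == 0) { mi++; moff = 0; continue; }
        if (eleft == 0) { ei++; eoff = 0; continue; }

        /* Largest piece that is contiguous in both memory and file. */
        size_t chunk = mleft < eleft ? mleft : eleft;
        ssize_t n = pread(fd, (char *)mem[mi].iov_base + moff, chunk,
                          ext[ei].offset + (off_t)eoff);
        if (n < 0)
            return -1;
        if (n == 0)              /* end of file */
            break;

        total += n;
        moff += (size_t)n;
        eoff += (size_t)n;
    }
    return total;
}
```

The memory list and the file list need not line up segment for segment;
only the total byte counts have to match for a complete transfer.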

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


