[Ofvwg] OFVWG meeting notes - 12/22/2015

Liran Liss liranl at mellanox.com
Sun Jan 3 07:48:23 PST 2016


Sagi Grimberg presented an RFC for supporting RDMA erasure coding offloads.

RAID schemes are common in networked storage solutions, such as Ceph and Gluster.
Erasure coding is a generalization of different RAID schemes, such as mirroring, parity block, and dual-parity blocks.
Specifically, using erasure coding, a system can withstand multiple disk failures by adding redundancy code blocks to the original data.
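
As a rough illustration of the simplest case mentioned above (a single parity block), the following sketch computes an XOR parity block over three data blocks and rebuilds a lost block from the survivors. The helper name xor_blocks and the sample data are made up for this example; they are not part of the RFC.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of four bytes each (arbitrary sample data).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The single redundancy (code) block.
parity = xor_blocks(data)

# Simulate losing data block 1 and rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Schemes that tolerate more than one failure replace the XOR with arithmetic over a finite field and use one matrix row per code block.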

In a distributed setting, different data and redundancy blocks typically reside on different nodes to minimize correlated failures.
Thus, computing the code blocks, which is highly CPU-intensive, is closely tied to data transfer.

The proposed API offers support for the following use cases:
(1) Synchronous encoding (for generating code blocks)
(2) Synchronous decoding (for reconstructing data following a disk loss)
(3) Asynchronous encoding
(4) Asynchronous decoding
(5) Encode + data transfer

The last use case is particularly interesting because it enables the Verbs provider to encode the data and transfer the outcome to remote peers without any software synchronization.

The API consists of an erasure coding (EC) context, which is initialized with the code parameters and calculation matrix.
This context may be used for multiple encoding/decoding operations.
When operations are invoked, the caller provides a memory layout specifying the scatter/gather entries that constitute the data and code blocks, as well as the block size, for the specific operations.
Asynchronous operations are also given a callback function to indicate completion.
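
To make the shape of such a context concrete, here is a minimal software model, not the proposed Verbs API. The names ECContext, encode, and encode_async are hypothetical, as are the choice of GF(2^8) with polynomial 0x11d and the sample matrix. A thread stands in for the offload engine in the asynchronous path, and decoding (which would apply an inverted matrix) is omitted.

```python
import threading

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

class ECContext:
    """Hypothetical EC context: k data blocks, m code blocks, an m x k matrix."""

    def __init__(self, k, m, matrix):
        assert len(matrix) == m and all(len(row) == k for row in matrix)
        self.k, self.m, self.matrix = k, m, matrix

    def encode(self, data_blocks, block_size):
        """Synchronous encode: return the m code blocks."""
        assert len(data_blocks) == self.k
        assert all(len(block) == block_size for block in data_blocks)
        code_blocks = []
        for row in self.matrix:
            out = bytearray(block_size)
            for coeff, block in zip(row, data_blocks):
                for i, byte in enumerate(block):
                    out[i] ^= gf_mul(coeff, byte)
            code_blocks.append(bytes(out))
        return code_blocks

    def encode_async(self, data_blocks, block_size, callback):
        """Asynchronous encode: return at once, invoke callback on completion."""
        worker = threading.Thread(
            target=lambda: callback(self.encode(data_blocks, block_size)))
        worker.start()
        return worker

# The same context is reused for several operations.
ctx = ECContext(k=2, m=2, matrix=[[1, 1], [1, 2]])
data = [b"\x01\x02", b"\x03\x04"]
sync_result = ctx.encode(data, block_size=2)

async_results = []
ctx.encode_async(data, 2, async_results.append).join()
assert async_results == [sync_result]
```

In this model the memory layout is reduced to a list of byte strings plus a block size; a real provider would take scatter/gather entries instead.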

Finally, the encode+transfer primitive also requires a vector of stripe objects, one for each data or code block.
The stripe objects specify the QP and RDMA operation to conduct following encoding completion.
Multiple operations may be posted concurrently on the same EC context, allowing, for example, the pipelining of multiple disk writes.
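
The encode+transfer flow can be modeled roughly as follows. Everything here is an assumption made for illustration: FakeQP stands in for a real queue pair, Stripe and encode_and_send are hypothetical names, the opcode is a plain string, and a single XOR parity block replaces the general code matrix.

```python
from dataclasses import dataclass, field
from functools import reduce

@dataclass
class FakeQP:
    """Stand-in for a queue pair; records what would be sent to one peer."""
    peer: str
    posted: list = field(default_factory=list)

    def post(self, opcode, payload):
        self.posted.append((opcode, payload))

@dataclass
class Stripe:
    """Hypothetical stripe object: which QP and RDMA operation to use for one block."""
    qp: FakeQP
    opcode: str

def xor_parity(blocks):
    """Single code block: byte-wise XOR of the data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_and_send(data_blocks, stripes):
    """Encode, then post each data or code block on its stripe's QP."""
    blocks = list(data_blocks) + [xor_parity(data_blocks)]
    assert len(stripes) == len(blocks)  # one stripe per data or code block
    for stripe, block in zip(stripes, blocks):
        stripe.qp.post(stripe.opcode, block)

# Two data nodes and one parity node.
qps = [FakeQP(f"node{i}") for i in range(3)]
stripes = [Stripe(qp, "RDMA_WRITE") for qp in qps]

# Two disk writes posted back to back against the same stripe vector.
encode_and_send([b"\x0f\xf0", b"\x33\x55"], stripes)
encode_and_send([b"\x01\x02", b"\x03\x04"], stripes)
```

The point of the primitive is that the caller does not wait for the encode to finish before posting the sends; in this model that shows up as a single call per disk write.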

In terms of ordering, all operations submitted to the same EC context are executed in submission order.
However, in encode+transfer operations, the ordering between transfers to different peers is not guaranteed.
--Liran
