[ewg] [PATCH 0/8 v3] RDMAoE support
Yossi Etigin
yossi.openib at gmail.com
Fri Jul 17 05:50:13 PDT 2009
Eli Cohen wrote:
> RDMA over Ethernet (RDMAoE) allows running the IB transport protocol
> using Ethernet frames allowing the deployment of IB semantics on
> lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames
> with an IEEE assigned Ethertype, a GRH, unmodified IB transport
> headers and payload. Aside from the considerations pointed out below,
> RDMAoE ports are functionally equivalent to regular IB ports from the
> RDMA stack perspective.
>
> IB subnet management and SA services are not required for RDMAoE
> operation; Ethernet management practices are used instead. In
> Ethernet, nodes are commonly referred to by applications by means of
> an IP address. RDMAoE encodes the IP addresses that were assigned to
> the corresponding Ethernet port into its GIDs, and makes use of the IP
> stack to bind a destination address to the corresponding netdevice
> (just as the CMA does today for IB and iWARP) and to obtain its L2 MAC
> addresses.
>
> The RDMA Verbs API is syntactically unmodified. When referring to
> RDMAoE ports, Address handles are required to contain GIDs and the L2
> address fields in the API are ignored. The Ethernet L2 information is
> then obtained by the vendor-specific driver (both in kernel- and
> user-space) while modifying QPs to RTR and creating address handles.
>
> In order to maximize transparency for applications, RDMAoE implements
> a dedicated API that provides services equivalent to some of those
> provided by the IB-SA. The current approach is strictly local but may
> evolve in the future. This API is implemented using an independent
> source code file which allows for seamless evolution of the code
> without affecting the IB native SA interfaces. We have successfully
> tested MPI, SDP, RDS, and native Verbs applications over RDMAoE.
>
> To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and
> mlx4_ib drivers must be loaded, and the netdevice for the
> corresponding RDMAoE port must be running. Individual ports of a multi
> port HCA can be independently configured as Ethernet (with support for
> RDMAoE) or IB, as is already the case.
>
> Following is a series of 8 patches based on version 2.6.30 of the
> Linux kernel. This new series reflects changes based on feedback from
> the community on the previous set of patches. The whole series is
> tagged v3.
>
> Signed-off-by: Eli Cohen <eli at mellanox.co.il>
>
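As an illustration of the "IP address encoded into a GID" idea in the quoted cover letter, here is a minimal C sketch. The cover letter does not spell out the byte layout, so the IPv4-mapped IPv6 form (::ffff:a.b.c.d) used below is an assumption, and the names example_gid and ipv4_to_gid are hypothetical, not taken from the patch series.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* A GID is 128 bits wide, the same size as an IPv6 address.
 * Hypothetical type for illustration only. */
struct example_gid {
    uint8_t raw[16];
};

/* Hypothetical helper: encode a dotted-quad IPv4 string into a GID.
 * ASSUMPTION: the IPv4-mapped IPv6 layout is used, i.e. 80 zero bits,
 * 16 one bits, then the 32-bit IPv4 address in network byte order.
 * Returns 0 on success, -1 if the text is not a valid IPv4 address. */
int ipv4_to_gid(const char *text, struct example_gid *gid)
{
    struct in_addr v4;

    if (inet_pton(AF_INET, text, &v4) != 1)
        return -1;

    memset(gid->raw, 0, sizeof(gid->raw));
    gid->raw[10] = 0xff;
    gid->raw[11] = 0xff;
    /* s_addr is already in network byte order, so copy it as-is. */
    memcpy(&gid->raw[12], &v4.s_addr, sizeof(v4.s_addr));
    return 0;
}
```

Under this assumed layout, ipv4_to_gid("192.168.1.10", &gid) leaves the last four bytes of gid.raw holding 192, 168, 1 and 10.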
I agree with Or here: I really do not think that making RDMAoE transparent
to applications is worth pushing a lot of compatibility code into the kernel.
The winner here is definitely rdmaoe_sa: 1000 lines of useless code which boil
down to kernel_bind and kernel_setsockopt. Why do you need all this code to
hold state, refcounts, whatever, if the kernel already does this for you?
If an application uses IB, let it use real IB. If it uses RDMA, let it use
all RDMA implementations out there (IB, iWARP, RDMAoE).
Therefore, I think the correct place to add RDMAoE is under rdma_cm.
If a consumer wants to use RDMAoE, it should use rdma_cm. It looks like you are
trying to add something that sits between RDMAoE and IBoE, and to put a lot of
hacky bypass logic in the core and the ULPs.
--Yossi