[ofa-general] [PATCHv5 0/10] RDMAoE support
Eli Cohen
eli at dev.mellanox.co.il
Mon Aug 24 05:13:07 PDT 2009
Roland,
what about this series of patches? Would you like me to re-create them
over your xrc branch or would you rather take them before xrc?
On Wed, Aug 19, 2009 at 08:19:35PM +0300, Eli Cohen wrote:
> RDMA over Ethernet (RDMAoE) allows running the IB transport protocol using
> Ethernet frames, enabling the deployment of IB semantics on lossless Ethernet
> fabrics. RDMAoE packets are standard Ethernet frames with an IEEE assigned
> Ethertype, a GRH, unmodified IB transport headers and payload. IB subnet
> management and SA services are not required for RDMAoE operation; Ethernet
> management practices are used instead. RDMAoE encodes IP addresses into its
> GIDs and resolves MAC addresses using the host IP stack. For multicast GIDs,
> standard IP to MAC mappings apply.
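>
> For illustration, here is a minimal, self-contained user-space sketch of that
> standard mapping (RFC 2464: the MAC is 33:33 followed by the low-order 32
> bits of the multicast address); this is illustrative, not code from the
> patches:
>
> /* mcast_map.c: standalone demo of the RFC 2464 IPv6-multicast-to-
>  * Ethernet-MAC mapping.  Build with: cc -o mcast_map mcast_map.c
>  */
> #include <stdio.h>
> #include <stdint.h>
>
> static void mcast_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
> {
>         mac[0] = 0x33;
>         mac[1] = 0x33;
>         mac[2] = gid[12];       /* low-order 32 bits of the address */
>         mac[3] = gid[13];
>         mac[4] = gid[14];
>         mac[5] = gid[15];
> }
>
> int main(void)
> {
>         /* ff02::1:ff00:1 -- a solicited-node multicast address */
>         uint8_t gid[16] = { 0xff, 0x02, 0, 0, 0, 0, 0, 0,
>                             0, 0, 0, 1, 0xff, 0, 0, 1 };
>         uint8_t mac[6];
>         int i;
>
>         mcast_gid_to_mac(gid, mac);
>         for (i = 0; i < 6; i++)
>                 printf("%02x%s", mac[i], i < 5 ? ":" : "\n");
>         return 0;       /* prints 33:33:ff:00:00:01 */
> }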
>
> To support RDMAoE, a new transport protocol was added to the IB core. An RDMA
> device can have ports with different transports, which are identified by a port
> transport attribute. The RDMA Verbs API is syntactically unmodified. When
> referring to RDMAoE ports, address handles are required to contain GIDs,
> while LID fields are ignored. The Ethernet L2 information is subsequently
> obtained by the vendor-specific driver (both in kernel- and user-space) when
> modifying QPs to RTR and when creating address handles. As there is no SA in
> RDMAoE, the CMA code
> is modified to fill the necessary path record attributes locally before sending
> CM packets. Similarly, the CMA provides to the user the required address handle
> attributes when processing SIDR requests and joining multicast groups.
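>
> As a self-contained illustration of the per-port transport attribute (the
> structure and helper below are simplified stand-ins, not the kernel API the
> patches add, which includes rdma_is_transport_supported()):
>
> /* transport_model.c: toy model of a device whose ports carry
>  * different transports.  Build with: cc -o tm transport_model.c
>  */
> #include <stdio.h>
>
> enum port_transport {
>         TRANSPORT_IB,           /* classic InfiniBand link */
>         TRANSPORT_RDMAOE,       /* IB transport over Ethernet */
> };
>
> struct rdma_dev_model {
>         int num_ports;
>         enum port_transport transport[2];       /* port 1 is index 0 */
> };
>
> /* Return 1 if any port of the device uses the given transport; this
>  * mirrors the role of rdma_is_transport_supported() from the changelog
>  * below, whose real signature lives in the patches.
>  */
> static int transport_supported(const struct rdma_dev_model *dev,
>                                enum port_transport t)
> {
>         int p;
>
>         for (p = 0; p < dev->num_ports; p++)
>                 if (dev->transport[p] == t)
>                         return 1;
>         return 0;
> }
>
> int main(void)
> {
>         /* a two-port HCA: port 1 as IB, port 2 as Ethernet/RDMAoE */
>         struct rdma_dev_model hca = {
>                 .num_ports = 2,
>                 .transport = { TRANSPORT_IB, TRANSPORT_RDMAOE },
>         };
>
>         printf("IB:     %d\n", transport_supported(&hca, TRANSPORT_IB));
>         printf("RDMAoE: %d\n", transport_supported(&hca, TRANSPORT_RDMAOE));
>         return 0;
> }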
>
> In this patch set, an RDMAoE port is currently assigned a single GID, encoding
> the IPv6 link-local address of the corresponding netdev; the CMA RDMAoE code
> temporarily uses the IPv6 link-local address as the GID instead of the IP
> address provided by the user, which allows any IP address to be used.
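>
> For illustration, the link-local address (and hence the GID) can be derived
> from the netdev's MAC address with the modified EUI-64 rules of RFC 4291; a
> minimal user-space sketch, not code from the patches:
>
> /* ll_gid.c: derive an IPv6 link-local address from a MAC address
>  * (modified EUI-64, RFC 4291).  Build with: cc -o ll_gid ll_gid.c
>  */
> #include <stdio.h>
> #include <stdint.h>
>
> static void mac_to_ll_gid(const uint8_t mac[6], uint8_t gid[16])
> {
>         int i;
>
>         for (i = 0; i < 16; i++)
>                 gid[i] = 0;
>         gid[0]  = 0xfe;                 /* fe80::/64 link-local prefix */
>         gid[1]  = 0x80;
>         gid[8]  = mac[0] ^ 0x02;        /* flip the universal/local bit */
>         gid[9]  = mac[1];
>         gid[10] = mac[2];
>         gid[11] = 0xff;                 /* EUI-64 filler bytes */
>         gid[12] = 0xfe;
>         gid[13] = mac[3];
>         gid[14] = mac[4];
>         gid[15] = mac[5];
> }
>
> int main(void)
> {
>         uint8_t mac[6] = { 0x00, 0x02, 0xc9, 0x01, 0x02, 0x03 };
>         uint8_t gid[16];
>         int i;
>
>         mac_to_ll_gid(mac, gid);
>         for (i = 0; i < 16; i++)
>                 printf("%02x%s", gid[i], (i & 1) && i < 15 ? ":" : "");
>         printf("\n");   /* fe80:0000:0000:0000:0202:c9ff:fe01:0203 */
>         return 0;
> }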
>
> To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and mlx4_ib
> drivers must be loaded, and the netdevice for the corresponding RDMAoE port
> must be running. Individual ports of a multi-port HCA can be independently
> configured as Ethernet (with support for RDMAoE) or IB, as is already the case.
> We have successfully tested MPI, SDP, RDS, and native Verbs applications over
> RDMAoE.
>
> Following is a series of 10 patches based on version 2.6.30 of the Linux
> kernel. This new series reflects changes based on feedback from the community
> on the previous set of patches, and is tagged v5.
>
> Changes from v4:
> 1. Added rdma_is_transport_supported() and used it to simplify conditionals
> throughout the code.
> 2. ib_register_mad_agent() for QP0 is only called for IB ports.
> 3. PATCH 5/10 changed from "Enable support for RDMAoE ports" to "Enable
> support only for IB ports".
> 4. MAD services from userspace are currently not supported for RDMAoE ports.
> 5. Added a kref to struct cma_multicast to maintain a reference count on the
> object and avoid freeing it while the worker thread is still using it (see
> the sketch after this list).
> 6. Return an immediate error for an invalid MTU when resolving an RDMAoE
> path.
> 7. Don't fail path resolution if the rate is 0, since this value stands for
> IB_RATE_PORT_CURRENT.
> 8. In cma_rdmaoe_join_multicast(), fail immediately if the MTU is zero.
> 9. Added ucma_copy_rdmaoe_route() instead of modifying ucma_copy_ib_route().
> 10. Bug fix: in PATCH 10/10, call flush_workqueue() after unregistering the
> netdev notifiers.
> 11. Multicast no longer uses the broadcast MAC.
> 12. No changes to patches 2, 7 and 8 from the v4 series.
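>
> For reference, the kref pattern from item 5 looks roughly as follows; the
> structure layout and helper names are hypothetical, not the actual
> cma_multicast code from the patch:
>
> #include <linux/kernel.h>
> #include <linux/kref.h>
> #include <linux/slab.h>
>
> struct mc_obj {                         /* hypothetical stand-in */
>         struct kref ref;                /* reference count */
>         /* ... multicast state ... */
> };
>
> static void mc_release(struct kref *kref)
> {
>         struct mc_obj *mc = container_of(kref, struct mc_obj, ref);
>
>         kfree(mc);      /* freed only when the last reference drops */
> }
>
> static struct mc_obj *mc_create(void)  /* creator holds one reference */
> {
>         struct mc_obj *mc = kzalloc(sizeof(*mc), GFP_KERNEL);
>
>         if (mc)
>                 kref_init(&mc->ref);    /* count = 1 */
>         return mc;
> }
>
> /* The worker takes its own reference before work is queued and drops
>  * it when done, so the object cannot be freed underneath it.
>  */
> static void mc_get(struct mc_obj *mc)
> {
>         kref_get(&mc->ref);
> }
>
> static void mc_put(struct mc_obj *mc)
> {
>         kref_put(&mc->ref, mc_release);
> }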
>
> Signed-off-by: Eli Cohen <eli at mellanox.co.il>
> ---
>
> b/drivers/infiniband/core/agent.c | 38 ++-
> b/drivers/infiniband/core/cm.c | 25 +-
> b/drivers/infiniband/core/cma.c | 54 ++--
> b/drivers/infiniband/core/mad.c | 41 ++-
> b/drivers/infiniband/core/multicast.c | 4
> b/drivers/infiniband/core/sa_query.c | 39 ++-
> b/drivers/infiniband/core/ucm.c | 8
> b/drivers/infiniband/core/ucma.c | 2
> b/drivers/infiniband/core/ud_header.c | 111 ++++++++++
> b/drivers/infiniband/core/user_mad.c | 6
> b/drivers/infiniband/core/uverbs.h | 1
> b/drivers/infiniband/core/uverbs_cmd.c | 32 ++
> b/drivers/infiniband/core/uverbs_main.c | 1
> b/drivers/infiniband/core/verbs.c | 25 ++
> b/drivers/infiniband/hw/mlx4/ah.c | 187 +++++++++++++---
> b/drivers/infiniband/hw/mlx4/mad.c | 32 +-
> b/drivers/infiniband/hw/mlx4/main.c | 309 +++++++++++++++++++++++++---
> b/drivers/infiniband/hw/mlx4/mlx4_ib.h | 19 +
> b/drivers/infiniband/hw/mlx4/qp.c | 172 ++++++++++-----
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c | 12 -
> b/drivers/net/mlx4/en_main.c | 15 +
> b/drivers/net/mlx4/en_port.c | 4
> b/drivers/net/mlx4/en_port.h | 3
> b/drivers/net/mlx4/fw.c | 3
> b/drivers/net/mlx4/intf.c | 20 +
> b/drivers/net/mlx4/main.c | 6
> b/drivers/net/mlx4/mlx4.h | 1
> b/include/linux/mlx4/cmd.h | 1
> b/include/linux/mlx4/device.h | 31 ++
> b/include/linux/mlx4/driver.h | 16 +
> b/include/linux/mlx4/qp.h | 8
> b/include/rdma/ib_addr.h | 92 ++++++++
> b/include/rdma/ib_pack.h | 26 ++
> b/include/rdma/ib_user_verbs.h | 21 +
> b/include/rdma/ib_verbs.h | 11
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3
> b/net/sunrpc/xprtrdma/svc_rdma_transport.c | 2
> drivers/infiniband/core/cm.c | 5
> drivers/infiniband/core/cma.c | 207 ++++++++++++++++++
> drivers/infiniband/core/mad.c | 37 ++-
> drivers/infiniband/core/ucm.c | 12 -
> drivers/infiniband/core/ucma.c | 31 ++
> drivers/infiniband/core/user_mad.c | 15 -
> drivers/infiniband/core/verbs.c | 10
> include/rdma/ib_verbs.h | 15 +
> 45 files changed, 1440 insertions(+), 273 deletions(-)