[ofa-general] RE: [PATCH 0/8 v3] RDMAoE support

Liran Liss liranl at mellanox.co.il
Tue Jul 14 13:59:32 PDT 2009


S.B.
Regards,
--Liran 

-----Original Message-----
From: Or Gerlitz [mailto:ogerlitz at voltaire.com] 
Sent: Tuesday, July 14, 2009 12:43 PM
To: Eli Cohen
Cc: Sean Hefty; Roland Dreier; Jack Morgenstein; general-list; Liran
Liss
Subject: Re: [PATCH 0/8 v3] RDMAoE support

Eli Cohen wrote:
> RDMAoE allows running the IB transport protocol using Ethernet frames
allowing the deployment of IB semantics on lossless Ethernet fabrics.
RDMAoE packets are standard Ethernet frames with an IEEE assigned
Ethertype, a GRH, unmodified IB transport headers and payload.
Hi Eli and the team @ Mellanox

Before going into more detailed review and comments, I'd like to try and
clarify few issues.

1. GRH - is it a must per your design to have it also for unicast
packets? or maybe it just simplifies things, or both? I assume you may
need it for the CM logic to keep working the way it used to.
LL: GRH is required for all packets; it keeps things simple and provides
length information.

2. is there any reason not to restrict the design for supporting addr /
rdma-cm based consumers? e.g in the same manner that iWARP is?
LL: There is no reason to restrict RDMAoE to RDMACM apps: all existing
IB applications can work over RDMAoE.

3. CM services - note that once the ULP did the address resolution, the
addr / rdma-cm data structure can/has the src/dest MACs, so basically
why not push all the changes to the mad code and eimplement the MADs
over raw or even datagram socket?
LL: we already have a great CM protocol that runs over QP1 and is a
perfect fit for RDMAoE; we don't see any reason to change this.

4. regarding your rdmaoe_sa - aside from being non review-able! (since
in the same patch 3/8 you move tons of code from one place to another
and add changes) it seems to me an overkill.  I would like to see a
solution which takes advantage of the SA non existence  under Ethernet,
and hence route resolution becomes no-op as it is with iWARP, as for
multicast join, since it is a local operation it should and can be done
by the rdma-cm, no need to modify the layers below it, expect for
exposing API for the rdma-cm to do so.

LL: As I said earlier, we do *not* want to restrict ourselves
artificially to RDMACM.
The rmdaoa "sa" code does more than just path resolution - it provides
SL, MTU, rate, packet lifetime, and other params. The same thing is also
true for multicast joins.
In terms of the code, it is very simple:
- move useful definitions that were previously in multicast.c and
sa_query.c to header files.
- implement only what is needed for rdmaoe in rdmaoe_sa.c.
Perhaps we can make better code reuse (calling the SA's cmp_rec(), for
example). We will look into it; thanks!
I would just like to note that there is a tradeoff here: keeping rdmaoe
code separate for independent development vs. reducing code duplication.
Perhaps, as rmdaoe matures, we can aggregate the code to reduce
duplication.

5. 8/8 says "Currently, each IB port has a single GID entry in its table
and that GID entery equals the link local IPv6 address" - does your
design support IPv4 as well, how?
LL: Yes, we intend to use standard IPv4-mapped addresses
(::0xffff<a.b.c.d>).

> To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and
mlx4_ib drivers must be loaded, and the netdevice for the corresponding
RDMAoE port must be running. 
6. it seems that your design is somehow too tightly coupled to the
connectX hca and the mlx4 driver, please note that the design should
allow for implementing software rdmaoe provider as well

LL: on the contrary, there is *nothing* in the architecture that is
HW-specific.
This is intentional --- SW rdmaoe will be straighforward, efficient (as
is SW FCoE), and simple (this is rdma transport after all!) to implement
over any nic.
We already know of people that are looking into that.

> Individual ports of a multi port HCA can be independently configured 
> as Ethernet (with support for RDMAoE) or IB, as is already the case. 
> [...] Following is a series of 8 patches based on version 2.6.30 of 
> the Linux kernel
>   
Does the mainline kernel has all the patches to do so - I wasn't sure if
this is the case. can you send the instructions?
> This new series reflects changes based on feedback from the community 
> on the previous set of patches. The whole series is tagged v3
There's not a single mentioning of a change vs the previous versions,
how do you expect someone not to treat it as v1?!

LL: thanks for the remark, we will send a list of changes in our
following patches.

For some reason you have chosen to use cross-posting, I don't think
these patches need that.

Or.





More information about the general mailing list