[ofa-general] RE: [PATCH 0/8 v3] RDMAoE support

Liran Liss liranl at mellanox.co.il
Wed Jul 15 08:39:07 PDT 2009


See below.
--Liran


Liran Liss wrote:
> GRH is required for all packets; it keeps things simple and provides 
> length information

yes, makes sense

> There is no reason to restrict RDMAoE to RDMACM apps: all existing IB 
> applications can work over RDMAoE

Considering the fact that you've put away IPoIB and the non-existence of
user space SA queries, the only remaining potential user is SRP which
anyway uses sophisticated QP1 and SA services even before going to H.A
and multi-pathing - so I don't see anyone using SRP with RDMAoE. As for
MPIs which don't use rdma-cm, the reason for that has been the problem
or perception of SA scalability which doesn't exist over Ethernet.

So all-in-all (or maybe all-to-all...) the only remaining consumers are
those MPIs which will insist on exchanging <GID, MAC> pairs the way they
exchange  <GID, LID> pairs today (note that with IB GRH isn't a must so
you should now go and patch all MPIs which don't exchange GIDs). These
guys will simply create Adress handles with the <GID, MAC> pairs, and as
such there's no need to have any layer below the rdma-cm and addr doing
MAC or GID resolution.

LL: I think that not having a user-space SA API is a big hole - we
should not accept this situation and work to fill in the gap.
Once such an API exists, all non-rdmacm apps will probably use it -
implementing path queries and multicast joins is a tedious effort, as
the convergence in the kernel has shown.
For this reason, I believe that eventually almost all MAD traffic (at
least for conventional apps - not IB network tools which are irrelevant
for RDMAoE anyhow) will go only through QP0 and QP1.
So, we will add the same SA support for RDMAoE in user-space, and nearly
all apps can make use of it.

> we already have a great CM protocol that runs over QP1 and is a 
> perfect fit for RDMAoE; we don't see any reason to change this

sounds good, lets see what  the other reviewers think, so you didn't
change a bit in the CM?
LL: that's the beauty of it: not a single change!

> The rmdaoa "sa" code does more than just path resolution - it provides
SL, MTU, rate, packet lifetime, and other params. The same thing is also
true for multicast joins.
mmm, interesting, what is the criteria / logic to assign SL and rate? 
looking in rdmaoe_sa.c I didn't find any sign to such code, please point
me to the function/s that implement this. This is before going to the
way you've implemented the multicast joins, which I'll get to later.

LL: this is still TBD. We will need to define reasonable policies to
integrate with DCB features, such as PFC.
Any ideas on this are more than welcome!

> In terms of the code, it is very simple: - move useful definitions 
> that were previously in multicast.c/sa_query.c to header files. - 
> implement only what is needed for rdmaoe in rdmaoe_sa.c. Perhaps we 
> can make better code reuse (calling the SA's cmp_rec(), for example). 
> We will look into it; thanks! I would just like to note that there is 
> a tradeoff here
what ever, however, going your way (i.e claim that there's a need to
rdmaoe_sa.c, I still don't see why rdmaoe_sa.c is needed) I don't see
how it can be reviewed this since you made all the above changes in one
patch - if you want review, separate to at least two patches, the first
moves a way code, the second add the rdmaoe_sa.c.

LL: good idea, thanks.
Regarding rmdaoe_sa.c, we are open to suggestions from the community.
We can surely do without it: RDMAoE support will amount to a few 'if'
statements in the existing SA code.
We understood that most people want to see this separated until the
standard is finalized. In the meantime, we will try to reduce code
duplication as much as possible.


BTW - same goes for the API changes, you should have one patch that
relates to kernel only changes, and another patch which relates to the
user space changes, and this patch should come in a series with changes
to libibverbs so people can review both sides of the kernel/user
channel.

LL: point noted, thanks.

> Yes, we intend to use standard IPv4-mapped addresses 
> (::0xffff<a.b.c.d>)
okay, any reason not to do this from day one? is there anyone in the IB
stack or in your firmware that assumes a specific GID would always be in
index 0?
LL: not at all. Its just a matter of what comes first - implementation
is ongoing.

> on the contrary, there is *nothing* in the architecture that is
HW-specific. This is intentional --- SW rdmaoe will be straighforward,
efficient (as is SW FCoE), and simple (this is rdma transport after
all!) to implement over any nic. We already know of people that are
looking into that.
I'm not convinced... e.g we'll get to this more when doing a detailed
review on the multicast thing. Also, I haven't seen any comment from
anyone on your API changes, can be nice if you act to make the sw rdmaoe
developers comment here.

LL: the only API changes are the sa-like path-queries and multicast
joins; verbs are not changed in any way.
In addition, we implemented the 'get_mac' method (for translating gids
to macs) in the generic user-kernel ABI so any RDMAoE implementation can
use it.
I don't understand what concerns you; please explain.

Or.




More information about the general mailing list