[ofa-general] Re: [PATCH 0/8 v3] RDMAoE support

Or Gerlitz ogerlitz at voltaire.com
Wed Jul 15 01:29:11 PDT 2009


Liran Liss wrote:
> GRH is required for all packets; it keeps things simple and provides length information

yes, makes sense

> There is no reason to restrict RDMAoE to RDMACM apps: all existing IB applications can work over RDMAoE

Considering the fact that you've put away IPoIB and the non-existence of 
user space SA queries, the only remaining potential user is SRP which 
anyway uses sophisticated QP1 and SA services even before going to H.A 
and multi-pathing - so I don't see anyone using SRP with RDMAoE. As for 
MPIs which don't use rdma-cm, the reason for that has been the problem 
or perception of SA scalability which doesn't exist over Ethernet.

So all-in-all (or maybe all-to-all...) the only remaining consumers are 
those MPIs which will insist on exchanging <GID, MAC> pairs the way they 
exchange  <GID, LID> pairs today (note that with IB GRH isn't a must so 
you should now go and patch all MPIs which don't exchange GIDs). These 
guys will simply create Adress handles with the <GID, MAC> pairs, and as 
such there's no need to have any layer below the rdma-cm and addr doing 
MAC or GID resolution.

> we already have a great CM protocol that runs over QP1 and is a perfect fit for RDMAoE; we don't see any reason to change this

sounds good, lets see what  the other reviewers think, so you didn't 
change a bit in the CM?

> The rmdaoa "sa" code does more than just path resolution - it provides SL, MTU, rate, packet lifetime, and other params. The same thing is also true for multicast joins.
mmm, interesting, what is the criteria / logic to assign SL and rate? 
looking in rdmaoe_sa.c I didn't find any sign to such code, please point 
me to the function/s that implement this. This is before going to the 
way you've implemented the multicast joins, which I'll get to later.

> In terms of the code, it is very simple: - move useful definitions that were previously in multicast.c/sa_query.c to header files. - implement only what is needed for rdmaoe in rdmaoe_sa.c. Perhaps we can make better code reuse (calling the SA's cmp_rec(), for example). We will look into it; thanks! I would just like to note that there is a tradeoff here
what ever, however, going your way (i.e claim that there's a need to 
rdmaoe_sa.c, I still don't see why rdmaoe_sa.c is needed) I don't see 
how it can be reviewed this since you made all the above changes in one 
patch - if you want review, separate to at least two patches, the first 
moves a way code, the second add the rdmaoe_sa.c.

BTW - same goes for the API changes, you should have one patch that 
relates to kernel only changes, and another patch which relates to the 
user space changes, and this patch should come in a series with changes 
to libibverbs so people can review both sides of the kernel/user channel.

> Yes, we intend to use standard IPv4-mapped addresses (::0xffff<a.b.c.d>)
okay, any reason not to do this from day one? is there anyone in the IB 
stack or in your firmware that assumes a specific GID would always be in 
index 0?
> on the contrary, there is *nothing* in the architecture that is HW-specific. This is intentional --- SW rdmaoe will be straighforward, efficient (as is SW FCoE), and simple (this is rdma transport after all!) to implement over any nic. We already know of people that are looking into that.
I'm not convinced... e.g we'll get to this more when doing a detailed 
review on the multicast thing. Also, I haven't seen any comment from 
anyone on your API changes, can be nice if you act to make the sw rdmaoe 
developers comment here.

Or.




More information about the general mailing list