[openib-general] IB Address Translation service

shaharf shaharf at voltaire.com
Tue Mar 1 06:33:46 PST 2005


> 
>     Roland> First, let's understand the problem we're trying to solve.
>     Roland> Who are the consumers of this address translation service?
> 
>     shaharf> Any ULPs at user & kernel, and also some
>     shaharf> applications.
> 
> I think this is too general an answer.  We should be designing based
> on specific ULPs and applications.  For example, I don't see anything
> particularly useful to IPoIB in this API.  Perhaps Libor can comment
> on how this API works for SDP.
> 

You are right about IPoIB. I think that IPoIB should not use this
API (or at least the functions that may use ARP), because that would
create a circular dependency in the architecture. Of course this could
be solved, but I think it is really unnecessary. IPoIB also has
relatively modest resolution requirements, and I don't see why we
should complicate things.

SDP, kDAPL and maybe others are a different story. As Libor already
mentioned in a different mail, SDP already does a very similar lookup.
In fact, one of my internal goals was to be able to fulfill SDP's
requirements. The internal resolution should be very similar to the
current SDP implementation, except that the ATS option is also
supported.
 
The ATS issue is orthogonal to this API. As long as there are ULPs
(such as kDAPL) that require it (even just for reverse mapping), we
should provide it.

My personal opinion is that the IB-ARP + ATS combination is twisted. As
Libor wrote, it brings up many issues regarding distributed vs. central
mechanisms and databases. I guess that up to this point there is
consensus. But my take (personal, not Voltaire's) is that the redundant
mechanism is ARP, not ATS. My reasoning is simple: IB management is
centralized. I don't like it, but that's the way it is. Adding
contradicting mechanisms does not solve the problem; it just makes
everything more complex.

As I understand it, the reasoning for ARP is that because the
resolution process has two stages (IP->GID, GID->LID), it is reasonable
to use a separate, well-known mechanism for the IP-to-IB resolution.
Another argument is that it is distributed and therefore doesn't
require the SM (at least if we ignore the multicast setup). I think
that since IP is tunneled over IB, it is not reasonable to use ARP, and
its distributed nature is a problem, not a feature: the SM is still
required for the path record and for multicast management. The correct
solution for the centralized IB management is to distribute the SM, not
the underlying mechanisms. I think it is not too hard to distribute the
SM, or at least the SA part of it. The SM/SA can also cache requests
much better than the clients can. Furthermore, a unified ATS + path
query can be defined to resolve everything in one stage. This would
simplify many aspects of the resolution.

But again, this is not really the main issue.

> What application would use functions like ib_at_ips_by_gid() or
> ib_at_ips_by_subnet()?
> 
>     shaharf> My take right now is to implement a kernel based
>     shaharf> mechanism and a user mode library to interface it.  There
>     shaharf> are other feasible solutions. I would really like you
>     shaharf> have your suggestions and preferences.
> 
> Unless there is a real kernel consumer that needs something this
> elaborate, I would prefer to implement this sort of caching service as
> a userspace daemon/library.  This allows for more sophisticated
> implementations (eg persistent caches) and also makes debugging and
> maintenance easier.
> 

The ib_at_ips_by_gid() function is intended for reverse resolution,
i.e. if you have a GID and you want to resolve it back to an IP/device.
ib_at_ips_by_subnet() lets you resolve all IB devices (and GIDs) on a
subnet, for example for application-level load balancing/fail-over.
ib_at_ips_by_gid() is required by kDAPL. I totally agree that
overengineering is bad. This means that some of the functions (such as
ib_at_ips_by_subnet()) may at the first stage be implemented only in
usermode.
 
>     shaharf> I think that starting with the APIs is a valid approach
>     shaharf> that has its own advantages and disadvantages.
> 
> Sure, it's always good to have code in hand to start a discussion.
> But in this case the API seems to be far ahead of its consumers, so it
> ends up feeling overengineered to me.
> 

You are completely right. The proposed API is designed to cover the
(near-)future requirements of iSER, NFS-RDMA, kDAPL, SDP, and others.
It attempts to cover the following issues:
	Resolution
	Back resolution
	Multi-pathing
	Fail-over
	TOS/QOS
	Partitioning

These are not visionary requirements; they are present or very near
future requirements. The API attempts to show the "correct" solutions
to some common problems. Without it, we may end up with several
different and incompatible solutions to the same problems. We don't
want the ULPs to reinvent the wheel every time.

The only "over-engineering" IMO is the caching support. I think that
caching is very likely to happen, so it is best for the API to let the
clients know: "beware, these functions may return cached results".
Some applications may care. Note that the caching impact is only a few
flags and an invalidate function; this is not much overhead.

As for the usermode/kernel-mode issue, I would be happy to implement
everything in usermode. That leaves just the small issue of an
efficient kernel-to-user request interface... Personally, I think it is
a legitimate architecture (a usermode daemon serving the kernel),
especially if you keep the caches within the kernel so that the fast
paths do not require usermode intervention, and let the usermode
daemons maintain the caches and do the slow-path tasks, where the extra
context-switch overhead will be insignificant relative to the overall
slow-path latency. I am not sure that my approach is very popular...

>  - R.

Shahar


