[openib-general] IB Address Translation service

Tue Mar 1 09:11:52 PST 2005

At 11:17 PM 2/28/2005, Eric W. Biederman wrote:
>"Yaron Haviv" <yaronh at voltaire.com> writes:
>
> > > -----Original Message-----
> > > From: openib-general-bounces at openib.org [mailto:openib-general-
> > > bounces at openib.org] On Behalf Of Roland Dreier
> > > Sent: Monday, February 28, 2005 7:13 PM
> > > To: shaharf
> > > Cc: openib-general at openib.org
> > > Subject: Re: [openib-general] IB Address Translation service
> > >
> > > This API seems overly complex and at the same time too inflexible to
> > > me.  However, rather than getting bogged down nitpicking about APIs, I
> > > think we have to take a few steps back.
> >
> > I believe the API is very flexible, but we are pretty open to hearing
> > what you think is needed in addition.
> >
> > > First, let's understand the problem we're trying to solve.  Who are
> > > the consumers of this address translation service?
> >
> > The first problem is that most ULPs use valid IP addresses for
> > simplicity (DAPL, iSER, NFS/RDMA, SDP, MPI, etc.) and someone needs to
> > resolve them to an IB address and device in order to use IB. This should
> > take into account cases where there is more than one HCA in the system.
> > Preferably/optionally, the ULP would like to know which partition to use
> > if there is more than one, and to leverage the IP subnetting done by
> > IPoIB.
>
>I am confused.  In any sane network the translation is:
>Hostname -> address.
>
>IP because it spans multiple networks does:
>Hostname -> IP address -> hw address.
>
>IB because it can span multiple IB networks does:
>GUID+QPN -> LID + QPN.
>
>So what is wrong with simply doing:
>Hostname -> GUID
>???
>
>Then all the kernel needs to be passed GUID + QPN.
>
>I am certain MPI does not care about IP addresses.   It is the job
>of the mpi launcher to resolve where all of the pieces are.  Generally
>mpirun is done over IP and it just needs to collect the native network
>addresses before it leaves.

That still does not eliminate the need to resolve some form of address.
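
To illustrate why a lookup step remains either way, here is a minimal Python sketch of the two-stage chain described above (hostname -> GUID, then GUID + QPN -> LID + QPN). The tables, values, and function names are hypothetical stand-ins, not any OpenIB API.

```python
# Hypothetical tables standing in for a name service and the subnet's
# GUID-to-LID assignments; none of these names come from an IB library.
NAME_TO_GUID = {"node-a": 0x0002C90200001234}
GUID_TO_LID = {0x0002C90200001234: 7}


def resolve_guid(hostname):
    """Stage 1: hostname -> port GUID (a globally unique identifier)."""
    return NAME_TO_GUID[hostname]


def resolve_lid(guid):
    """Stage 2: GUID -> LID (meaningful only within the local subnet)."""
    return GUID_TO_LID[guid]


def resolve_endpoint(hostname, qpn):
    """Chain both stages; the QPN passes through unchanged."""
    guid = resolve_guid(hostname)
    return resolve_lid(guid), qpn
```

Whatever the first stage maps from (hostname or IP address), something still has to perform the second stage before a connection can be made.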

>It would be brain damaged for DAPL to require IP addresses.  Not that
>DAPL hasn't shown some brain damage already.

I don't believe the IT API requires ATS.  It is a bit more flexible and,
I think, a better match for applications.

>Please, please remember that IP addresses
>
> > It is possible to replicate the same code you have in SDP (which is also
> > not complete) across all ULP's, I assume a better way is to provide it
> > in one central place.
>
>How about not even worrying about it.  It is an extra step that
>introduces latency and confusion.
>
>You can't do GUID -> IP because there is not a requirement on
>a 1 to 1 mapping.  And in general there is no fixed IP -> GUID mapping.
>
>What are the semantics in the upper levels when the IP -> GUID mapping
>changes?  Does your connection properly follow the IP to the new GUID?

It should follow a new mapping if done right.

>I don't see this making sense anywhere except user space.
>
> > There are also two proposed address resolution mechanisms, one is ARP
> > used by SDP, and one is ATS used by some DAPL consumers, and we believe
> > it is better to combine them under the same API.
>
>Just FYI IPv6 doesn't use arp.

Whether it is ND or ARP is less of an issue for this point.

> > The second problem relates to mapping of an IB GID to one or more Path
> > records.
> > This is also something needed for ALL ULPs. Today each ULP provides the
> > minimal subset of path resolution functionality without taking into
> > account topics such as partitioning, QoS, source routing and
> > multi-pathing.
> > Some of these require using special SA queries (such as SA Multipath
> > Record query and QoSPath Query).
> > I don't think it makes sense to put all this functionality into each ULP
> > as well.
>
>That part is reasonable.  Although the fact it is easy to knock
>OpenSM down concerns me.  However that looks to be a separate
>problem.
>
> > Then we can also discuss: does it make sense to have each path
> > resolution call lead us to the SA, or does it make more sense to cache
> > those paths?
> > And if we cache, doesn't it make more sense to cache/invalidate the
> > routes for all ULPs rather than implementing it in each ULP?
> > Also, I am not sure how a 1000-node cluster functions without the caching.
> >
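
For readers following the caching argument, a minimal Python sketch of one shared path cache with explicit invalidation might look like the following. The class, the field names, and the fake SA query are all assumptions made for illustration; a real implementation would issue an SA PathRecord query and hook invalidation to fabric events.

```python
class PathCache:
    """Shared cache of path records keyed by destination GID (hypothetical)."""

    def __init__(self, query_sa):
        self._query_sa = query_sa   # callable: gid -> path record
        self._paths = {}
        self.sa_queries = 0         # how many times we went to the SA

    def lookup(self, gid):
        """Return the cached path, querying the SA only on a miss."""
        if gid not in self._paths:
            self.sa_queries += 1
            self._paths[gid] = self._query_sa(gid)
        return self._paths[gid]

    def invalidate(self, gid):
        """Drop one entry, e.g. after a route change is reported."""
        self._paths.pop(gid, None)


def fake_sa_query(gid):
    # Stand-in for a real SA PathRecord query; returns made-up fields.
    return {"dgid": gid, "dlid": 7, "pkey": 0xFFFF}


cache = PathCache(fake_sa_query)
cache.lookup("fe80::1")      # miss: one SA query
cache.lookup("fe80::1")      # hit: served from the cache
cache.invalidate("fe80::1")
cache.lookup("fe80::1")      # miss again after invalidation
```

With a single cache shared by all ULPs, repeated lookups for the same destination reach the SA only once per invalidation, which is the load reduction being argued for.
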
> > And the last problem is related to reverse resolution from IB to IP
> > addresses, which is needed for DAPL as well as for various management
> > and diagnostic tools that want to know which node/port is really
> > behind a given GID address.
> >
> > So how would you suggest we go about it?
> > Duplicate all of that in each ULP?
> > Refrain from implementing advanced routing, partitioning, QoS (we can't
> > really maintain all that advanced code for each ULP)?
>
>One small step at a time.  Where each step is obviously correct.
>
>One giant leap only works well for internal use.  Not for things
>that are heavily used.
>
> > Our idea is to provide those few helper functions that enable people to
> > make full use of IB and its features without reading the whole IB spec
> > or needing a PhD.
> > If you clear all the remarks from the library, you will see it is very
> > slim and, to my understanding, includes all the relevant input and
> > output parameters for each of the 3 functions I mentioned.
>
>But an interface like that is usually provided by glibc not by the kernel.
>And the mixing of levels in that proposed API is absolutely horrible.
>
>
>Eric