[openib-general] [RFC] IB address translation using ARP
Michael Krause
krause at cup.hp.com
Wed Oct 12 12:39:56 PDT 2005
Isn't this getting a bit more complex than it needs to be? Let me see if I
have this correct:
1. Applications want to use the existing APIs to identify remote endnodes /
services.
2. Endnodes are identified by an IPv4 / v6 address and services by a port
number.
3. The existing network stacks already know how to discover routes to
endnodes using ARP / ND. These protocols can determine whether there is a
single IP address or multiple IP addresses and store them in the local
network stack's route table.
4. Route tables can contain any amount of layer 2 and layer 3 address
information (a function of the implementation), and various policies can be
constructed to make an intelligent decision on which layer 2 and 3
addresses to return to an application.
5. iWARP can use the existing infrastructure without modification, so no
changes are required to make it work.
6. IB uses a different layer 2 address (not just a 48-bit MAC); thus, while
different from Ethernet, it conceptually works the same way. Both can
support multiple IP addresses per layer 2 address, as this is really just a
matter of replicating the information on a per-IP-address basis.
7. When a route lookup occurs, a set of IP addresses is returned. Depending
upon the kernel interface, one can also return the layer 2 information
either as part of this lookup or via a separate query to the route table.
8. Layer 2 information provides the necessary data to construct CM messages
or to identify the path for the IP over IB ULP.
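Points 6 through 8 can be illustrated with a minimal sketch, assuming a simplified per-IP neighbor table rather than the real Linux route/neighbor structures. `L2Addr` and `NeighborTable` are hypothetical names; the only grounded details are the address lengths (6 bytes for an Ethernet MAC, 20 bytes for the IPoIB link-layer address that embeds the 16-byte GID).

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Dict, List, Union

IPAddr = Union[IPv4Address, IPv6Address]

# Link-layer address lengths: 6-byte Ethernet MAC, 20-byte IPoIB address
# (flags + QPN + 16-byte GID).
L2_LENGTHS = {"ether": 6, "ipoib": 20}


@dataclass(frozen=True)
class L2Addr:
    """Hypothetical link-layer address: link type plus raw bytes."""
    link_type: str
    raw: bytes

    def __post_init__(self) -> None:
        expected = L2_LENGTHS[self.link_type]
        if len(self.raw) != expected:
            raise ValueError(f"{self.link_type} address must be {expected} bytes")


@dataclass
class NeighborTable:
    """Hypothetical per-IP table; several IPs may map to one L2 address."""
    entries: Dict[IPAddr, L2Addr] = field(default_factory=dict)

    def learn(self, ip: str, l2: L2Addr) -> None:
        # What an ARP reply or ND advertisement would populate.
        self.entries[ip_address(ip)] = l2

    def lookup(self, ip: str) -> L2Addr:
        # Layer 2 information returned alongside a route lookup (point 7).
        return self.entries[ip_address(ip)]

    def ips_for(self, l2: L2Addr) -> List[IPAddr]:
        # All IP addresses replicated onto one layer 2 address (point 6).
        return [ip for ip, addr in self.entries.items() if addr == l2]
```

Nothing in the table cares whether the layer 2 address is a MAC or an IPoIB address; only the length check differs, which is the "conceptually works the same" argument in code form.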
So, from the above, it seems the IP and IB worlds can operate using the
same code and work just fine. So, where is the problem? Is it really just
how management assigns IP addresses to IB interfaces, and how an
application should select (or be informed of) which IP address to use and
thereby transparently identify the IB port? Where is the connection
establishment problem? The application does not see any difference. The
network stack acts only as a repository for routing information (unless
running directly over IP over IB) and thus is not impacted. The middleware
simply needs to extract the layer 2 information to obtain the data required
to construct the CM messages when going straight to IB (no change is
required here for iWARP, as this is all native to its operation). What am
I missing here?
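The "extract the layer 2 information" step can be sketched as follows, assuming the layer 2 data is the 20-byte IPoIB link-layer address (one reserved/flags octet, a 24-bit QPN, then the 16-byte port GID, per RFC 4391; Linux calls the length INFINIBAND_ALEN). `parse_ipoib_hwaddr` and `cm_destination_gid` are hypothetical helpers, not an existing API.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address

IPOIB_HW_ADDR_LEN = 20  # same value as INFINIBAND_ALEN in Linux


@dataclass(frozen=True)
class IPoIBHwAddr:
    flags: int        # first octet (reserved / flag bits)
    qpn: int          # 24-bit queue pair number of the IPoIB QP
    gid: IPv6Address  # 128-bit port GID, shown in IPv6 notation


def parse_ipoib_hwaddr(raw: bytes) -> IPoIBHwAddr:
    """Split an IPoIB link-layer address into flags, QPN and GID."""
    if len(raw) != IPOIB_HW_ADDR_LEN:
        raise ValueError(f"expected {IPOIB_HW_ADDR_LEN} bytes, got {len(raw)}")
    flags = raw[0]
    qpn = int.from_bytes(raw[1:4], "big")
    gid = IPv6Address(raw[4:])  # 16 packed bytes
    return IPoIBHwAddr(flags=flags, qpn=qpn, gid=gid)


def cm_destination_gid(raw: bytes) -> IPv6Address:
    # In this sketch only the GID is handed on for path resolution and CM
    # addressing; the IPoIB QPN is not used once traffic bypasses IPoIB.
    return parse_ipoib_hwaddr(raw).gid
```

For example, a hardware address beginning `80 00 04 04` followed by GID `fe80::2:c902:0:1234` parses to flags 0x80 and QPN 0x404.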
Mike
At 10:10 AM 10/9/2005, Tom Tucker wrote:
>On Sun, 2005-10-09 at 07:57 -0700, Sean Hefty wrote:
> > >It is theoretically possible to support all this on an IPoIB based
> > >network. Multiple subnets, multiple routes to remote peers, ICMP
> > >redirect, multiple IP addresses for each physical interface, yada yada
> > >yada. But IMHO, the only way to do this would be to tie directly into
> > >the existing routing, ARP, ICMP, etc... subsystems in Linux. Otherwise
> > >you'll end up recreating a gigantic (and I mean GIGANTIC) amount of
> >
> > The current implementation ties into the standard Linux ARP tables. If
> > connections were made over TCP/IP, using IPoIB, then I don't think that
> > there would be any issues. The issues only arise because of the desire to
> > use TCP/IP network addresses over a non-TCP/IP network.
> >
> > >code. This belief is why I've been a proponent of mapping GIDs to one
> > >and only one IP address and treating it, for management purposes, as
> > >the equivalent of an IP address. Without this, the whole mechanism for
> > >determining routes, etc., breaks down. If you treat the GID like a MAC
> > >address, it breaks, because a MAC address can have multiple IP
> > >addresses (the observation that led to the conclusion that ATS was
> > >broken in the first place).
> >
> > We should be able to handle the case where a GID has multiple IP
> > addresses bound to it. But even if we added a 1:1 restriction, the
> > connection-over-IB issue still exists.
>
>I agree, except for RARP.
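The RARP exception can be made concrete with a minimal sketch, assuming string GIDs and a simple in-memory binding table (`GidBindings` is hypothetical). A forward (ARP-style) lookup from IP to GID always has one answer, but a reverse (RARP-style) lookup from GID to IP has a single answer only when bindings are restricted to 1:1.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional


class GidBindings:
    """Hypothetical IP <-> GID binding table with an optional 1:1 rule."""

    def __init__(self, one_to_one: bool = False) -> None:
        self.one_to_one = one_to_one
        self.ip_to_gid: Dict[str, str] = {}
        self.gid_to_ips: DefaultDict[str, List[str]] = defaultdict(list)

    def bind(self, ip: str, gid: str) -> None:
        if ip in self.ip_to_gid:
            raise ValueError(f"{ip} is already bound")
        if self.one_to_one and self.gid_to_ips.get(gid):
            raise ValueError(f"{gid} already has an IP address")
        self.ip_to_gid[ip] = gid
        self.gid_to_ips[gid].append(ip)

    def forward(self, ip: str) -> str:
        # ARP-style query: always a single answer.
        return self.ip_to_gid[ip]

    def reverse(self, gid: str) -> Optional[str]:
        # RARP-style query: unique only if exactly one IP is bound;
        # None means unknown or ambiguous.
        ips = self.gid_to_ips.get(gid, [])
        return ips[0] if len(ips) == 1 else None
```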
>
> >
> > >I know there is significant resistance to this idea, but I just don't
> > >see how we get this generically resolved without binding the two
> > >addressing schemes more closely. With the current binding, I just don't
> > >think it works.
> >
> > Again, I don't think that the binding is the issue so much as the desire
> > to use an address for a protocol that isn't actually being used for
> > communication.
>
>Not to be pedantic, but if binding or mapping or somesuch weren't an
>issue we wouldn't need AT.
>
> > I don't view a GID as an IP address because we're not sending and
> > receiving IP packets on the GID. IPoIB treats GIDs as only part of a MAC
> > address, which I think is the proper view.
> >
> > Anyway, returning to the original problem of connecting to an IB gateway
> > given a destination IP address on a different subnet... I'm slowly
> > convincing myself that either the CMA or AT should do this. (I believe
> > that the ib_addr code will do this now, but I still wasn't sure that we
> > wanted it to.)
> >
>
>IMHO, you need a service separate from the CMA to do address
>translation. My (iWARP's) rationale for this is that there are two
>clients of the service: the CM and IP. For CM, you need it to select a
>route and thereby a local interface. For IP, you need it because routes
>change and ARP entries time out.
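The two-client rationale can be sketched as a shared cache, assuming a pluggable resolver in place of the real route lookup plus ARP exchange. `AddressTranslator`, `resolve_fn` and the 60-second default TTL are all hypothetical. The cached route supplies the local interface the CM needs, while expiry and explicit invalidation cover IP's concern that routes change and ARP entries time out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Resolution:
    local_if: str       # local interface selected by the route lookup (CM's need)
    next_hop_l2: bytes  # resolved link-layer address of the next hop
    expires_at: float   # after this, re-resolve (IP's need: entries time out)


class AddressTranslator:
    """Hypothetical AT service shared by the CM and the IP stack."""

    def __init__(self, resolve_fn: Callable[[str], Tuple[str, bytes]],
                 ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve_fn  # stands in for route lookup + ARP exchange
        self._ttl = ttl             # assumed lifetime, not a real kernel value
        self._clock = clock
        self._cache: Dict[str, Resolution] = {}

    def lookup(self, dst_ip: str) -> Resolution:
        now = self._clock()
        entry = self._cache.get(dst_ip)
        if entry is None or entry.expires_at <= now:
            local_if, l2 = self._resolve(dst_ip)
            entry = Resolution(local_if, l2, now + self._ttl)
            self._cache[dst_ip] = entry
        return entry

    def invalidate(self, dst_ip: str) -> None:
        # Hook for route changes (e.g. an ICMP redirect) observed by IP.
        self._cache.pop(dst_ip, None)
```

Injecting the clock keeps the expiry logic testable without sleeping.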
>
>BTW, can you educate me ... is the following what you're thinking:
>
>On the client side...
>
>- route is discovered by looking at the Linux routing table
>- local interface is IPoIB (looks at rdma_ptr embedded in netdev struct)
>- send ARP AT message over local IB interface
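The client-side steps can be sketched as follows, assuming a simplified IPv4 routing table with longest-prefix matching and an `is_ipoib` predicate standing in for the netdev check. `Route`, `AtQuery`, `select_route` and `build_at_query` are hypothetical. Following the gateway steps described here, the AT query is sent to the next hop but still names the final destination.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network, ip_address
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Route:
    dest: IPv4Network
    gateway: Optional[IPv4Address]  # None: destination is on-link
    dev: str


@dataclass(frozen=True)
class AtQuery:
    out_dev: str          # local interface chosen by the route
    send_to: IPv4Address  # next hop that receives the AT query
    target: IPv4Address   # final destination being resolved


def select_route(table: List[Route], dst: IPv4Address) -> Route:
    """Longest-prefix match over a simplified routing table."""
    matches = [r for r in table if dst in r.dest]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda r: r.dest.prefixlen)


def build_at_query(table: List[Route], dst: str,
                   is_ipoib: Callable[[str], bool]) -> AtQuery:
    target = ip_address(dst)
    route = select_route(table, target)
    # Stand-in for inspecting the netdev (e.g. its rdma pointer) for IPoIB.
    if not is_ipoib(route.dev):
        raise ValueError(f"{route.dev} is not an IPoIB interface")
    next_hop = route.gateway if route.gateway is not None else target
    return AtQuery(out_dev=route.dev, send_to=next_hop, target=target)
```

An off-subnet destination yields a query sent to the gateway; an on-link destination yields a query sent to the destination itself.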
>
>At the gateway...bridging to IP
>
>- ARP AT query received on IB interface
>- Lookup route to destination IP address in gateway's route table.
>- If next hop's Ethernet address is already known, it is returned
>- Otherwise, local interface identified is IPoEthernet
>- New ARP query goes out on the local interface from the route
>- When response comes back, answer is returned.
>
>At the gateway...bridging to IPoIB
>
>- ARP AT message received on IB interface, delivered to AT
>- Lookup route to destination IP address in gateway's route table
>- If the next hop's IPoIB hardware address is already known, it is returned
>- Otherwise, local interface identified in route is IPoIB
>- New ARP AT query goes out on the local interface
>- When response comes back, answer is returned.
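Both gateway cases above follow the same shape, so one minimal sketch can cover them, assuming injected callbacks for the gateway's route lookup and for sending a query on an interface. `Gateway`, `RouteLookup`, `SendQuery` and the `"arp"` / `"arp-at"` labels are hypothetical; only the control flow mirrors the steps listed.

```python
from typing import Callable, Dict, Tuple

# Hypothetical callback signatures standing in for kernel facilities.
RouteLookup = Callable[[str], Tuple[str, str]]  # target -> (out_dev, next_hop)
SendQuery = Callable[[str, str, str], bytes]    # (kind, out_dev, next_hop) -> L2 addr


class Gateway:
    """Sketch of the gateway behaviour for Ethernet and IPoIB next hops."""

    def __init__(self, route_lookup: RouteLookup,
                 is_ipoib: Callable[[str], bool], send_query: SendQuery) -> None:
        self._route_lookup = route_lookup
        self._is_ipoib = is_ipoib
        self._send_query = send_query
        self._neighbors: Dict[Tuple[str, str], bytes] = {}

    def handle_at_query(self, target: str) -> bytes:
        # Look up the route to the destination in the gateway's own table.
        out_dev, next_hop = self._route_lookup(target)
        key = (out_dev, next_hop)
        # If the next hop's link-layer address is already known, return it.
        if key in self._neighbors:
            return self._neighbors[key]
        # Otherwise issue a new query on the outgoing interface:
        # plain ARP toward Ethernet, an ARP AT query toward IPoIB.
        kind = "arp-at" if self._is_ipoib(out_dev) else "arp"
        answer = self._send_query(kind, out_dev, next_hop)
        self._neighbors[key] = answer
        return answer  # relayed back to the querying client
```

The query here is synchronous for brevity; a real implementation would wait on the response asynchronously.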
>
>Thanks,
>
>
>
> > - Sean