[openib-general] [RFC] IB address translation using ARP

Sat Oct 8 20:39:46 PDT 2005

On Fri, 2005-10-07 at 20:13 -0400, Hal Rosenstock wrote:
> On Fri, 2005-10-07 at 19:57, Sean Hefty wrote:
> > Hal Rosenstock wrote:
> > > Would an iWARP connection jump across IP subnets ? It would need to
> > > determine that it could do this (ala NHRP with ATM). Also, could there
> > > be other RDMA networks between them (like IB) ?
> > 
> > if iWarp is on top of TCP, I don't think that it would care about IP subnets.
> 
> I think iWARP can be on top of TCP or SCTP. But why wouldn't it care ? 
> Doesn't a routing decision still need to be made at the IP layer ?
> Doesn't the IP next hop need to be determined (e.g. gateway when the
> destination is off the local IP subnet) ? Is there something that
> precludes iWARP from working across IP subnets ?
> 
> -- Hal
> 
I've just read through this entire thread for the first time, and I
sense considerable confusion about how IP routing works. I know I'm
confused ;-)

With sockets, the path to the remote peer is determined *after* the
connection request is submitted by the app (connect(...)). The app has
no idea which local interface will ultimately handle this connection or
what the path (route) is to the remote peer. It simply says
connect(67.65.105.4, ...). In fact, TCP doesn't know this either! Like
Hal suggests, the connect request (SYN packet) gets all the way down to
IP, where the least-cost route is selected and, if not already known, the
Ethernet address for the next hop is resolved (ARP). The reasons for
this are varied but include: routes may change and Ethernet addresses for
next hops may change, all within the lifetime of a connection. Almost
certainly so if the connection lasts more than 15 minutes.
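
To make the point above concrete, here is a minimal sketch in Python. It is
an illustration, not anything from the IPoIB code. The helper name
local_address_for and the port number are made up for the example, and it
uses a UDP socket rather than TCP so that no listener or packet is needed:
connect() on a UDP socket only makes the kernel do the route lookup and pick
a source address. The application names a destination and nothing else; the
local address is only visible afterwards.

```python
import socket


def local_address_for(dest_ip: str, dest_port: int = 9) -> str:
    """Return the local IP the kernel's route lookup selects for dest_ip.

    Hypothetical helper for illustration. The caller never names an
    interface; the IP layer chooses it when connect() is issued.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # On a UDP socket, connect() transmits nothing. It records the
        # peer and lets the kernel choose the source address via routing.
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
    finally:
        s.close()


if __name__ == "__main__":
    # Loopback is used so the example runs without any external network.
    print(local_address_for("127.0.0.1"))
```

With a TCP socket the same lookup happens inside connect(), before the SYN
leaves the host; the application sees the result only by asking afterwards.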

The route identifies the local interface and the next-hop IP. An interface
is only ever on a single subnet. The ARP broadcast is issued on this
interface and is only on this one subnet. We're not broadcasting across
subnets. Note that the local interface is "logical", and a single
Ethernet NIC may have multiple IP addresses and may in fact be on
multiple subnets if using VLAN. 
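
A rough sketch of that lookup, in Python, may help. This is a toy model and
not the Linux implementation: the Route type, the lookup function, the
interface names and all addresses except 67.65.105.4 are assumptions made up
for the example. It assumes the usual rule of longest matching prefix first,
then lowest metric. The result is a (logical interface, next hop) pair, and
the next hop is what would then be resolved with ARP on that one interface.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: Optional[ipaddress.IPv4Address]  # None = directly connected
    interface: str                            # logical interface name
    metric: int


def lookup(table: List[Route], dest: str) -> Tuple[str, ipaddress.IPv4Address]:
    """Pick the route for dest: longest prefix wins, then lowest metric."""
    dest_ip = ipaddress.IPv4Address(dest)
    candidates = [r for r in table if dest_ip in r.network]
    if not candidates:
        raise LookupError(f"no route to {dest}")
    best = max(candidates, key=lambda r: (r.network.prefixlen, -r.metric))
    # On-link destination: ARP for the destination itself.
    # Off-link destination: ARP for the gateway instead.
    next_hop = best.gateway if best.gateway is not None else dest_ip
    return best.interface, next_hop


# Hypothetical table: one NIC, an untagged subnet, a VLAN subnet, a default.
TABLE = [
    Route(ipaddress.IPv4Network("192.168.1.0/24"), None, "eth0", 0),
    Route(ipaddress.IPv4Network("10.0.10.0/24"), None, "eth0.10", 0),
    Route(ipaddress.IPv4Network("0.0.0.0/0"),
          ipaddress.IPv4Address("192.168.1.1"), "eth0", 100),
]

if __name__ == "__main__":
    for dest in ("192.168.1.20", "10.0.10.7", "67.65.105.4"):
        print(dest, "->", lookup(TABLE, dest))
```

Note that eth0 and eth0.10 share one physical NIC in this table but are
separate logical interfaces, each on exactly one subnet.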

It is theoretically possible to support all this on an IPoIB based
network. Multiple subnets, multiple routes to remote peers, ICMP
redirect, multiple IP addresses for each physical interface, yada yada
yada. But IMHO, the only way to do this would be to tie directly into
the existing routing, ARP, ICMP, etc. subsystems in Linux. Otherwise
you'll end up recreating a gigantic (and I mean GIGANTIC) amount of
code. This belief is why I've been a proponent of mapping GIDs to one
and only one IP address and treating it for management purposes as the
equivalent of an IP address. Without this, the whole mechanism for
determining routes, etc. breaks down. If you treat the GID like a MAC
address, it breaks, because a MAC address can have multiple IP
addresses -- the observation that led to the conclusion that ATS was
broken in the first place.
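
To illustrate the difference between the two bindings, here is a small
Python sketch. It is not ATS and not any existing OpenIB interface: the
OneToOneBinding class, the GID strings and the addresses are all invented
for the example. The point is only that a one-to-one GID/IP binding can be
inverted unambiguously, while a MAC-style binding cannot.

```python
class OneToOneBinding:
    """Hypothetical table in which each GID maps to exactly one IP."""

    def __init__(self):
        self._ip_by_gid = {}
        self._gid_by_ip = {}

    def bind(self, gid: str, ip: str) -> None:
        # Refuse anything that would make either direction ambiguous.
        if gid in self._ip_by_gid:
            raise ValueError(f"GID {gid} is already bound to an IP")
        if ip in self._gid_by_ip:
            raise ValueError(f"IP {ip} is already bound to a GID")
        self._ip_by_gid[gid] = ip
        self._gid_by_ip[ip] = gid

    def ip_for(self, gid: str) -> str:
        return self._ip_by_gid[gid]

    def gid_for(self, ip: str) -> str:
        return self._gid_by_ip[ip]


# MAC-style binding for contrast: one hardware address, several IPs.
# Going from the hardware address back to "the" IP has no single answer.
MAC_TO_IPS = {"00:11:22:33:44:55": ["192.168.1.5", "10.0.10.5"]}

if __name__ == "__main__":
    table = OneToOneBinding()
    table.bind("fe80::2:c903:0:1", "192.168.1.5")  # made-up GID
    print(table.ip_for("fe80::2:c903:0:1"))
    print(table.gid_for("192.168.1.5"))
    print(len(MAC_TO_IPS["00:11:22:33:44:55"]), "IPs behind one MAC")
```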

I know there is significant resistance to this idea, but I just don't
see how we get this generically resolved without binding the two
addressing schemes more closely. With the current binding, I just don't
think it works.

If I'm off in the weeds, please let me know ... and I'll cease spouting
off.
