[ofa-general] patch to ib_addr for sending arps
Jason Gunthorpe
jgunthorpe at obsidianresearch.com
Mon Jul 13 06:49:06 PDT 2009
On Sun, Jul 12, 2009 at 08:38:38PM -0700, leo.tominna at oracle.com wrote:
> Associating the device with the source IP seems to be the correct thing to
> do in general, but I initially avoided it in favor of source based routing
> rules/tables since Linux does not do this by default. Source based routing
> seems to be the only way to get load balancing right for regular IP
> traffic
Right, this is the Linux Way. The route table associates the output
device with the source/destination pair, and it is correct and
necessary to have source route policy entries to do what you are
talking about. As per the thread Or dug up, older versions didn't do
this right - did it ever get fixed?
It is also necessary to use one of the arp_ignore settings; otherwise
the receiver side responds with the wrong physical address. Source
routing fixes the transmitter, arp_ignore fixes the receiver.
> when two local IPs are on the same subnet, so I thought it would be better
> to have the src_ip alone cause the routing lookup to associate everything
> with the correct device, as that would work for regular IP traffic and
> anyone else, like ib_addr clients.
> The problems I was seeing with arp was that Linux associates arp entries
> with specific devices, so if source based routing is used, and the arp send
> does not take src_ip into account, the arp is sent from the default device,
> and that's the device that gets the arp entry, whereas neigh_lookup was
> looking for an entry on the correct device, and was never finding the
> neighbor.
The proper flow is that the source/dest pair (plus extras) is used
to do a route lookup. The route lookup returns the output device to
use. The arp is sent out that device with the source/dest pair. The
receiver will then receive the broadcast arp on all interfaces, and you
need the arp_ignore setting so that only the interface bound to that
IP generates a response.
> From what I can tell, on HPUX there is no device association with arp
> entries. Does anyone know why Linux has this flexibility? It looks like
> this was added in 2.2 or something, and I can't see any useful applications
> for this.
I can have multiple networks with overlapping IP spaces, and using
routing tricks like policy routing I can generate overlapping,
network-specific ARP entries. Consider that I might have two ethernet
ports per machine and two networks, both with the same IPs. I can use
policy routing to direct all high priority traffic to one network and
low priority traffic to the other. The IPs are the same, but the MACs
are different, so the arp table must be keyed by interface and
destination.
Once you have complex routing features like policy routing, this becomes
necessary; presumably HPUX does not have this.
> The initial patch I sent is less restrictive but relies on source based
> routing to get everything working, perhaps an explicit device mapping (as
> above) makes more sense for RDMA traffic. Please correct me if something I
> said is incorrect or if these changes conflict with other working
> configurations.
This is Linux; the RDMA IP behavior should exactly match the in-kernel
IP behavior. It wasn't clear to me whether your patch did that or not.
On the surface it looks right: sending the arp from the device (and
IP?) that the requesting packet is going out on is the right thing.
Jason