[ofa-general] patch to ib_addr for sending arps
leo.tominna at oracle.com
Sun Jul 12 20:38:38 PDT 2009
Hi Or,
Sorry, I had not seen this earlier thread. I think a combination of the
two patches, plus one more change, would address both our problems. The
patch I sent ensures that neigh_lookup and neigh_event_send look up the
same route when source-based routing tables are being used.
With Sean's patch, if that ip_dev_lookup is also duplicated in
addr_send_arp, I think that would address the source-based routing case
as well:
addr_send_arp:
+	if (src_ip)
+		oif = ib_dev_lookup(src_ip)
+	s_addr = src_ip;
addr_resolve_remote:
+	if (src_ip)
+		oif = ib_dev_lookup(src_ip)
	s_addr = src_ip; /* this was already there */
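To make the idea of mapping a source IP to its owning device concrete, here is a minimal user-space sketch. It is only an analogue of the in-kernel lookup proposed above, not the patch itself: the function name ifindex_for_src is made up for illustration, and it uses the standard getifaddrs() and if_nametoindex() calls rather than any kernel helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the index of the local interface that owns
 * the IPv4 address given as a dotted-quad string, or 0 if the string is
 * not a valid address or no interface carries it.
 */
static unsigned int ifindex_for_src(const char *src)
{
    struct in_addr want;
    struct ifaddrs *list, *ifa;
    unsigned int idx = 0;

    assert(src != NULL);

    if (inet_pton(AF_INET, src, &want) != 1)
        return 0;               /* not a valid IPv4 address */
    if (getifaddrs(&list) != 0)
        return 0;               /* could not enumerate interfaces */

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        const struct sockaddr_in *sin;

        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;           /* skip non-IPv4 entries */
        sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            idx = if_nametoindex(ifa->ifa_name);
            break;
        }
    }
    freeifaddrs(list);
    return idx;
}
```

A caller would use the returned index as the output interface only when it is non-zero, and otherwise fall back to whatever the routing lookup picks, which mirrors the "if (src_ip)" guard in the pseudo-patch.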
Associating the device with the source IP seems to be the correct thing
to do in general, but I initially avoided it in favor of source-based
routing rules/tables, since Linux does not do this by default.
Source-based routing seems to be the only way to get load balancing
right for regular IP traffic when two local IPs are on the same subnet,
so I thought it would be better to have the src_ip alone cause the
routing lookup to associate everything with the correct device, as that
would work for regular IP traffic and for anyone else, like ib_addr
clients.
The problem I was seeing with arp was that Linux associates arp entries
with specific devices. If source-based routing is used and the arp send
does not take src_ip into account, the arp is sent from the default
device, and that's the device that gets the arp entry, whereas
neigh_lookup was looking for an entry on the correct device and was
never finding the neighbor.
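The mismatch can be shown with a toy model of a neighbor cache keyed on (IP, device). This is a sketch under assumptions, not the kernel's data structures: the names neigh_table, neigh_learn and neigh_find and the fixed-size array are invented for illustration. The only property it models is that a lookup succeeds only when both the IP and the device match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NEIGH  16
#define NEIGH_ALEN 20   /* IPoIB hardware addresses are 20 bytes long */

/* One cached neighbor: the entry belongs to the device it was learned on. */
struct neigh_entry {
    uint32_t ip;                  /* neighbor IPv4 address */
    int      ifindex;             /* device that received the arp reply */
    uint8_t  lladdr[NEIGH_ALEN];  /* neighbor hardware address */
};

struct neigh_table {
    struct neigh_entry entries[MAX_NEIGH];
    size_t count;
};

/* Record an arp reply for ip that arrived on device ifindex. */
static void neigh_learn(struct neigh_table *t, uint32_t ip, int ifindex,
                        const uint8_t lladdr[NEIGH_ALEN])
{
    struct neigh_entry *e;

    assert(t->count < MAX_NEIGH);
    e = &t->entries[t->count++];
    e->ip = ip;
    e->ifindex = ifindex;
    memcpy(e->lladdr, lladdr, NEIGH_ALEN);
}

/* Find an entry; both the IP and the device must match. */
static const struct neigh_entry *neigh_find(const struct neigh_table *t,
                                            uint32_t ip, int ifindex)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (t->entries[i].ip == ip && t->entries[i].ifindex == ifindex)
            return &t->entries[i];
    return NULL;
}
```

If the arp goes out on the default device (say ifindex 1), neigh_learn() files the reply under ifindex 1. A later neigh_find() for the same IP on the device chosen by source-based routing (say ifindex 2) returns NULL, which is the "never finding the neighbor" case described above.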
I think arp_ignore=1 is also needed.
From what I can tell, on HPUX there is no device association with arp
entries. Does anyone know why Linux has this flexibility? It looks
like this was added in 2.2 or something, and I can't see any useful
applications for this.
Second, HPUX sends a single reply for arps, with the correct hardware
information, regardless of what device it is replied from (or maybe it
always replies from the correct device, doesn't matter, it works). On
Linux each device replies with its own hardware address (unless
arp_ignore is used). I've also seen arp requests being sent with the
wrong hardware address, mainly with ping-initiated traffic. When sent
from devX, "who has ipZ tell ipY" causes the node with ipZ to create an
implicit arp entry associating ipY with devX (rather than devY).
The initial patch I sent is less restrictive but relies on source-based
routing to get everything working; perhaps an explicit device mapping
(as above) makes more sense for RDMA traffic. Please correct me if
something I said is incorrect or if these changes conflict with other
working configurations.
Thanks for your help.
Leo Tominna
Disclaimer: The statements and opinions expressed here are my own and do
not necessarily reflect those of my employer.
On 7/12/2009 4:13 AM, Or Gerlitz wrote:
> Leo Tominna wrote:
>> This patch appears to help when strict ARP handling is enabled or
>> when non-standard routing tables are used. The ARP request is
>> replied to through the device that will be used for subsequent
>> communication, so the ARP entry gets associated with the correct
>> device in the ARP cache. tcpdump shows consistent ARPs generated by
>> similar arguments to ping and rds-ping.
> Hi Leo,
>
> Does this patch aim to solve the problem discussed in the "pick
> the outgoing HCA based on the IP used for bind" threads from
> February this year on the general and rds-devel mailing lists
> (http://lists.openfabrics.org/pipermail/general/2009-February/057008.html)?
> Also, by "strict ARP handling" do you refer to the case where there are
> multiple NICs on the same L2 broadcast domain (VLAN/Partition)?
>
> Or.
>
>