[ofa-general] patch to ib_addr for sending arps

leo.tominna at oracle.com leo.tominna at oracle.com
Sun Jul 12 20:38:38 PDT 2009


Hi Or,

Sorry, I had not seen this earlier thread.  I think a combination of the 
two patches, plus one more change, would address both our problems.  The 
patch I sent ensures that neigh_lookup and neigh_event_send look up the 
same route when source-based routing tables are being used.

With Sean's patch, if that ip_dev_lookup is duplicated in 
addr_send_arp, I think that would address the source-based routing case 
as well:

addr_send_arp:
+  if (src_ip)
+    oif = ib_dev_lookup(src_ip);
+  s_addr = src_ip;

addr_resolve_remote:
+  if (src_ip)
+    oif = ib_dev_lookup(src_ip);
   s_addr = src_ip; /* this was already there */
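As a rough userspace model of the change sketched above (not kernel code), the idea is that both paths derive the output interface from the source IP in the same way.  Everything in this block is hypothetical: the local_addrs table, the addresses and ifindex values, and the names dev_lookup_by_src and select_oif are made up for illustration and are not taken from either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of local addresses and the interface owning each. */
struct local_addr {
    uint32_t ip;      /* IPv4 address, host byte order for simplicity */
    int      ifindex; /* index of the interface that owns this address */
};

static const struct local_addr local_addrs[] = {
    { 0xC0A80001u, 2 }, /* 192.168.0.1 on ifindex 2 (assumed default) */
    { 0xC0A80002u, 3 }, /* 192.168.0.2 on ifindex 3, same subnet */
};

/* Return the ifindex that owns src_ip, or 0 if no local device has it. */
static int dev_lookup_by_src(uint32_t src_ip)
{
    size_t n = sizeof(local_addrs) / sizeof(local_addrs[0]);

    for (size_t i = 0; i < n; i++)
        if (local_addrs[i].ip == src_ip)
            return local_addrs[i].ifindex;
    return 0;
}

/*
 * Pick the output interface.  If a source IP is given and a local device
 * owns it, that device wins; otherwise keep whatever the route lookup
 * chose.  Calling this one helper from both the ARP-send path and the
 * neighbour-lookup path keeps the two in agreement.
 */
static int select_oif(uint32_t src_ip, int routed_oif)
{
    assert(routed_oif > 0); /* the route lookup must have found a device */

    if (src_ip) {
        int oif = dev_lookup_by_src(src_ip);

        if (oif)
            return oif;
    }
    return routed_oif;
}
```

With this table, select_oif(0xC0A80002u, 2) yields ifindex 3 even though the route lookup picked ifindex 2, while a zero or unknown source IP leaves the routed device unchanged.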

Associating the device with the source IP seems to be the correct thing 
to do in general, but I initially avoided it in favor of source-based 
routing rules/tables, since Linux does not do this by default.  
Source-based routing seems to be the only way to get load balancing right 
for regular IP traffic when two local IPs are on the same subnet, so I 
thought it would be better to have the src_ip alone cause the routing 
lookup to associate everything with the correct device, as that would 
work for regular IP traffic and for anyone else, like ib_addr clients.

The problem I was seeing with arp was that Linux associates arp entries 
with specific devices.  If source-based routing is used and the arp 
send does not take src_ip into account, the arp is sent from the default 
device, and that's the device that gets the arp entry, whereas 
neigh_lookup was looking for an entry on the correct device and was 
never finding the neighbor.
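The lookup miss described above can be illustrated with a small userspace model of a neighbour cache keyed by (IP, device).  This is only a sketch under stated assumptions: struct neigh, neigh_learn and neigh_find here are hypothetical stand-ins and not the kernel's data structures or API, and the 20-byte hardware address length is chosen to match IPoIB but does not matter for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_MAX 8
#define HW_LEN    20 /* assumed: IPoIB-sized hardware address */

/* One cache entry; the key is the pair (ip, ifindex), not ip alone. */
struct neigh {
    uint32_t ip;
    int      ifindex;
    uint8_t  hwaddr[HW_LEN];
};

static struct neigh neigh_tbl[NEIGH_MAX];
static size_t neigh_cnt;

/* Record a reply for ip that was received on device ifindex. */
static void neigh_learn(uint32_t ip, int ifindex, const uint8_t *hw)
{
    assert(neigh_cnt < NEIGH_MAX); /* toy table, no eviction */

    struct neigh *n = &neigh_tbl[neigh_cnt++];

    n->ip = ip;
    n->ifindex = ifindex;
    memcpy(n->hwaddr, hw, HW_LEN);
}

/* Return the entry for (ip, ifindex), or NULL if none exists there. */
static const struct neigh *neigh_find(uint32_t ip, int ifindex)
{
    for (size_t i = 0; i < neigh_cnt; i++)
        if (neigh_tbl[i].ip == ip && neigh_tbl[i].ifindex == ifindex)
            return &neigh_tbl[i];
    return NULL;
}
```

If the request goes out on the default device (say ifindex 2), the reply is learned with neigh_learn(ip, 2, hw); a later neigh_find(ip, 3) on the device chosen by source-based routing then returns NULL, which is the "never finding the neighbor" case.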

I think arp_ignore=1 is also needed.

From what I can tell, on HPUX there is no device association with arp 
entries.  Does anyone know why Linux has this flexibility?  It looks 
like this was added in 2.2 or something, and I can't see any useful 
applications for this.

Second, HPUX sends a single reply for arps, with the correct hardware 
information, regardless of what device it is replied from (or maybe it 
always replies from the correct device, doesn't matter, it works).  On 
Linux each device replies with its own hardware address (unless 
arp_ignore is used).  I've also seen arp requests being sent with the 
wrong hardware address, mainly with ping-initiated traffic.  When sent 
from devX, "who has ipZ tell ipY" causes the node with ipZ to create an 
implicit arp entry associating ipY with devX (rather than devY).

The initial patch I sent is less restrictive but relies on source-based 
routing to get everything working; perhaps an explicit device mapping 
(as above) makes more sense for RDMA traffic.  Please correct me if 
something I said is incorrect or if these changes conflict with other 
working configurations.

Thanks for your help.
Leo Tominna

Disclaimer: The statements and opinions expressed here are my own and do 
not necessarily reflect those of my employer.

On 7/12/2009 4:13 AM, Or Gerlitz wrote:
> Leo Tominna wrote:
>> This patch appears to help when strict ARP handling is enabled or 
>> when non-standard routing tables are used.  The ARP request is 
>> replied to through the device that will be used for subsequent 
>> communication, so the ARP entry gets associated with the correct 
>> device in the ARP cache.  tcpdump shows consistent ARPs generated by 
>> similar arguments to ping and rds-ping.
> Hi Leo,
>
> Is this patch meant to solve the problem discussed in the "pick 
> the outgoing HCA based on the IP used for bind" threads from 
> February this year on the general and rds-devel mailing lists 
> (http://lists.openfabrics.org/pipermail/general/2009-February/057008.html)? 
> Also, by "strict ARP handling" do you refer to the case where there are 
> multiple NICs on the same L2 broadcast domain (VLAN/Partition)?
>
> Or.