[ofa-general] patch to ib_addr for sending arps

leo.tominna at oracle.com leo.tominna at oracle.com
Mon Jul 13 10:14:05 PDT 2009


Hi Jason,

Thanks for clearing up the use case.  In that case doing ip_dev_find to 
set oif would be wrong, since it would not work correctly when the 
same IP is associated with two devices.  Just setting s_addr before 
calling ip_route_output_key in addr_send_arp (the initial patch I sent) 
should take care of it.
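
To illustrate why the source address matters here, below is a toy Python 
model. It is not the kernel code: the device names, addresses and helper 
names are all made up for illustration. It assumes one source-based policy 
rule per local IP and a neighbour table keyed by (device, destination).

```python
# Toy model only, NOT kernel code. Devices, IPs and function names are hypothetical.

# Source-based policy rules: the source IP selects the output device.
SOURCE_RULES = {"10.0.0.1": "ib0", "10.0.0.2": "ib1"}
DEFAULT_DEV = "ib0"  # device picked when the lookup carries no source address


def route_lookup(dst, src=None):
    """Return the output device; dst is ignored since both devices share a subnet."""
    return SOURCE_RULES.get(src, DEFAULT_DEV)


def send_arp(neigh_table, dst, src=None):
    """Resolve dst and record the neighbour entry on the device the route chose."""
    dev = route_lookup(dst, src)
    neigh_table[(dev, dst)] = "mac-of-" + dst  # pretend a reply came back on dev
    return dev


def neigh_lookup(neigh_table, dev, dst):
    """Look for a neighbour entry on one specific device."""
    return neigh_table.get((dev, dst))


def demo():
    table = {}
    send_arp(table, "10.0.0.9")                   # no source: entry lands on ib0
    print(neigh_lookup(table, "ib1", "10.0.0.9")) # None, lookup on ib1 misses
    send_arp(table, "10.0.0.9", src="10.0.0.2")   # source set: entry lands on ib1
    print(neigh_lookup(table, "ib1", "10.0.0.9")) # mac-of-10.0.0.9
```

Calling demo() shows the miss first and the hit once the source is passed 
to the lookup, which is the behaviour the patch is after.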

From what I can tell, this just fixes the policy routing case, without 
affecting/addressing configurations that are using default routing.  I 
need to see why RDS/IB gets stuck in this case.  My guess is that 
hardware addresses don't get resolved correctly (as expected), and the two 
sides of an IB connection trip over a mismatch in what hardware a peer 
thinks it's using.

But that is another issue that can be fixed independently.  I'll add 
some prints to see what might be happening.

Thanks,
Leo Tominna


On 7/13/2009 6:49 AM, Jason Gunthorpe wrote:
> On Sun, Jul 12, 2009 at 08:38:38PM -0700, leo.tominna at oracle.com wrote:
>
>   
>> Associating the device with the source IP seems to be the correct thing to 
>> do in general, but I initially avoided it in favor of source based routing 
>> rules/tables since Linux does not do this by default.  Source based routing 
>> seems to be the only way to get load balancing right for regular IP traffic
>>     
>
> Right, this is the Linux Way. The route table associates the output
> device with the source/destination pair, and it is correct and
> necessary to have source route policy entries to do what you are
> talking about. As per the thread Or dug up, older versions didn't do
> this right - did it ever get fixed?
>
> It is also necessary to use one of the arp_ignore settings otherwise
> the receiver side responds with the wrong physical address. Source
> routing fixes the transmitter, arp_ignore fixes the receiver.
>
>   
>> when two local IPs are on the same subnet, so I thought it would be better 
>> to have the src_ip alone cause the routing lookup to associate everything 
>> with the correct device, as that would work for regular IP traffic and 
>> anyone else, like ib_addr clients.
>>     
>
>   
>> The problems I was seeing with arp was that Linux associates arp entries 
>> with specific devices, so if source based routing is used, and the arp send 
>> does not take src_ip into account, the arp is sent from the default device, 
>> and that's the device that gets the arp entry, whereas neigh_lookup was 
>> looking for an entry on the correct device, and was never finding the 
>> neighbor.
>>     
>
> The proper flow is that the source/dest pair (+ plus extra) are used
> to do a route lookup. The route lookup returns the output device to
> use. The arp is sent out that device with source/dest pair. The
> receiver will then receive the broadcast arp on all interfaces and you
> need the arp_ignore setting to cause only the interface bound to that
> IP to generate a response.
>
>   
>> From what I can tell, on HPUX there is no device association with arp 
>> entries.  Does anyone know why Linux has this flexibility?  It looks like 
>> this was added in 2.2 or something, and I can't see any useful applications 
>> for this.
>>     
>
> I can have multiple networks with overlapping IP spaces and using
> routing tricks like policy routing I can generate overlapping network
> specific ARP entries. Consider that I might have two ethernet ports
> per machine, and two networks, both with the same IP. I can use policy
> routing to direct all high priority traffic to one network, and low
> priority to the other. The IPs are the same, but the macs are
> different, thus the arp table must be keyed by interface and
> destination.
>
> Once you have complex routing stuff like policy routing this becomes
> necessary, presumably HPUX does not have this..
>
>   
>> The initial patch I sent is less restrictive but relies on source based 
>> routing to get everything working, perhaps an explicit device mapping (as 
>> above) makes more sense for RDMA traffic.  Please correct me if something I 
>> said is incorrect or if these changes conflict with other working 
>> configurations.
>>     
>
> This is Linux, the RDMA IP behavior should exactly match the in-kernel
> IP behavior. It wasn't clear to me if your patch did that or not?
>
> On the surface it looks right, sending the arp from the device (and
> IP?) the requesting packet is going out is the right thing..
>
> Jason
>   