[ofa-general] patch to ib_addr for sending arps
leo.tominna at oracle.com
Mon Jul 13 10:14:05 PDT 2009
Hi Jason,
Thanks for clearing up the use case. In that case, calling ip_dev_find to
set oif would be wrong, since it would not work correctly when the same IP
is associated with two devices. Simply setting s_addr before calling
ip_route_output_key in addr_send_arp should take care of it (as in the
initial patch I sent).
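To make the effect of setting s_addr concrete, here is a small userspace model of a policy-routed lookup. It assumes a hypothetical setup: 10.0.0.1 on ib0 and 10.0.0.2 on ib1, both in 10.0.0.0/24, with a rule "from 10.0.0.2" pointing at a separate table and a "from all" rule for the main table. route_output, table_lookup, IB0/IB1 and the addresses are illustrative names only, not the kernel's ip_route_output_key or fib rule code; this is a sketch of the lookup order, not a definitive implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of policy routing; not kernel code.
 * Addresses are host-order uint32_t for simplicity. */

#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

enum { IB0 = 1, IB1 = 2 };      /* hypothetical ifindex values */

struct route {                  /* one prefix route in a table */
    uint32_t dst, mask;
    int oif;                    /* output interface */
};

struct table {
    const struct route *routes;
    size_t n;
};

struct rule {                   /* "from <src>/<mask> lookup <table>" */
    uint32_t src, mask;         /* mask 0 means "from all" */
    const struct table *table;
};

/* Longest-prefix match in one table; returns oif, or -1 if no route. */
static int table_lookup(const struct table *t, uint32_t daddr)
{
    int best = -1;
    uint32_t best_mask = 0;

    assert(t != NULL);
    for (size_t i = 0; i < t->n; i++) {
        const struct route *r = &t->routes[i];
        if ((daddr & r->mask) == r->dst &&
            (best < 0 || r->mask > best_mask)) {
            best = r->oif;
            best_mask = r->mask;
        }
    }
    return best;
}

/* Walk the rules in priority order. A rule with a source selector only
 * matches when saddr is set, so saddr == 0 falls through to "from all".
 * If a matching rule's table has no route, continue with the next rule. */
static int route_output(const struct rule *rules, size_t nrules,
                        uint32_t saddr, uint32_t daddr)
{
    for (size_t i = 0; i < nrules; i++) {
        if ((saddr & rules[i].mask) != rules[i].src)
            continue;
        int oif = table_lookup(rules[i].table, daddr);
        if (oif >= 0)
            return oif;
    }
    return -1;
}

/* Example: 10.0.0.1 on ib0 and 10.0.0.2 on ib1, both in 10.0.0.0/24. */
static const struct route main_routes[] = {
    { IPV4(10, 0, 0, 0), 0xFFFFFF00u, IB0 },
};
static const struct route ib1_routes[] = {
    { IPV4(10, 0, 0, 0), 0xFFFFFF00u, IB1 },
};
static const struct table main_table = { main_routes, 1 };
static const struct table ib1_table = { ib1_routes, 1 };

static const struct rule example_rules[] = {
    { IPV4(10, 0, 0, 2), 0xFFFFFFFFu, &ib1_table }, /* from 10.0.0.2 */
    { 0, 0, &main_table },                          /* from all */
};
#define N_EXAMPLE_RULES (sizeof example_rules / sizeof example_rules[0])
```

With s_addr left at 0, the source rule never matches and the lookup falls through to the main table, so the arp would go out ib0; with s_addr set to 10.0.0.2, the same destination resolves to ib1. No oif is passed in either case; the device falls out of the rule match alone.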
From what I can tell, this only fixes the policy routing case, without
affecting or addressing configurations that use default routing. I still
need to see why RDS/IB gets stuck in the default routing case. My guess is
that hardware addresses don't get resolved correctly there (as expected),
and the two sides of an IB connection trip over a mismatch in what
hardware each peer thinks it's using.
But that is another issue that can be fixed independently. I'll add
some prints to see what might be happening.
Thanks,
Leo Tominna
On 7/13/2009 6:49 AM, Jason Gunthorpe wrote:
> On Sun, Jul 12, 2009 at 08:38:38PM -0700, leo.tominna at oracle.com wrote:
>
>
>> Associating the device with the source IP seems to be the correct thing to
>> do in general, but I initially avoided it in favor of source based routing
>> rules/tables since Linux does not do this by default. Source based routing
>> seems to be the only way to get load balancing right for regular IP
>> traffic
>>
>
> Right, this is the Linux Way. The route table associates the output
> device with the source/destination pair, and it is correct and
> necessary to have source route policy entries to do what you are
> talking about. As per the thread Or dug up, older versions didn't do
> this right - did it ever get fixed?
>
> It is also necessary to use one of the arp_ignore settings, otherwise
> the receiver side responds with the wrong physical address. Source
> routing fixes the transmitter; arp_ignore fixes the receiver.
>
>
>> when two local IPs are on the same subnet, so I thought it would be better
>> to have the src_ip alone cause the routing lookup to associate everything
>> with the correct device, as that would work for regular IP traffic and
>> anyone else, like ib_addr clients.
>>
>
>
>> The problem I was seeing with arp was that Linux associates arp entries
>> with specific devices. So if source based routing is used and the arp send
>> does not take src_ip into account, the arp is sent from the default device
>> and that's the device that gets the arp entry, whereas neigh_lookup was
>> looking for an entry on the correct device and never found the
>> neighbor.
>>
>
> The proper flow is that the source/dest pair (plus extras) is used
> to do a route lookup. The route lookup returns the output device to
> use. The arp is sent out that device with the source/dest pair. The
> receiver will then receive the broadcast arp on all interfaces, and you
> need the arp_ignore setting to cause only the interface bound to that
> IP to generate a response.
>
>
>> From what I can tell, on HPUX there is no device association with arp
>> entries. Does anyone know why Linux has this flexibility? It looks like
>> this was added in 2.2 or something, and I can't see any useful applications
>> for this.
>>
>
> I can have multiple networks with overlapping IP spaces, and by using
> routing tricks like policy routing I can generate overlapping,
> network-specific ARP entries. Consider that I might have two ethernet
> ports per machine and two networks, both with the same IP. I can use
> policy routing to direct all high priority traffic to one network and
> low priority traffic to the other. The IPs are the same, but the MACs
> are different, thus the arp table must be keyed by interface and
> destination.
>
> Once you have complex routing features like policy routing, this
> becomes necessary; presumably HPUX does not have this.
>
>
>> The initial patch I sent is less restrictive but relies on source based
>> routing to get everything working; perhaps an explicit device mapping (as
>> above) makes more sense for RDMA traffic. Please correct me if something I
>> said is incorrect or if these changes conflict with other working
>> configurations.
>>
>
> This is Linux; the RDMA IP behavior should exactly match the in-kernel
> IP behavior. It wasn't clear to me whether your patch did that or not.
>
> On the surface it looks right: sending the arp from the device (and
> IP?) that the requesting packet is going out on is the right thing.
>
> Jason
>
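The device-keyed arp table discussed in this thread (Leo's missing-neighbor symptom and Jason's overlapping-IP example) can be modeled as a cache keyed by the (interface, IP) pair. This is a hypothetical userspace sketch: struct neigh, neigh_update and neigh_find are illustrative names, not the kernel's struct neighbour or neigh_lookup(), and the 6-byte hardware address assumes Ethernet, as in Jason's example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a neighbor (arp) cache keyed by
 * (ifindex, IPv4 address); not the kernel's neighbour code. */

#define NEIGH_MAX 16
#define HWADDR_LEN 6    /* Ethernet MAC; IPoIB hardware addresses are 20 bytes */

struct neigh {
    int ifindex;
    uint32_t ip;
    uint8_t hw[HWADDR_LEN];
};

struct neigh_table {
    struct neigh entries[NEIGH_MAX];
    size_t n;
};

/* Insert a new entry for (ifindex, ip), or update the existing one. */
static void neigh_update(struct neigh_table *t, int ifindex, uint32_t ip,
                         const uint8_t hw[HWADDR_LEN])
{
    for (size_t i = 0; i < t->n; i++) {
        struct neigh *e = &t->entries[i];
        if (e->ifindex == ifindex && e->ip == ip) {
            memcpy(e->hw, hw, HWADDR_LEN);
            return;
        }
    }
    assert(t->n < NEIGH_MAX);
    t->entries[t->n].ifindex = ifindex;
    t->entries[t->n].ip = ip;
    memcpy(t->entries[t->n].hw, hw, HWADDR_LEN);
    t->n++;
}

/* Both the interface and the address must match: an entry learned on
 * another interface is invisible to this lookup. */
static const struct neigh *neigh_find(const struct neigh_table *t,
                                      int ifindex, uint32_t ip)
{
    for (size_t i = 0; i < t->n; i++) {
        const struct neigh *e = &t->entries[i];
        if (e->ifindex == ifindex && e->ip == ip)
            return e;
    }
    return NULL;
}
```

With this keying, an entry created because the arp went out the default device (say ifindex 1) is never returned to a lookup on the policy-routed device (ifindex 2), and the same peer IP can hold different hardware addresses on the two interfaces.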
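On the receive side, the arp_ignore behavior Jason describes can be sketched as a reply decision. This sketch assumes the documented meanings of the net.ipv4.conf.*.arp_ignore sysctl for mode 0 (reply for any local address, configured on any interface) and mode 1 (reply only if the target address is configured on the incoming interface); other modes are not modeled, and struct iface and arp_should_reply are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the receiver-side arp reply decision for
 * arp_ignore modes 0 and 1 only. Not kernel code. */

struct iface {
    int ifindex;
    const uint32_t *addrs;      /* IPv4 addresses configured here */
    size_t naddrs;
};

static bool iface_has_addr(const struct iface *ifc, uint32_t ip)
{
    for (size_t i = 0; i < ifc->naddrs; i++)
        if (ifc->addrs[i] == ip)
            return true;
    return false;
}

/* Should interface `in`, which received an arp request for `target`, reply?
 * Mode 0: yes if target is configured on any local interface.
 * Mode 1: yes only if target is configured on `in` itself. */
static bool arp_should_reply(const struct iface *ifaces, size_t n,
                             const struct iface *in, uint32_t target,
                             int arp_ignore)
{
    assert(arp_ignore == 0 || arp_ignore == 1);
    if (arp_ignore == 1)
        return iface_has_addr(in, target);
    for (size_t i = 0; i < n; i++)
        if (iface_has_addr(&ifaces[i], target))
            return true;
    return false;
}
```

With mode 0, a broadcast arp for 10.0.0.2 that also arrives on ib0 is answered there, with ib0's hardware address, which is the "wrong physical address" case; with mode 1, only the interface that owns 10.0.0.2 replies.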