[ofa-general] uDAPL libdat2.so version [PATCH] udapl v1 and v2 - dat_create_psp_any() seed value wrong

Arlin Davis ardavis at ichips.intel.com
Sun Feb 10 16:46:26 PST 2008


Tang, Changqing wrote:
> I am testing OFED 1.3 uDAPL v1 on three nodes: n1, n2, and n3.
> If I run two ranks between n1 and n2, it works; between n2 and n3, it works again;
> but if I run between n1 and n3, it fails with:
> 
> dat_cr_accept() failed: DAT_INTERNAL_ERROR
> 
> What could be the reason? I did not change anything except the
> nodes used for the run. Thanks for the help.
> 
> 
What IPoIB interfaces are configured on the nodes? Can you ping via
IPoIB from n1 to n3? Are you using the same IB port on each node?

This error could be caused by a physical port mismatch between
the connect request and the listen binding: the ARP reply may come
from a different port than the one the listener is bound to.

If you have multiple interfaces, one may reply to an ARP request
directed to another interface on the same system. The following
configuration makes each interface reply only to ARP requests whose
target IP address is configured on that interface.

Add the following lines to /etc/sysctl.conf (they take effect at the
next boot, or right away after running "sysctl -p"):

net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.ib0.arp_ignore=1
net.ipv4.conf.ib1.arp_ignore=1

or set them on the running system with sysctl (not persistent across reboots):

sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.ib0.arp_ignore=1
sysctl -w net.ipv4.conf.ib1.arp_ignore=1
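
If the IPoIB interfaces on a node are not named ib0 and ib1, the same
settings can be generated for whatever names are in use. The following
is only a minimal sketch: the helper name arp_ignore_lines is made up
for illustration, and the interface names passed to it are assumptions
that should be replaced with the ones actually configured.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED or uDAPL): print arp_ignore=1
# settings in sysctl.conf syntax for "all" plus every interface name
# given as an argument.
arp_ignore_lines() {
    printf 'net.ipv4.conf.all.arp_ignore=1\n'
    for ifname in "$@"; do
        printf 'net.ipv4.conf.%s.arp_ignore=1\n' "$ifname"
    done
}

# Print the lines for review; ib0 and ib1 are the names used in this thread.
arp_ignore_lines ib0 ib1
```

The output can be appended to /etc/sysctl.conf, or each line can be
passed to "sysctl -w" as root to apply it immediately. Printing first
and applying separately keeps the change easy to inspect before it
touches the running system.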

-arlin


