[ofa-general] Re: IB Bonding errors with recent kernel

Or Gerlitz or.gerlitz at gmail.com
Tue Dec 9 12:21:52 PST 2008


> If I am not mistaken the issue you mention is a little different from the one I pointed out.
> Without bonding I see the following:
> kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
> However, with bonding what I see is :
> ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22

Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid
argument). You can get EAGAIN when the underlying core SA agent is
not ready to send SA queries, while you get EINVAL when attempting to
join on a junk MGID. I am confident that we have been seeing joins on
junk MGIDs for a long time; it has been reported on this list (google...)
in the past, with no resolution yet.
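
For reference, a minimal sketch of decoding the status values from those log lines. The helper name decode_status is hypothetical, and the table is hard-coded with the Linux errno numbers rather than taken from the running platform, since other systems number EAGAIN differently.

```python
# Linux errno numbers for the two codes discussed above. Hard-coded on
# purpose: the values are Linux-specific and this keeps the sketch
# platform independent.
LINUX_ERRNO = {
    11: ("EAGAIN", "Try again"),
    22: ("EINVAL", "Invalid argument"),
}


def decode_status(status: int) -> str:
    """Turn a negative kernel status (e.g. -11) into a readable string."""
    name, text = LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this table"))
    return f"{status} = -{name} ({text})"


if __name__ == "__main__":
    for status in (-11, -22):
        print(decode_status(status))
```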

Under bonding there might be a window in time where, from the kernel
network stack's perspective, the bonding device type is Ethernet
and not InfiniBand, and hence the wrong function (ip_eth_mc_map instead
of ip_ib_mc_map) would be called to do the mapping from the IP
multicast address to the HW multicast address.
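
To illustrate, here is a rough Python sketch of the two mappings. It is my reading of what the kernel helpers do, not a copy of them. The function names eth_mc_map, ib_mc_map and mgid_str are made up for the example, and two things are assumptions: that the MGID is taken from bytes 4..19 of the 20-byte IPoIB hardware address, and that the bytes after a 6-byte Ethernet-style address are zero.

```python
import socket
import struct

# IPoIB broadcast hardware address for the default P_Key 0xffff
# (20 bytes: 4 bytes flags/QPN followed by the 16-byte MGID).
BROADCAST = bytes.fromhex("00ffffffff12401bffff000000000000ffffffff")


def _ipv4_to_int(ip: str) -> int:
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def eth_mc_map(ip: str) -> bytes:
    """Ethernet mapping: 01:00:5e followed by the low 23 bits of the address."""
    addr = _ipv4_to_int(ip)
    return bytes([0x01, 0x00, 0x5E,
                  (addr >> 16) & 0x7F, (addr >> 8) & 0xFF, addr & 0xFF])


def ib_mc_map(ip: str, broadcast: bytes) -> bytes:
    """IPoIB mapping: scope and P_Key from the broadcast address, low 28 bits of IP."""
    addr = _ipv4_to_int(ip)
    buf = bytearray(20)                      # buf[0] stays 0 (reserved)
    buf[1:4] = b"\xff\xff\xff"               # multicast QPN
    buf[4] = 0xFF
    buf[5] = 0x10 | (broadcast[5] & 0x0F)    # flags plus scope
    buf[6], buf[7] = 0x40, 0x1B              # IPv4 signature
    buf[8], buf[9] = broadcast[8], broadcast[9]  # P_Key
    buf[16] = (addr >> 24) & 0x0F
    buf[17] = (addr >> 16) & 0xFF
    buf[18] = (addr >> 8) & 0xFF
    buf[19] = addr & 0xFF
    return bytes(buf)


def mgid_str(hwaddr: bytes) -> str:
    """Format bytes 4..19 of a (zero-padded) hardware address as an MGID."""
    mgid = hwaddr.ljust(20, b"\x00")[4:20]
    return ":".join(mgid[i:i + 2].hex() for i in range(0, 16, 2))


if __name__ == "__main__":
    print(mgid_str(ib_mc_map("224.0.0.1", BROADCAST)))
    print(mgid_str(eth_mc_map("224.0.0.1")))
```

Under those assumptions, the Ethernet mapping of 224.0.0.1 (the all-hosts group) read back as an MGID gives exactly the 0001:0000:... value from the bonding log, which is consistent with the wrong function having been called.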


> Subsequently an ib-bond status does not reveal any slave as active as shown below:
> ib-bond --status
> bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9
> slave0: ib0
> slave1: ib1

As this script is non-standard and deprecated, I would recommend not
using it and looking instead at the classic /proc/net/bonding/bond0
entry, along with ip addr show on bond0, ib0 and ib1.
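
If you want to check this from a script, something along these lines may do. It is only a sketch: parse_bonding is a hypothetical helper, SAMPLE is a hand-written illustration of the file layout rather than output captured from a real system, and it assumes the "Currently Active Slave:" and "Slave Interface:" fields that the bonding driver prints.

```python
# Hand-written sample in the layout of /proc/net/bonding/bond0
# (illustrative only, not captured from a real machine).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: ib0
MII Status: up

Slave Interface: ib0
MII Status: up

Slave Interface: ib1
MII Status: up
"""


def parse_bonding(text: str) -> dict:
    """Extract the active slave and the list of slaves from the proc entry text."""
    info = {"active": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            info["active"] = None if value == "None" else value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info


if __name__ == "__main__":
    print(parse_bonding(SAMPLE))
```

On a live system you would pass in the contents of /proc/net/bonding/bond0 instead of SAMPLE; an "active" value of None would then correspond to the situation where no slave is active.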

Or.


