[ofa-general] Re: IB Bonding errors with recent kernel

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Tue Dec 9 17:27:23 PST 2008


Or Gerlitz wrote:
>> If I am not mistaken the issue you mention is a little different from the one I pointed out.
>> Without bonding I see the following:
>> kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
>> However, with bonding what I see is :
>> ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22
> 
> Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid
> argument). So you can get EAGAIN when the underlying core SA agent is
> not ready to send SA queries, while you get EINVAL when attempting to
> join on a junk MGID. I am confident that we have been seeing joins on
> junk MGIDs for a long time, and it has been reported on this list
> (google...) in the past, with no resolution yet.
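
To tell the two failures apart when scanning logs, the status code can be decoded next to the MGID. The following is only a minimal sketch: the regular expression is modelled on the two log lines quoted above, the function name is hypothetical, and the errno table holds just the two Linux values named in this thread.

```python
import re

# Linux errno numbers mentioned in this thread (the numbering is Linux-specific).
LINUX_ERRNO = {11: "EAGAIN", 22: "EINVAL"}

# Modelled on lines such as:
#   ib0: multicast join failed for <MGID>, status -11
JOIN_FAILED = re.compile(
    r"(?P<dev>\w+): multicast join failed for "
    r"(?P<mgid>[0-9a-f:]+), status -(?P<code>\d+)"
)


def parse_join_failure(line):
    """Return (device, mgid, errno name) or None if the line does not match."""
    m = JOIN_FAILED.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    # Fall back to the raw number for codes outside the small table above.
    name = LINUX_ERRNO.get(code, "errno %d" % code)
    return m.group("dev"), m.group("mgid"), name
```

Feeding it the two lines above separates the "SA agent not ready" case (EAGAIN on the broadcast MGID) from the "junk MGID" case (EINVAL).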

Or,

I looked through the mailing list going back more than a year. The closest
I can find to this issue (-EINVAL) was when you reported problems with junk MGID on a
child interface (and that works properly now).

I agree that the -EAGAIN problem has been known for some time now. However, this issue with
IPoIB bonding is new. My recollection is that it all worked properly around the end of October.
I have not tested since then, so this is something that must have crept in during the interim.

> 
> Under bonding there might be a window in time where, from the kernel
> network stack perspective, the bonding device type is ethernet
> and not infiniband, and hence the wrong function (ip_eth_mc_map instead
> of ip_ib_mc_map) would be called to do the mapping from the IP
> multicast address to the HW multicast address.
> 
> 
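
To make the difference between the two mappings concrete, here is a rough Python sketch modelled on my understanding of the kernel helpers ip_eth_mc_map and ip_ib_mc_map. The byte layout is reproduced from memory and should be checked against the kernel source. The broadcast address is assumed from the MGID in the -11 log line, and zero-padding the 6-byte Ethernet result to 20 bytes is purely an assumption about what IPoIB would then read.

```python
import ipaddress


def ip_eth_mc_map(group):
    """Ethernet mapping: 01:00:5e followed by the low 23 bits of the group."""
    addr = int(ipaddress.IPv4Address(group))
    return bytes([0x01, 0x00, 0x5E,
                  (addr >> 16) & 0x7F, (addr >> 8) & 0xFF, addr & 0xFF])


def ip_ib_mc_map(group, broadcast):
    """IPoIB mapping: 20-byte HW address whose bytes 4..19 are the MGID."""
    addr = int(ipaddress.IPv4Address(group))
    buf = bytearray(20)
    buf[1:4] = b"\xff\xff\xff"                    # multicast QPN
    buf[4] = 0xFF
    buf[5] = 0x10 | (broadcast[5] & 0x0F)         # scope taken from broadcast
    buf[6], buf[7] = 0x40, 0x1B                   # IPv4 signature
    buf[8], buf[9] = broadcast[8], broadcast[9]   # P_Key
    buf[16] = (addr >> 24) & 0x0F                 # low 28 bits of the group
    buf[17] = (addr >> 16) & 0xFF
    buf[18] = (addr >> 8) & 0xFF
    buf[19] = addr & 0xFF
    return bytes(buf)


def mgid_text(hw20):
    """Format bytes 4..19 of a 20-byte IPoIB address like the kernel log does."""
    mgid = hw20[4:20]
    return ":".join("%02x%02x" % (mgid[i], mgid[i + 1]) for i in range(0, 16, 2))


# Assumed broadcast address, built from the MGID in the -11 log line.
BROADCAST = bytes.fromhex("00ffffff" "ff12401bffff000000000000ffffffff")

good = mgid_text(ip_ib_mc_map("224.0.0.1", BROADCAST))
# Assumption: the Ethernet result lands in an otherwise zeroed 20-byte buffer.
junk = mgid_text(ip_eth_mc_map("224.0.0.1").ljust(20, b"\x00"))
```

Under these assumptions, the all-hosts group 224.0.0.1 gives ff12:401b:ffff:0000:0000:0000:0000:0001 through the IPoIB mapping, but 0001:0000:0000:0000:0000:0000:0000:0000 through the Ethernet one. The latter matches the MGID in the -22 message, which is consistent with the wrong-mapping explanation but does not prove it.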
>> Subsequently an ib-bond status does not reveal any slave as active as shown below:
>> ib-bond --status
>> bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9
>> slave0: ib0
>> slave1: ib1
> 
> As this script is not standard and deprecated, I would recommend not
> to use it but rather the classic /proc/net/bonding/bond0 entry, along
> with ip addr show on bond0, ib0, ib1

Thanks for alerting me to the fact that the ib-bond script is deprecated. Again, this all
seemed to work about 6 weeks ago. Is the deprecation of ib-bond documented somewhere?

Pradeep



