Hello,<br><br>I seem to be experiencing the exact issue discussed below (back in December). I'm using the 2.6.27 kernel and the bonding driver that ships with it. Was there ever a solution or patch for this? I have been using the ib-bond scripts as well, but other approaches, such as standard OS tools or adding the bond through sysfs, all give the same results.<br>
<br>Regular TCP/IP unicast works, though dmesg is full of warnings about multicast failing. Multicast does not work at all.<br><br>Any hints or suggestions would be greatly appreciated.<br><br>Best regards,<br>Dennis Portello<br>
<br>> Or Gerlitz wrote:<br>>>> If I am not mistaken, the issue you mention is a little different from the one I pointed out.<br>>>> Without bonding I see the following:<br>>>> kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11<br>
>>> However, with bonding what I see is:<br>>>> ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22<br>>> <br>>> Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid<br>
>> argument). So you can get EAGAIN when the underlying core SA agent is<br>>> not ready to send SA queries, while you get EINVAL when attempting to<br>>> join on a junk MGID. I am confident that for a long time we have seen joins on<br>
>> junk MGIDs, and this has been reported on this list (google...) in the<br>>> past, with no resolution yet.<br>><br>> Or,<br>><br>> I looked through the mailing list going back more than a year. The closest<br>
> I can find to this issue (-EINVAL) was when you reported problems with junk MGIDs on a<br>> child interface (and that works properly now).<br>><br>> I agree that the -EAGAIN problem has been known for some time now. However, this issue with<br>
> IPoIB bonding is new. My recollection is that it all worked properly around the end of October.<br>> I had not tested since then, so this is something that must have crept in during the interregnum.<br>><br>>> <br>
>> Under bonding there might be a window in time where, from the kernel<br>>> network stack's perspective, the bonding device type is Ethernet<br>>> and not InfiniBand, and hence the wrong function (ip_eth_mc_map instead of<br>
>> ip_ib_mc_map) would be called to map the IP<br>>> multicast address to the HW multicast address.<br>>> <br>>> <br>>>> Subsequently, ib-bond --status does not show any slave as active, as shown below:<br>
>>> ib-bond --status<br>>>> bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9<br>>>> slave0: ib0<br>>>> slave1: ib1<br>>> <br>>> As this script is nonstandard and deprecated, I would recommend not<br>
>> using it, and instead checking the classic /proc/net/bonding/bond0 entry, along<br>>> with ip addr show on bond0, ib0 and ib1.<br>> Thanks for alerting me to the fact that the ib-bond script is deprecated. Again, this all seemed<br>
> to work about 6 weeks ago. Is the deprecation of ib-bond documented somewhere?<br>><br>> Pradeep<br>><br>