Re: [ofa-general] Re: IB Bonding errors with recent kernel

Dennis Portello dennis.portello at gmail.com
Mon Apr 20 06:55:58 PDT 2009


Hello Or,

Thanks for the reply.

I enabled debug; the results of my test are below.

First off, I ran this same test successfully on bonded Ethernet and on a
single IB interface.

sudo route add -net 224.0.0.0/3 gw 192.168.47.102
socat STDIO UDP4-DATAGRAM:224.1.0.1:6666,bind=:6666,range=192.168.47.0/24,ip-add-membership=224.1.0.1:192.168.47.102

sudo route add -net 224.0.0.0/3 gw 192.168.47.100
socat STDIO UDP4-DATAGRAM:224.1.0.1:6666,bind=:6666,range=192.168.47.0/24,ip-add-membership=224.1.0.1:192.168.47.100

socat sets up peer-to-peer multicast communication. The expected result is
echoed data on the sending end and the same data on the receiving end.
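
For anyone who wants to reproduce the test without socat, here is a rough
Python sketch of the same socket setup. It is an approximation, not what socat
does internally: the function names are made up for illustration, setting
SO_REUSEADDR is my assumption, and the "range=" source filter is not
reproduced. Like the socat test, it relies on the multicast route added above
to pick the outgoing interface.

```python
import socket

GROUP = "224.1.0.1"  # multicast group used in the socat test
PORT = 6666          # port used in the socat test


def make_membership_request(group: str, iface_ip: str) -> bytes:
    """Pack a struct ip_mreq: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_multicast_socket(iface_ip: str, group: str = GROUP,
                          port: int = PORT) -> socket.socket:
    """Bind a UDP socket to the port and join the group on iface_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Assumption: allow several test processes to share the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Counterpart of socat's ip-add-membership=<group>:<interface address>.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group, iface_ip))
    return sock
```

On each host, something like `sock = open_multicast_socket("192.168.47.102")`
followed by `sock.sendto(b"hello\n", (GROUP, PORT))` and
`sock.recvfrom(2048)` should show both the local echo and the peer's data when
multicast is working.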

When attempting this test with bonded IB interfaces, I only get the
echoed data on the sending end and nothing on the receiving end.
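
To find the test group in the dmesg output below, it helps to know how IPoIB
maps an IPv4 multicast address to an IB MGID. As far as I understand RFC 4391,
the low 28 bits of the group address go into the low bits of the MGID, after
the ff1<scope>:401b:<P_Key> prefix. The sketch below assumes the default
P_Key 0xffff and link-local scope 2, which is what the MGIDs in this log show;
the function name is made up for illustration.

```python
import ipaddress


def ipv4_group_to_mgid(group: str, pkey: int = 0xFFFF, scope: int = 0x2) -> str:
    """Return the IPoIB MGID string for an IPv4 multicast group address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low28 = int(addr) & 0x0FFFFFFF  # only the low 28 bits are carried over
    words = [
        0xFF10 | scope,   # 0xff prefix, transient flag, scope nibble
        0x401B,           # IPoIB IPv4 signature
        pkey,             # partition key (assumed default here)
        0, 0, 0,          # reserved, zero
        low28 >> 16,
        low28 & 0xFFFF,
    ]
    return ":".join(f"{w:04x}" for w in words)


print(ipv4_group_to_mgid("224.1.0.1"))
```

With these assumptions, 224.1.0.1 maps to
ff12:401b:ffff:0000:0000:0000:0001:0001, which is the entry that ib0 adds at
886.92 and deletes at 899.05 in the log.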

Here are the results from dmesg:
[  859.128720] bonding: bond3 is being created...
[  859.129468] bonding: bond3: setting mode to active-backup (1).
[  859.129501] bonding: bond3: Setting MII monitoring interval to 100.
[  859.141557] bonding: bond3: doing slave updates when interface is down.
[  859.141563] bonding: bond3: Adding slave ib0.
[  859.141566] bonding bond3: master_dev is not up in bond_enslave
[  859.141567] bonding: bond3: Warning: enslaved VLAN challenged slave ib0.
Adding VLANs will be blocked as long as ib0 is part of bond bond3
[  859.141570] bonding: bond3: Warning: The first slave device specified
does not support setting the MAC address. Setting fail_over_mac to
active.<7>ib0: bringing up interface
[  859.182437] ib0: starting multicast thread
[  859.182568] ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
[  859.182580] ib0: restarting multicast task
[  859.182583] ib0: stopping multicast thread
[  859.182586] ib0: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0000:0001
[  859.182589] ib0: starting multicast thread
[  859.182739] ib0: join completion for
ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
[  859.182951] ib0: Created ah ffff8804379e8680
[  859.182954] ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV
ffff8804379e8680, LID 0xc000, SL 0
[  859.183088] ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
[  859.183222] ib0: join completion for
ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
[  859.183354] ib0: Created ah ffff8804389a9880
[  859.183359] ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV
ffff8804389a9880, LID 0xc001, SL 0
[  859.184369] bonding: bond3: enslaving ib0 as a backup interface with a
down link.
[  859.186365] ib0: successfully joined all multicast groups
[  859.186385] ib0: restarting multicast task
[  859.186386] ib0: stopping multicast thread
[  859.186389] ib0: starting multicast thread
[  859.186500] ib0: successfully joined all multicast groups
[  859.188608] bonding: bond3: doing slave updates when interface is down.
[  859.188613] bonding: bond3: Adding slave ib1.
[  859.188615] bonding bond3: master_dev is not up in bond_enslave
[  859.188617] bonding: bond3: Warning: enslaved VLAN challenged slave ib1.
Adding VLANs will be blocked as long as ib1 is part of bond bond3
[  859.221889] ib1: bringing up interface
[  859.222359] ib1: starting multicast thread
[  859.222483] ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
[  859.222494] ib1: restarting multicast task
[  859.222498] ib1: stopping multicast thread
[  859.222500] ib1: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0000:0001
[  859.222503] ib1: starting multicast thread
[  859.224240] bonding: bond3: enslaving ib1 as a backup interface with a
down link.
[  859.224634] ib1: join completion for
ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
[  859.224837] ib1: Created ah ffff880436cc8400
[  859.224841] ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV
ffff880436cc8400, LID 0xc000, SL 0
[  859.224968] ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
[  859.225099] ib1: join completion for
ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
[  859.225223] ib1: Created ah ffff88043840fec0
[  859.225228] ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV
ffff88043840fec0, LID 0xc001, SL 0
[  859.226956] ib1: successfully joined all multicast groups
[  859.226961] ib1: restarting multicast task
[  859.226962] ib1: stopping multicast thread
[  859.226964] ib1: starting multicast thread
[  859.227074] ib1: successfully joined all multicast groups
[  859.228034] ib0: mtu > 2044 will cause multicast packet drops.
[  859.229779] ib1: mtu > 2044 will cause multicast packet drops.
[  859.233134] ADDRCONF(NETDEV_UP): bond3: link is not ready
[  859.233153] bonding: bond3: link status definitely up for interface ib0.
[  859.233156] bonding: bond3: making interface ib0 the new active one.
[  859.233167] ib0: restarting multicast task
[  859.233170] ib0: stopping multicast thread
[  859.233172] ib0: adding multicast entry for mgid
0001:0000:0000:0000:0000:0000:0000:0000
[  859.233175] ib0: starting multicast thread
[  859.233178] bonding: bond3: first active interface up!
[  859.233180] bonding: bond3: link status definitely up for interface ib1.
[  859.233289] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  859.234904] ADDRCONF(NETDEV_CHANGE): bond3: link becomes ready
[  859.234944] ib0: restarting multicast task
[  859.234948] ib0: stopping multicast thread
[  859.234951] ib0: adding multicast entry for mgid
ff12:601b:ffff:0000:0000:0001:ff00:f778
[  859.234954] ib0: starting multicast thread
[  859.235069] ib0: joining MGID ff12:601b:ffff:0000:0000:0001:ff00:f778
[  859.235090] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  859.235095] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  859.235162] ib0: restarting multicast task
[  859.235163] ib0: stopping multicast thread
[  859.235166] ib0: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0000:00fb
[  859.235168] ib0: starting multicast thread
[  859.235200] ib0: join completion for
ff12:601b:ffff:0000:0000:0001:ff00:f778 (status 0)
[  859.235304] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  859.235343] ib0: Created ah ffff88043a9b9440
[  859.235347] ib0: MGID ff12:601b:ffff:0000:0000:0001:ff00:f778 AV
ffff88043a9b9440, LID 0xc002, SL 0
[  859.235408] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  859.235412] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  859.235481] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  859.235592] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  859.235596] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  859.260028] ib0: setting up send only multicast group for
ff12:601b:ffff:0000:0000:0000:0000:0016
[  859.260042] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0016, starting join
[  859.260136] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[  859.263792] ib0: setting up send only multicast group for
ff12:401b:ffff:0000:0000:0000:0000:0016
[  859.263806] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[  859.263883] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[  860.600025] ib0: setting up send only multicast group for
ff12:601b:ffff:0000:0000:0000:0000:0002
[  860.600035] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0002, starting join
[  860.600149] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[  863.230303] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  863.230406] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  863.230411] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  864.600035] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0002, starting join
[  864.600124] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[  868.600034] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0002, starting join
[  868.600119] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[  868.620031] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[  868.620112] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[  869.100039] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0016, starting join
[  869.100124] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[  869.600029] bond3: no IPv6 routers present
[  879.230231] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  879.230349] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  879.230355] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  886.919993] ib0: restarting multicast task
[  886.919997] ib0: stopping multicast thread
[  886.920002] ib0: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0001:0001
[  886.920005] ib0: starting multicast thread
[  886.920140] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  886.920244] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  886.920248] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  886.934421] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[  886.934520] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[  889.000014] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[  889.000102] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[  899.053269] ib0: restarting multicast task
[  899.053273] ib0: stopping multicast thread
[  899.053277] ib0: deleting multicast group
ff12:401b:ffff:0000:0000:0000:0001:0001
[  899.053280] ib0: deleting multicast group
ff12:401b:ffff:0000:0000:0000:0001:0001
[  899.053285] ib0: starting multicast thread
[  899.053430] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[  899.053540] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[  899.053544] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[  899.073152] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[  899.073241] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[  903.420017] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[  903.420100] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
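
Since the log above is long and the mail client wrapped it, a small script can
tally the failed joins per MGID. This is only a sketch: the regular expression
is written against the message format seen in this log, and the function names
are made up for illustration. The status is looked up with Python's errno
module; on Linux, 22 is EINVAL.

```python
import errno
import re
from collections import Counter

# Matches e.g. "ib0: multicast join failed for <mgid>, status -22",
# tolerating the line wrap that email inserted before the MGID.
FAILED_JOIN = re.compile(
    r"(ib\d+): multicast join failed for\s+([0-9a-f:]{39}), status (-?\d+)"
)


def summarize_failed_joins(dmesg_text: str) -> Counter:
    """Count failed joins keyed by (interface, MGID, status)."""
    counts = Counter()
    for iface, mgid, status in FAILED_JOIN.findall(dmesg_text):
        counts[(iface, mgid, int(status))] += 1
    return counts


def status_name(status: int) -> str:
    """Translate a negative kernel status such as -22 into its errno name."""
    return errno.errorcode.get(abs(status), "unknown")
```

Feeding it the saved dmesg text, for example
`summarize_failed_joins(open("dmesg.txt").read())`, groups the failures. In
the log above they are all on ib0 with status -22, for the MGID
0001:0000:0000:0000:0000:0000:0000:0000 (which, unlike the others, does not
start with the ff multicast prefix) and for the send-only groups ending in
0016 and 0002.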


Thank you,
Dennis P.

On Mon, Apr 20, 2009 at 8:17 AM, Or Gerlitz <ogerlitz at voltaire.com> wrote:

> Dennis Portello wrote:
> > Regular TCP/IP unicast works, though dmesg is full of warning about
> > multicast failing. Multicast does not work at all.
>
> Unicast IP relies on ARP and IPoIB ARPs use the broadcast multicast group, so
> IB multicast does work on your setup... to see what IB multicast groups are
> being joined by your IPoIB devices, you can use the ipoib debugfs entries
>
> $ mount -t debugfs none /sys/kernel/debug
> $ cat /sys/kernel/debug/ipoib/ibxxx_mcg
>
> see Documentation/infiniband/ipoib.txt for more info
>
> Or.
>