Re: [ofa-general] IB Bonding errors with recent kernel
Dennis Portello
dennis.portello at gmail.com
Mon Apr 20 06:55:58 PDT 2009
Hello Or,
Thanks for the reply.
I enabled debug; the results of my test are below.
First off, I ran this same test over bonded Ethernet and over a single IB
interface, and it succeeded in both cases.
sudo route add -net 224.0.0.0/3 gw 192.168.47.102
socat STDIO UDP4-DATAGRAM:224.1.0.1:6666,bind=:6666,range=192.168.47.0/24,ip-add-membership=224.1.0.1:192.168.47.102
sudo route add -net 224.0.0.0/3 gw 192.168.47.100
socat STDIO UDP4-DATAGRAM:224.1.0.1:6666,bind=:6666,range=192.168.47.0/24,ip-add-membership=224.1.0.1:192.168.47.100
socat sets up peer-to-peer multicast communication; the expected result is
that typed data is echoed on the sending end and also shows up on the
receiving end.
When I attempt this test with bonded IB interfaces, I only get the
echoed data on the sending end and nothing on the receiving end.
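
As a cross-check that does not depend on socat, the receiving side of the test can be approximated with a plain UDP socket. This is only a minimal sketch, assuming Python's standard socket module is available on the hosts; the function names build_mreq, open_receiver and receive_forever are made up for illustration. The group 224.1.0.1 and port 6666 are taken from the socat commands above, and the local address passed in plays the role of the second half of ip-add-membership.

```python
import socket

GROUP = "224.1.0.1"   # multicast group from the socat commands
PORT = 6666           # UDP port from the socat commands


def build_mreq(group, local_addr):
    """Pack a struct ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_addr)


def open_receiver(local_addr, group=GROUP, port=PORT):
    """Bind a UDP socket and join the group on the interface that owns local_addr."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, local_addr))
    return sock


def receive_forever(local_addr):
    """Print every datagram that arrives for the group (blocks until interrupted)."""
    sock = open_receiver(local_addr)
    while True:
        data, sender = sock.recvfrom(65535)
        print(sender, data)
```

Calling receive_forever("192.168.47.100") on the receiving host (with the bond's own address substituted) should print whatever the other side sends to 224.1.0.1:6666, if the join on the bonded interface actually works.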
Here are the results from dmesg:
[ 859.128720] bonding: bond3 is being created...
[ 859.129468] bonding: bond3: setting mode to active-backup (1).
[ 859.129501] bonding: bond3: Setting MII monitoring interval to 100.
[ 859.141557] bonding: bond3: doing slave updates when interface is down.
[ 859.141563] bonding: bond3: Adding slave ib0.
[ 859.141566] bonding bond3: master_dev is not up in bond_enslave
[ 859.141567] bonding: bond3: Warning: enslaved VLAN challenged slave ib0.
Adding VLANs will be blocked as long as ib0 is part of bond bond3
[ 859.141570] bonding: bond3: Warning: The first slave device specified
does not support setting the MAC address. Setting fail_over_mac to
active.<7>ib0: bringing up interface
[ 859.182437] ib0: starting multicast thread
[ 859.182568] ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
[ 859.182580] ib0: restarting multicast task
[ 859.182583] ib0: stopping multicast thread
[ 859.182586] ib0: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0000:0001
[ 859.182589] ib0: starting multicast thread
[ 859.182739] ib0: join completion for
ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
[ 859.182951] ib0: Created ah ffff8804379e8680
[ 859.182954] ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV
ffff8804379e8680, LID 0xc000, SL 0
[ 859.183088] ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
[ 859.183222] ib0: join completion for
ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
[ 859.183354] ib0: Created ah ffff8804389a9880
[ 859.183359] ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV
ffff8804389a9880, LID 0xc001, SL 0
[ 859.184369] bonding: bond3: enslaving ib0 as a backup interface with a
down link.
[ 859.186365] ib0: successfully joined all multicast groups
[ 859.186385] ib0: restarting multicast task
[ 859.186386] ib0: stopping multicast thread
[ 859.186389] ib0: starting multicast thread
[ 859.186500] ib0: successfully joined all multicast groups
[ 859.188608] bonding: bond3: doing slave updates when interface is down.
[ 859.188613] bonding: bond3: Adding slave ib1.
[ 859.188615] bonding bond3: master_dev is not up in bond_enslave
[ 859.188617] bonding: bond3: Warning: enslaved VLAN challenged slave ib1.
Adding VLANs will be blocked as long as ib1 is part of bond bond3
[ 859.221889] ib1: bringing up interface
[ 859.222359] ib1: starting multicast thread
[ 859.222483] ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
[ 859.222494] ib1: restarting multicast task
[ 859.222498] ib1: stopping multicast thread
[ 859.222500] ib1: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0000:0001
[ 859.222503] ib1: starting multicast thread
[ 859.224240] bonding: bond3: enslaving ib1 as a backup interface with a
down link.
[ 859.224634] ib1: join completion for
ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
[ 859.224837] ib1: Created ah ffff880436cc8400
[ 859.224841] ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV
ffff880436cc8400, LID 0xc000, SL 0
[ 859.224968] ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
[ 859.225099] ib1: join completion for
ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
[ 859.225223] ib1: Created ah ffff88043840fec0
[ 859.225228] ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV
ffff88043840fec0, LID 0xc001, SL 0
[ 859.226956] ib1: successfully joined all multicast groups
[ 859.226961] ib1: restarting multicast task
[ 859.226962] ib1: stopping multicast thread
[ 859.226964] ib1: starting multicast thread
[ 859.227074] ib1: successfully joined all multicast groups
[ 859.228034] ib0: mtu > 2044 will cause multicast packet drops.
[ 859.229779] ib1: mtu > 2044 will cause multicast packet drops.
[ 859.233134] ADDRCONF(NETDEV_UP): bond3: link is not ready
[ 859.233153] bonding: bond3: link status definitely up for interface ib0.
[ 859.233156] bonding: bond3: making interface ib0 the new active one.
[ 859.233167] ib0: restarting multicast task
[ 859.233170] ib0: stopping multicast thread
[ 859.233172] ib0: adding multicast entry for mgid
0001:0000:0000:0000:0000:0000:0000:0000
[ 859.233175] ib0: starting multicast thread
[ 859.233178] bonding: bond3: first active interface up!
[ 859.233180] bonding: bond3: link status definitely up for interface ib1.
[ 859.233289] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 859.234904] ADDRCONF(NETDEV_CHANGE): bond3: link becomes ready
[ 859.234944] ib0: restarting multicast task
[ 859.234948] ib0: stopping multicast thread
[ 859.234951] ib0: adding multicast entry for mgid
ff12:601b:ffff:0000:0000:0001:ff00:f778
[ 859.234954] ib0: starting multicast thread
[ 859.235069] ib0: joining MGID ff12:601b:ffff:0000:0000:0001:ff00:f778
[ 859.235090] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 859.235095] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 859.235162] ib0: restarting multicast task
[ 859.235163] ib0: stopping multicast thread
[ 859.235166] ib0: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0000:00fb
[ 859.235168] ib0: starting multicast thread
[ 859.235200] ib0: join completion for
ff12:601b:ffff:0000:0000:0001:ff00:f778 (status 0)
[ 859.235304] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 859.235343] ib0: Created ah ffff88043a9b9440
[ 859.235347] ib0: MGID ff12:601b:ffff:0000:0000:0001:ff00:f778 AV
ffff88043a9b9440, LID 0xc002, SL 0
[ 859.235408] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 859.235412] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 859.235481] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 859.235592] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 859.235596] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 859.260028] ib0: setting up send only multicast group for
ff12:601b:ffff:0000:0000:0000:0000:0016
[ 859.260042] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0016, starting join
[ 859.260136] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 859.263792] ib0: setting up send only multicast group for
ff12:401b:ffff:0000:0000:0000:0000:0016
[ 859.263806] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[ 859.263883] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[ 860.600025] ib0: setting up send only multicast group for
ff12:601b:ffff:0000:0000:0000:0000:0002
[ 860.600035] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0002, starting join
[ 860.600149] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[ 863.230303] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 863.230406] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 863.230411] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 864.600035] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0002, starting join
[ 864.600124] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[ 868.600034] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0002, starting join
[ 868.600119] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0002, status -22
[ 868.620031] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[ 868.620112] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[ 869.100039] ib0: no multicast record for
ff12:601b:ffff:0000:0000:0000:0000:0016, starting join
[ 869.100124] ib0: multicast join failed for
ff12:601b:ffff:0000:0000:0000:0000:0016, status -22
[ 869.600029] bond3: no IPv6 routers present
[ 879.230231] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 879.230349] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 879.230355] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 886.919993] ib0: restarting multicast task
[ 886.919997] ib0: stopping multicast thread
[ 886.920002] ib0: adding multicast entry for mgid
ff12:401b:ffff:0000:0000:0000:0001:0001
[ 886.920005] ib0: starting multicast thread
[ 886.920140] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 886.920244] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 886.920248] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 886.934421] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[ 886.934520] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[ 889.000014] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[ 889.000102] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[ 899.053269] ib0: restarting multicast task
[ 899.053273] ib0: stopping multicast thread
[ 899.053277] ib0: deleting multicast group
ff12:401b:ffff:0000:0000:0000:0001:0001
[ 899.053280] ib0: deleting multicast group
ff12:401b:ffff:0000:0000:0000:0001:0001
[ 899.053285] ib0: starting multicast thread
[ 899.053430] ib0: joining MGID 0001:0000:0000:0000:0000:0000:0000:0000
[ 899.053540] ib0: join completion for
0001:0000:0000:0000:0000:0000:0000:0000 (status -22)
[ 899.053544] ib0: multicast join failed for
0001:0000:0000:0000:0000:0000:0000:0000, status -22
[ 899.073152] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[ 899.073241] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
[ 903.420017] ib0: no multicast record for
ff12:401b:ffff:0000:0000:0000:0000:0016, starting join
[ 903.420100] ib0: multicast join failed for
ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
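
To make the log above easier to read, the failing MGIDs can be counted and the IPv4 ones mapped back to their IP multicast groups. The sketch below assumes the RFC 4391 layout for IPoIB MGIDs (signature 0x401b for IPv4, with the low 28 bits of the group address in the low bits of the MGID, and an all-ones low word for the broadcast group). The helper names mgid_to_ipv4 and failed_joins are hypothetical, and the regular expression is written to tolerate the line wrapping seen in this mail.

```python
import ipaddress
import re
from collections import Counter

IPV4_SIGNATURE = 0x401B  # IPoIB signature for IPv4 groups (assumed per RFC 4391)

# Matches "multicast join failed for <mgid>, status <n>", even across a line wrap.
FAILED = re.compile(
    r"multicast join failed for\s+([0-9a-f:]{39}),\s+status\s+(-?\d+)")


def mgid_to_ipv4(mgid):
    """Return the IPv4 group behind an IPoIB MGID, or None if it is not an IPv4 MGID."""
    words = [int(w, 16) for w in mgid.split(":")]
    if len(words) != 8 or words[0] >> 8 != 0xFF or words[1] != IPV4_SIGNATURE:
        return None
    low32 = (words[6] << 16) | words[7]
    if low32 == 0xFFFFFFFF:
        # Broadcast group: treated here as 255.255.255.255, not a class D address.
        return "255.255.255.255"
    # Class D prefix (1110) plus the 28 group bits carried in the MGID.
    return str(ipaddress.IPv4Address(0xE0000000 | (low32 & 0x0FFFFFFF)))


def failed_joins(log_text):
    """Count failed joins, keyed by (mgid, status string)."""
    return Counter(FAILED.findall(log_text))
```

Under that mapping, ff12:401b:ffff:0000:0000:0000:0001:0001 corresponds to 224.1.0.1, the group used in the socat test, and ff12:401b:ffff:0000:0000:0000:0000:0016 corresponds to 224.0.0.22. The MGID 0001:0000:0000:0000:0000:0000:0000:0000 does not start with 0xff and so is not decoded at all. Status -22 is -EINVAL on Linux.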
Thank you,
Dennis P.
On Mon, Apr 20, 2009 at 8:17 AM, Or Gerlitz <ogerlitz at voltaire.com> wrote:
> Dennis Portello wrote:
> > Regular TCP/IP unicast works, though dmesg is full of warning about
> > multicast failing. Multicast does not work at all.
>
> Unicast IP relies on ARP and IPoIB ARPs use the broadcast multicast group,
> so IB multicast does work on your setup... to see what IB multicast groups
> are being joined by your IPoIB devices, you can use the ipoib debugfs entries
>
> $ mount -t debugfs none /sys/kernel/debug
> $ cat /sys/kernel/debug/ipoib/ibxxx_mcg
>
> see Documentation/infiniband/ipoib.txt for more info
>
> Or.
>