[openib-general] ipoib, ipv6 and multicast groups
chas williams - CONTRACTOR
chas at cmf.nrl.navy.mil
Mon Jan 29 10:17:22 PST 2007
recently our subnet manager (opensm) started logging the following errors:
Jan 29 18:10:49 706710 [42003940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:49 706721 [42003940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 345113 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:51 345132 [42804940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 514312 [41802940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:51 514320 [41802940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 735732 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
we tracked this down to a problem with the interaction between ipoib
and ipv6: ipv6 joins two multicast groups per interface, instead of
just one like ipv4.
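a minimal python sketch (not part of the original report) of where the second ipv6 group comes from. besides the all-nodes group ff02::1, an interface joins the solicited-node group, which is the prefix ff02::1:ff00:0/104 plus the low 24 bits of each of its unicast addresses (rfc 4291). the unicast address used below is hypothetical, picked only so that its low 24 bits match the group seen on ib0. addresses that share the same low 24 bits (e.g. a link-local and a global address with the same interface id) map to the same group.

```python
import ipaddress

# link-local all-nodes group, joined by every ipv6 interface
ALL_NODES = ipaddress.IPv6Address("ff02::1")
# solicited-node prefix ff02::1:ff00:0/104 (rfc 4291)
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_group(unicast: str) -> ipaddress.IPv6Address:
    """Solicited-node group: the /104 prefix plus the low 24 bits of the address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def ipv6_groups(unicast: str) -> list:
    """Minimum groups joined by an interface with this single unicast address."""
    return [ALL_NODES, solicited_node_group(unicast)]


# hypothetical link-local address whose low 24 bits are 0x0077a2
for group in ipv6_groups("fe80::202:c902:0:77a2"):
    print(group)
```

the all-nodes group is identical on every host, but the solicited-node group changes with the address, which is what multiplies the number of groups on the fabric.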
# netstat -A inet6 -g -n
...
IPv6/IPv4 Group Memberships
Interface RefCnt Group
--------------- ------ ---------------------
lo 1 ff02::1
ib0 1 ff02::1:ff00:77a2
ib0 1 ff02::1
# netstat -A inet -g -n
...
IPv6/IPv4 Group Memberships
Interface RefCnt Group
--------------- ------ ---------------------
lo 1 224.0.0.1
ib0 1 224.0.0.1
# cat /sys/kernel/debug/ipoib/ib0_mcg
GID: ff12:401b:ffff:0:0:0:0:1
created: 4298482097
queuelen: 0
complete: yes
send_only: no
GID: ff12:401b:ffff:0:0:0:ffff:ffff
created: 4298482097
queuelen: 0
complete: yes
send_only: no
GID: ff12:601b:ffff:0:0:0:0:1
created: 4298482097
queuelen: 0
complete: yes
send_only: no
GID: ff12:601b:ffff:0:0:1:ff00:77a2
created: 4298482097
queuelen: 0
complete: yes
send_only: no
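the four gids above follow the ipoib multicast mapping from rfc 4391: ff1<scope>, then the signature 401b (ipv4) or 601b (ipv6), then the p_key, then the low 28 bits of the ipv4 group or the low 80 bits of the ipv6 group. the python sketch below assumes p_key 0xffff and link-local scope 2, as in this listing; the function name is hypothetical. the ff12:401b:ffff:0:0:0:ffff:ffff entry is the ipoib broadcast group and is not derived from an ip group this way.

```python
import ipaddress


def ipoib_mgid(group: str, pkey: int = 0xFFFF, scope: int = 0x2) -> ipaddress.IPv6Address:
    """Map an IP multicast group to its IPoIB multicast GID (rfc 4391 layout).

    Layout: ff1<scope> | signature | P_Key | low bits of the IP group.
    """
    addr = ipaddress.ip_address(group)
    if addr.version == 4:
        signature = 0x401B
        low = int(addr) & 0x0FFFFFFF  # lower 28 bits of the ipv4 group
    else:
        signature = 0x601B
        low = int(addr) & ((1 << 80) - 1)  # lower 80 bits of the ipv6 group
    prefix = (0xFF10 | scope) << 112 | signature << 96 | pkey << 80
    return ipaddress.IPv6Address(prefix | low)


for g in ("224.0.0.1", "ff02::1", "ff02::1:ff00:77a2"):
    print(g, "->", ipoib_mgid(g).exploded)
```

since the low 80 bits of ff02::1:ff00:77a2 carry the per-host part of the address, every host's solicited-node group becomes a distinct infiniband multicast group.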
the ff02::1:ff00:77a2 group is the solicited-node multicast group. although
it is link-local in scope, its address is built from the low 24 bits of the
interface's own ipv6 address, so each of our ib hosts running ipv6 joins its
own unique multicast group. since our network has more than 32 hosts, it
appears that we have exhausted the multicast tables in our local switches,
and this is what makes opensm generate the errors above.
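a back-of-the-envelope python sketch of the mlid budget, under these assumptions: every multicast group needs its own mlid, the three groups shared by all hosts in the listing above (224.0.0.1, the ipoib broadcast group and ff02::1) are the only shared ones, and each host adds exactly one solicited-node group with distinct low 24 bits. the function names are hypothetical and not taken from opensm.

```python
def mlids_needed(hosts: int, shared_groups: int = 3, per_host_groups: int = 1) -> int:
    """Groups (one mlid each) the SM must allocate for this many ipv6 hosts."""
    return shared_groups + hosts * per_host_groups


def max_hosts(mlid_limit: int = 32, shared_groups: int = 3, per_host_groups: int = 1) -> int:
    """Largest host count whose groups still fit within mlid_limit mlids."""
    return (mlid_limit - shared_groups) // per_host_groups


limit = 32
for hosts in (16, 29, 30, 64):
    need = mlids_needed(hosts)
    status = "fits" if need <= limit else "exceeds"
    print(f"{hosts} hosts -> {need} groups, {status} {limit} mlids")
```

under those assumptions the 32 available mlids run out once more than 29 hosts run ipv6; any additional multicast groups on the fabric lower that number further.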
other than not running ipv6, does anyone have any ideas about how to deal with this?