[openib-general] ipoib, ipv6 and multicast groups

chas williams - CONTRACTOR chas at cmf.nrl.navy.mil
Mon Jan 29 10:17:22 PST 2007


recently our sm (opensm) started logging the following errors:

Jan 29 18:10:49 706710 [42003940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:49 706721 [42003940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 345113 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:51 345132 [42804940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 514312 [41802940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:51 514320 [41802940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 735732 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken

we tracked this down to the way ipoib interacts with ipv6.  an ipv6
interface joins two multicast groups (all-nodes plus its own
solicited-node group), instead of just one (all-hosts) like ipv4.

	# netstat -A inet6 -g  -n
	...
	IPv6/IPv4 Group Memberships
	Interface       RefCnt Group
	--------------- ------ ---------------------
	lo              1      ff02::1
	ib0             1      ff02::1:ff00:77a2
	ib0             1      ff02::1


	# netstat -A inet -g  -n
	...
	IPv6/IPv4 Group Memberships
	Interface       RefCnt Group
	--------------- ------ ---------------------
	lo              1      224.0.0.1
	ib0             1      224.0.0.1


	# cat /sys/kernel/debug/ipoib/ib0_mcg
	GID: ff12:401b:ffff:0:0:0:0:1
	  created: 4298482097
	  queuelen:         0
	  complete:       yes
	  send_only:       no

	GID: ff12:401b:ffff:0:0:0:ffff:ffff
	  created: 4298482097
	  queuelen:         0
	  complete:       yes
	  send_only:       no

	GID: ff12:601b:ffff:0:0:0:0:1
	  created: 4298482097
	  queuelen:         0
	  complete:       yes
	  send_only:       no

	GID: ff12:601b:ffff:0:0:1:ff00:77a2
	  created: 4298482097
	  queuelen:         0
	  complete:       yes
	  send_only:       no


the ff02::1:ff00:77a2 group is specific to the interface: it is the
link-local solicited-node address, built from the low 24 bits of the
interface's ipv6 address.  so each of our ib hosts running ipv6 registers
its own unique multicast group, and each such group needs its own mlid.
since our network is bigger than 32 hosts, it appears that we have
exceeded the multicast tables in our local switches and this is making
opensm generate the above error.
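
to make the arithmetic concrete, here is a rough python sketch of how the
per-host groups pile up.  the mgid layout (ff1<scope>, signature 601b for
ipv6, then pkey ffff, then the low 80 bits of the group address) is read
off the ib0_mcg output above, not taken from the driver source.  the host
addresses and the function names are made up for illustration, and the
count assumes every host has different low 24 bits and joins nothing else.

```python
import ipaddress

IPV6_SIGNATURE = 0x601B  # as seen in the ib0_mcg output for ipv6 groups


def solicited_node(addr):
    """Solicited-node group: ff02::1:ff00:0/104 plus the low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(base | low24))


def ipv6_group_to_mgid(group, pkey=0xFFFF):
    """Map an ipv6 multicast group to the mgid format shown by ib0_mcg."""
    g = int(ipaddress.IPv6Address(group))
    scope = (g >> 112) & 0xF          # scope nibble of the ipv6 group (2 = link local)
    low80 = g & ((1 << 80) - 1)       # low 80 bits are carried over unchanged
    mgid = ((0xFF << 120) | (0x1 << 116) | (scope << 112)
            | (IPV6_SIGNATURE << 96) | (pkey << 80) | low80)
    words = [(mgid >> shift) & 0xFFFF for shift in range(112, -1, -16)]
    return ":".join(format(w, "x") for w in words)


def distinct_mgids(host_addrs):
    """All mgids the fabric needs: three shared groups plus one per host."""
    shared = {
        "ff12:401b:ffff:0:0:0:ffff:ffff",   # ipv4 broadcast group
        "ff12:401b:ffff:0:0:0:0:1",         # 224.0.0.1
        ipv6_group_to_mgid("ff02::1"),      # ipv6 all-nodes
    }
    per_host = {ipv6_group_to_mgid(solicited_node(a)) for a in host_addrs}
    return shared | per_host


# hypothetical link-local addresses for 40 hosts
hosts = ["fe80::202:c902:0:%x" % i for i in range(1, 41)]
print(len(distinct_mgids(hosts)), "mgids wanted, 32 mlids available")
```

under those assumptions, 3 shared groups plus one solicited-node group per
host means anything past 29 ipv6 hosts no longer fits in 32 mlids.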

besides not running ipv6, are there any thoughts about this?



