[openib-general] ipoib, ipv6 and multicast groups

Hal Rosenstock halr at voltaire.com
Mon Jan 29 10:32:48 PST 2007


On Mon, 2007-01-29 at 13:17, chas williams - CONTRACTOR wrote:
> recently our sm started throwing the following errors:
> 
> Jan 29 18:10:49 706710 [42003940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:49 706721 [42003940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 345113 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:51 345132 [42804940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 514312 [41802940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:51 514320 [41802940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 735732 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken

IMO, 32 MLIDs is too small an MLID space to support.

> we tracked this down to a problem with ipoib interaction
> with ipv6.  ipv6 joins two multicast groups, instead of 
> just one like ipv4.
> 
> 	# netstat -A inet6 -g  -n
> 	...
> 	IPv6/IPv4 Group Memberships
> 	Interface       RefCnt Group
> 	--------------- ------ ---------------------
> 	lo              1      ff02::1
> 	ib0             1      ff02::1:ff00:77a2
> 	ib0             1      ff02::1
> 
> 
> 	# netstat -A inet -g  -n
> 	...
> 	IPv6/IPv4 Group Memberships
> 	Interface       RefCnt Group
> 	--------------- ------ ---------------------
> 	lo              1      224.0.0.1
> 	ib0             1      224.0.0.1
> 
> 
> 	# cat /sys/kernel/debug/ipoib/ib0_mcg
> 	GID: ff12:401b:ffff:0:0:0:0:1
> 	  created: 4298482097
> 	  queuelen:         0
> 	  complete:       yes
> 	  send_only:       no
> 
> 	GID: ff12:401b:ffff:0:0:0:ffff:ffff
> 	  created: 4298482097
> 	  queuelen:         0
> 	  complete:       yes
> 	  send_only:       no
> 
> 	GID: ff12:601b:ffff:0:0:0:0:1
> 	  created: 4298482097
> 	  queuelen:         0
> 	  complete:       yes
> 	  send_only:       no
> 
> 	GID: ff12:601b:ffff:0:0:1:ff00:77a2
> 	  created: 4298482097
> 	  queuelen:         0
> 	  complete:       yes
> 	  send_only:       no
> 
> 
> the ff02::1:ff00:77a2 group is the solicited-node group for the
> interface's address (link-local scope, built from the low 24 bits of
> the address), so each of our ib hosts running ipv6 registers its own
> unique multicast group.  since our network is bigger than 32 hosts, it
> appears that we have exceeded the capacity of the multicast tables in
> our local switches, and this is making opensm generate the above errors.
> 
> besides not running ipv6, are there any thoughts about this?
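
For reference, the group-to-MGID mapping visible in the dumps above can be
reproduced with a short Python sketch. It is illustrative only. It assumes
the RFC 4391 MGID layout (0xff, flags 1, scope, a 0x401b or 0x601b
signature, the P_Key, then the low 28 bits of an IPv4 group or the low 80
bits of an IPv6 group), with scope 2 and P_Key 0xffff as seen in ib0_mcg.
The function names, the example unicast address, and the groups_needed()
estimate are hypothetical and are not taken from ipoib or opensm. The
estimate assumes three shared groups (broadcast, 224.0.0.1, ff02::1) plus
one solicited-node group per host, that no two hosts share the low 24 bits
of their address, and that nothing else joins a group.

```python
import ipaddress

IPOIB_SIG_IPV4 = 0x401B  # signature seen in the IPv4 MGIDs above
IPOIB_SIG_IPV6 = 0x601B  # signature seen in the IPv6 MGIDs above


def solicited_node(unicast: str) -> ipaddress.IPv6Address:
    """Solicited-node group: ff02::1:ff00:0 plus the low 24 bits of the address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def ipoib_mgid(group: str, pkey: int = 0xFFFF, scope: int = 0x2) -> str:
    """Format the MGID for an IP multicast group, in the style of ib0_mcg.

    pkey and scope default to the values in the dump (assumption).
    """
    addr = ipaddress.ip_address(group)
    if addr.version == 6:
        signature = IPOIB_SIG_IPV6
        group_id = int(addr) & ((1 << 80) - 1)  # low 80 bits of the IPv6 group
    else:
        signature = IPOIB_SIG_IPV4
        group_id = int(addr) & ((1 << 28) - 1)  # low 28 bits of the IPv4 group
    prefix = 0xFF10 | scope                     # 0xff, flags = 1, then scope
    value = (prefix << 112) | (signature << 96) | (pkey << 80) | group_id
    words = [(value >> shift) & 0xFFFF for shift in range(112, -1, -16)]
    return ":".join(format(w, "x") for w in words)


def groups_needed(hosts: int, shared: int = 3) -> int:
    """Rough group count: shared groups plus one solicited-node group per host."""
    return shared + hosts


if __name__ == "__main__":
    # Hypothetical link-local address whose low 24 bits are 00:77a2.
    sn = solicited_node("fe80::202:c902:0:77a2")
    print(sn)                       # ff02::1:ff00:77a2
    print(ipoib_mgid(str(sn)))      # ff12:601b:ffff:0:0:1:ff00:77a2
    print(ipoib_mgid("ff02::1"))    # ff12:601b:ffff:0:0:0:0:1
    print(ipoib_mgid("224.0.0.1"))  # ff12:401b:ffff:0:0:0:0:1
    print(groups_needed(30))        # 33
```

Under those assumptions a limit of 32 MLIDs is used up by 29 IPv6 hosts,
and the 30th host's solicited-node join would need a 33rd group.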

This has been discussed on the list before. The last time was the thread
"IPv6 and IPoIB scalability issue", which ran from late November (11/30)
to early December (12/2). Some options were presented there; to the best
of my knowledge, none have been pursued.

-- Hal





