[openib-general] ipoib, ipv6 and multicast groups
Hal Rosenstock
halr at voltaire.com
Mon Jan 29 10:32:48 PST 2007
On Mon, 2007-01-29 at 13:17, chas williams - CONTRACTOR wrote:
> recently our sm started throwing the following errors:
>
> Jan 29 18:10:49 706710 [42003940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:49 706721 [42003940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 345113 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:51 345132 [42804940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 514312 [41802940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:51 514320 [41802940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 735732 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
32 is too low a limit for the supported MLID space IMO.
> we tracked this down to a problem with ipoib interaction
> with ipv6. ipv6 joins two multicast groups, instead of
> just one like ipv4.
>
> # netstat -A inet6 -g -n
> ...
> IPv6/IPv4 Group Memberships
> Interface RefCnt Group
> --------------- ------ ---------------------
> lo 1 ff02::1
> ib0 1 ff02::1:ff00:77a2
> ib0 1 ff02::1
>
>
> # netstat -A inet -g -n
> ...
> IPv6/IPv4 Group Memberships
> Interface RefCnt Group
> --------------- ------ ---------------------
> lo 1 224.0.0.1
> ib0 1 224.0.0.1
>
>
> # cat /sys/kernel/debug/ipoib/ib0_mcg
> GID: ff12:401b:ffff:0:0:0:0:1
> created: 4298482097
> queuelen: 0
> complete: yes
> send_only: no
>
> GID: ff12:401b:ffff:0:0:0:ffff:ffff
> created: 4298482097
> queuelen: 0
> complete: yes
> send_only: no
>
> GID: ff12:601b:ffff:0:0:0:0:1
> created: 4298482097
> queuelen: 0
> complete: yes
> send_only: no
>
> GID: ff12:601b:ffff:0:0:1:ff00:77a2
> created: 4298482097
> queuelen: 0
> complete: yes
> send_only: no
>
>
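The GIDs in the dump above line up with the IPoIB multicast mapping described in RFC 4391: an ff1<scope> prefix, a signature (401b for IPv4, 601b for IPv6), the 16-bit P_Key, and the low 80 bits of the IP multicast address. The per-host group is the IPv6 solicited-node address, formed from ff02::1:ff00:0/104 plus the low 24 bits of the interface's unicast address. A minimal Python sketch of both steps follows. It assumes the default P_Key 0xffff; the function names are illustrative only, and the link-local address is hypothetical, picked so that its low 24 bits are 00:77a2.

```python
import ipaddress

def solicited_node(unicast: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast address for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def ipoib_mgid_v6(group: str, pkey: int = 0xFFFF) -> str:
    """Map an IPv6 multicast address to an IPoIB MGID (RFC 4391 layout).

    Assumption: default P_Key 0xffff unless the caller passes another one.
    """
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    scope = (int(addr) >> 112) & 0xF           # scope nibble of the IPv6 group
    group_id = int(addr) & ((1 << 80) - 1)     # low 80 bits carried over as-is
    mgid = ((0xFF10 | scope) << 112) | (0x601B << 96) | (pkey << 80) | group_id
    # Format like the ib0_mcg debugfs dump: eight hex groups, no zero padding.
    return ":".join(format((mgid >> shift) & 0xFFFF, "x")
                    for shift in range(112, -1, -16))

# Hypothetical link-local address; only its low 24 bits matter here.
sn = solicited_node("fe80::2:c902:0:77a2")
print(sn)                          # ff02::1:ff00:77a2
print(ipoib_mgid_v6(str(sn)))      # ff12:601b:ffff:0:0:1:ff00:77a2
print(ipoib_mgid_v6("ff02::1"))    # ff12:601b:ffff:0:0:0:0:1
```

Because the low 24 bits normally differ from host to host, each host ends up joining its own MGID, while ff02::1 maps to one MGID shared by all hosts.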
> the ff02::1:ff00:77a2 group is specific to the interface (link local),
> so each of our ib hosts running ipv6 registers its own unique multicast
> group. since our network is bigger than 32 hosts, it appears that we
> have exceeded the multicast tables in our local switches and this is
> making opensm generate the above error.
>
> besides not running ipv6, are there any thoughts about this?
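The arithmetic behind the report can be sketched in a few lines of Python. The assumptions are mine, not confirmed by the thread: the SM allocates one MLID per multicast group, every host joins the same three shared groups seen in the dump (IPv4 broadcast, 224.0.0.1, ff02::1), and no two hosts share the low 24 bits of their IPv6 address. The function names are illustrative.

```python
def groups_needed(hosts: int, shared: int = 3) -> int:
    """Multicast groups in use: the shared ones plus one solicited-node group per host."""
    return shared + hosts

def max_hosts(mlids: int, shared: int = 3) -> int:
    """Largest host count that still fits in the available MLIDs."""
    return max(mlids - shared, 0)

print(groups_needed(33))   # 36, already over a 32-MLID limit
print(max_hosts(32))       # 29
```

Under these assumptions a 32-MLID limit is exhausted at 29 IPv6 hosts, which is consistent with a fabric of more than 32 hosts hitting ERR 1B23.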
This has been discussed on the list before. The last time was the thread
"IPv6 and IPoIB scalability issue", from late November (11/30) to
early December (12/2). Some options were presented there. None have been
pursued to the best of my knowledge.
-- Hal