Re: [ofa-general] opensm: bad multicast forwarding table entries

Hal Rosenstock hal.rosenstock at gmail.com
Wed Nov 12 14:46:57 PST 2008


On Wed, Nov 12, 2008 at 5:18 PM,  <akepner at sgi.com> wrote:
>
> Here's a description of a problem we're seeing where multicast
> forwarding tables are apparently getting set up incorrectly. I'd
> appreciate any debug help from the opensm experts out there.
>
> On large clusters (>1000 nodes or so) we often see hundreds of errors
> from 'ibdiagnet -r' like the following (this is the simplest example
> I could find):
>
> -I- Multicast Group:0xC069 has:2 switches and:2 HCAs
> -E- Disconnected switch:S0800690000002e51/U1 in group:0xC069
> -E- Disconnected HCA:r4i2n10/U1
>
> These have invariably been multicast groups associated with IPv6
> solicited node multicast addresses, e.g., in this case 'saquery -m'
> shows only a single member, "r5lead":
>
> MCMemberRecord member dump:
>                MGID....................0xff12601bffff0000 : 0x00000001ff26d289
>                Mlid....................0xC069
>                PortGid.................0xfe80000000000000 : 0x0002c9020026d289
>                ScopeState..............0x1
>                ProxyJoin...............0x0
>                NodeDescription.........r5lead HCA-1
>
> ibdiagnet shows that "r5lead" is connected to the switch with lid
> 1609, port 24:
>
> Switch  24 "S-0800690000002db4"         # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1609 lmc 0
> [24]    "H-0002c9020026d288"[1](2c9020026d289)          # "r5lead HCA-1" lid 1576 4xDDR
>
> and the multicast forwarding table (from 'dump_mfts.sh') is consistent:
>
> Multicast mlids [0xc000-0xc3ff] of switch Lid 1609 guid 0x0800690000002db4 (MT47396 Infiniscale-III Mellanox Technologies):
>            0                   1                   2
>     Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
>  MLid
> ....
> 0xc069                                                      x
>
>
> So far, so good. But we also have r4i2n10, connected to the switch with
> lid 1533 port 7:
>
> switchguid=0x800690000002e50(800690000002e50)
> Switch  24 "S-0800690000002e50"         # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0
> ......
> [7]     "H-003048c2438a0000"[1](3048c2438a0001)                 # "r4i2n10 HCA-1" lid 771 4xDDR
>
> with this mft entry:
>
> Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies):
>            0                   1                   2
>     Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
>  MLid
> .....
> 0xc069                    x
>
> Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a
> mft entry for the multicast group with MGID ff12601bffff::1ff26d289?
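
[Editorial sketch, not part of the original exchange.] The mismatch described above can be checked mechanically. The following Python sketch assumes the IPoIB MGID layout from RFC 4391 (0xff, flags/scope, signature 0x601b for IPv6, P_Key, 80-bit group ID) and the solicited-node format ff02::1:ffXX:XXXX from RFC 4291. It also assumes that the node's IPv6 address shares its low 24 bits with the port GUID, which holds for a link-local address derived from the GUID but not necessarily for other addresses configured on the interface. All function names are hypothetical helpers, not part of opensm or any InfiniBand tool.

```python
import ipaddress

def parse_gid(prefix_hex: str, suffix_hex: str) -> int:
    """Combine the two 64-bit halves printed by saquery into one 128-bit int."""
    return (int(prefix_hex, 16) << 64) | int(suffix_hex, 16)

def ipoib_mgid_fields(mgid: int) -> dict:
    """Split an IPoIB MGID into its fields (layout assumed from RFC 4391)."""
    return {
        "scope": (mgid >> 112) & 0xF,          # low nibble of the second byte
        "signature": (mgid >> 96) & 0xFFFF,    # 0x601b marks IPv6
        "pkey": (mgid >> 80) & 0xFFFF,
        "group_id": mgid & ((1 << 80) - 1),    # low 80 bits of the IPv6 group
    }

def ipv6_group(mgid: int) -> ipaddress.IPv6Address:
    """Rebuild the IPv6 multicast address (flags nibble taken as 0)."""
    f = ipoib_mgid_fields(mgid)
    return ipaddress.IPv6Address(((0xFF00 | f["scope"]) << 112) | f["group_id"])

def solicited_node_low24(mgid: int):
    """Return the 24 address bits encoded in a solicited-node MGID, else None."""
    f = ipoib_mgid_fields(mgid)
    if f["signature"] != 0x601B:
        return None
    if f["group_id"] >> 24 != 0x01FF:          # group must look like ...:1:ffXX:XXXX
        return None
    return f["group_id"] & 0xFFFFFF

def port_matches_group(port_gid: int, mgid: int) -> bool:
    """True if the port GID's low 24 bits match the solicited-node group."""
    low24 = solicited_node_low24(mgid)
    return low24 is not None and (port_gid & 0xFFFFFF) == low24

# Values copied from the saquery and ibdiagnet output quoted above.
mgid = parse_gid("0xff12601bffff0000", "0x00000001ff26d289")
r5lead = parse_gid("0xfe80000000000000", "0x0002c9020026d289")
r4i2n10 = parse_gid("0xfe80000000000000", "0x003048c2438a0001")

print(ipv6_group(mgid))
print("r5lead matches:", port_matches_group(r5lead, mgid))
print("r4i2n10 matches:", port_matches_group(r4i2n10, mgid))
```

Under these assumptions r5lead's port GID matches the group and r4i2n10's does not, which is consistent with saquery listing r5lead as the only member.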

Are you using the option in OpenSM that consolidates IPv6 SNM
(solicited node multicast) requests?

-- Hal

> Anyone else seen similar?
>
> --
> Arthur


