[ofa-general] Intermittent: ib0: multicast join failed

Hal Rosenstock hal.rosenstock at gmail.com
Mon Sep 22 11:50:03 PDT 2008


On Mon, Sep 22, 2008 at 2:43 PM, Roger Spellman <roger at terascala.com> wrote:
> Thanks, Hal.
>
> Below is the output of ibstat and ibstatus.  It shows that the rate is
> 2.5 Gb/sec, rather than 10 Gb/sec.
>
> Is there a way to get it to renegotiate the rate, short of rebooting?

Try ibportstate reset on the switch peer port. You could also replug
the cable on that link.

> [root at ts-raid6-03 lib64]# ibstat
> CA 'mthca0'
>        CA type: MT25204
>        Number of ports: 1
>        Firmware version: 1.2.936
>        Hardware version: a0
>        Node GUID: 0x0002c9020026e4c0
>        System image GUID: 0x0002c9020026e4c3
>        Port 1:
>                State: Active
>                Physical state: LinkUp
>                Rate: 2
>                Base lid: 19
>                LMC: 0
>                SM lid: 1
>                Capability mask: 0x02510a68
>                Port GUID: 0x0002c9020026e4c1
> [root at ts-raid6-03 lib64]# ibstatus
> Infiniband device 'mthca0' port 1 status:
>        default gid:     fe80:0000:0000:0000:0002:c902:0026:e4c1
>        base lid:        0x13
>        sm lid:          0x1
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            2.5 Gb/sec (1X)
>
>> It's likely a rate issue where the negotiated port rate is not the
>> broadcast group rate.

Yes, it's a rate problem: the link is coming up at 1X SDR, which is 2.5
Gbps, whereas I suspect the group is 10 Gbps, so the port can't join.
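
As a rough illustration of that check, here is a minimal Python sketch
that pulls the negotiated rate out of ibstatus output and compares it
with the rate the group is assumed to require. The helper names
(parse_port_rate, can_join) are hypothetical, and the 10 Gbps group rate
is only my suspicion from above; the real group rate should be read from
saquery -g.

```python
import re

# Matches an ibstatus line such as "rate:   2.5 Gb/sec (1X)".
RATE_RE = re.compile(r"rate:\s*([\d.]+)\s*Gb/sec\s*\((\d+)X", re.IGNORECASE)


def parse_port_rate(ibstatus_text):
    """Return (gbps, link_width) from the first 'rate:' line, or None."""
    match = RATE_RE.search(ibstatus_text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def can_join(port_gbps, group_gbps):
    """A port slower than the group rate is expected to fail the join."""
    return port_gbps >= group_gbps


# Sample taken from the ibstatus output quoted in this thread.
SAMPLE = """Infiniband device 'mthca0' port 1 status:
       state:           4: ACTIVE
       phys state:      5: LinkUp
       rate:            2.5 Gb/sec (1X)
"""

ASSUMED_GROUP_GBPS = 10.0  # assumption: confirm with saquery -g

port_gbps, width = parse_port_rate(SAMPLE)
if not can_join(port_gbps, ASSUMED_GROUP_GBPS):
    print(f"port at {port_gbps} Gb/sec ({width}X) is below the assumed "
          f"group rate of {ASSUMED_GROUP_GBPS} Gb/sec")
```

Running something like this on each node before bringing up ib0 would
flag a link that trained at 1X before the join is attempted.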

-- Hal

>> What does ibstat or ibstatus show when the join fails? Also, what
>> about saquery -g?
>
>> >
>> > Rebooting the node that failed to join the group always seems to solve
>> > the problem.
>
>> Yes, that's consistent with the negotiated rate being a problem.
>
>> -- Hal
>


