Re: [ofa-general] Intermittent: ib0: multicast join failed

Hal Rosenstock hal.rosenstock at gmail.com
Fri Sep 19 11:33:08 PDT 2008


On Fri, Sep 19, 2008 at 2:28 PM, Roger Spellman <roger at terascala.com> wrote:
> Sasha,
> I am running OFED 1.3.1.
>
> My Subnet Manager is opensmd.  /var/log/opensm.log shows the following:
>
> Sep 19 14:21:19 480217 [43806960] 0x02 -> SUBNET UP
> Sep 19 14:21:19 818276 [41001960] 0x01 ->
> __osm_trap_rcv_process_request: Received Generic Notice type:0x04
> num:144 Producer:1 (Channel Adapter) from LID:0x0011
> TID:0x0000000000000000
> Sep 19 14:21:19 818330 [41001960] 0x02 -> osm_report_notice: Reporting
> Generic Notice type:4 num:144 from LID:0x0011
> GID:0xfe80000000000000,0x0002c9020027d451
> Sep 19 14:21:19 823408 [43806960] 0x02 -> osm_ucast_mgr_process: minhop
> tables configured on all switches
> Sep 19 14:21:19 827220 [43806960] 0x02 -> SUBNET UP
> Sep 19 14:21:27 283873 [41802960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR
> 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState =
> 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending
> IB_SA_MAD_STATUS_REQ_INVALID
> Sep 19 14:21:43 298367 [42804960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR
> 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState =
> 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending
> IB_SA_MAD_STATUS_REQ_INVALID
> Sep 19 14:21:59 312765 [42003960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR
> 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState =
> 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending
> IB_SA_MAD_STATUS_REQ_INVALID

It's likely a rate issue where the negotiated port rate is not the
broadcast group rate.
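
To make the rate argument concrete, here is a rough sketch in Python. It is
not taken from OpenSM; it assumes the check boils down to "port rate must be
at least the group rate", and the function names and the 10 Gb/s group rate
are hypothetical values chosen for illustration. The per-lane figures are the
standard SDR/DDR/QDR signalling rates.

```python
# Per-lane signalling rates in Gb/s for SDR, DDR and QDR InfiniBand links.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def port_rate_gbps(width: int, speed: str) -> float:
    """Rate of a link: number of lanes (1, 4, 8, 12) times the per-lane rate."""
    return width * LANE_GBPS[speed]


def can_join(width: int, speed: str, group_rate_gbps: float) -> bool:
    """Assumed rule: the port must be at least as fast as the multicast group."""
    return port_rate_gbps(width, speed) >= group_rate_gbps


if __name__ == "__main__":
    group_rate = 10.0  # hypothetical broadcast group rate, in Gb/s
    # A 4X SDR port that trained correctly versus one that came up as 1X.
    for width, speed in [(4, "SDR"), (1, "SDR")]:
        verdict = "ok" if can_join(width, speed, group_rate) else "rejected"
        print(f"{width}X {speed}: {port_rate_gbps(width, speed)} Gb/s -> join {verdict}")
```

Under these assumptions, a port that trains at a reduced width after one boot
would be rejected, and the same port would be accepted after a reboot if it
trains at full width.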

What does ibstat or ibstatus show when the join fails? Also, what
does saquery -g show?

>
> Rebooting the node that failed to join the group always seems to solve
> the problem.

Yes, that's consistent with the negotiated rate being a problem.

-- Hal

> Thanks for your help.
>
> -Roger
>
>> -----Original Message-----
>> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
>> Sent: Friday, September 19, 2008 1:06 PM
>> To: Roger Spellman
>> Cc: general at lists.openfabrics.org
>> Subject: Re: [ofa-general] Intermittent: ib0: multicast join failed
>>
>> On 16:45 Thu 18 Sep     , Roger Spellman wrote:
>> > I have many nodes, each with a Mellanox MT25204.  When I reboot some
>> > nodes, they occasionally get the following error:
>> >
>> > ib0: multicast join failed
>>
>> What is the software stack? Which version?
>>
>> > Rebooting the system almost always solves this problem.
>> >
>> > What causes this?
>>
>> Which SM are you using? If it is OpenSM, you can see in the log
>> (/var/log/opensm.log) why the join failed.
>>
>> > Is there a way to solve this without rebooting?
>>
>> Hard to say - the reason for the failure is unknown. It could be the
>> port's low speed/width or something else; I can't tell without more details.
>>
>> Sasha