[ofa-general] Intermittent: ib0: multicast join failed

Roger Spellman roger at terascala.com
Fri Sep 19 11:28:00 PDT 2008


Sasha,
I am running OFED 1.3.1.

My subnet manager is OpenSM (opensmd).  /var/log/opensm.log shows the following:

Sep 19 14:21:19 480217 [43806960] 0x02 -> SUBNET UP
Sep 19 14:21:19 818276 [41001960] 0x01 ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x04
num:144 Producer:1 (Channel Adapter) from LID:0x0011
TID:0x0000000000000000
Sep 19 14:21:19 818330 [41001960] 0x02 -> osm_report_notice: Reporting
Generic Notice type:4 num:144 from LID:0x0011
GID:0xfe80000000000000,0x0002c9020027d451
Sep 19 14:21:19 823408 [43806960] 0x02 -> osm_ucast_mgr_process: minhop
tables configured on all switches
Sep 19 14:21:19 827220 [43806960] 0x02 -> SUBNET UP
Sep 19 14:21:27 283873 [41802960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR
1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState =
0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Sep 19 14:21:43 298367 [42804960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR
1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState =
0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Sep 19 14:21:59 312765 [42003960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR
1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState =
0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
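
To see which ports keep getting rejected, the ERR 1B12 entries above can be counted per port GUID. The following is only a rough sketch, not an OpenSM tool: the function names (iter_entries, failed_joins) are made up for illustration, and the regular expressions assume the log layout shown in the excerpt above, including entries that wrap across several lines as they do in this mail.

```python
import re
from collections import Counter

# Assumption: every log entry starts with "Mon DD HH:MM:SS usec [thread] 0xNN ->",
# as in the excerpt above. Anything else is treated as a continuation line.
ENTRY_START = re.compile(
    r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d+ \[[0-9A-Fa-f]+\] 0x[0-9A-Fa-f]+ ->"
)
# Assumption: a rejected join is reported as "ERR 1B12: ... from port 0x<16 hex digits>".
JOIN_ERR = re.compile(r"ERR 1B12:.*?from port (0x[0-9A-Fa-f]{16})")


def iter_entries(lines):
    """Yield one string per log entry, re-joining wrapped continuation lines."""
    entry = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) and entry:
            yield " ".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield " ".join(entry)


def failed_joins(lines):
    """Return a Counter mapping port GUID -> number of ERR 1B12 join rejections."""
    counts = Counter()
    for entry in iter_entries(lines):
        match = JOIN_ERR.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be run against the real log with something like failed_joins(open("/var/log/opensm.log")); the result lists each rejected port GUID with the number of rejections, which makes it easy to match the failures against the nodes that were rebooted.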

Rebooting the node that failed to join the group always seems to solve
the problem.

Thanks for your help.

-Roger

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> Sent: Friday, September 19, 2008 1:06 PM
> To: Roger Spellman
> Cc: general at lists.openfabrics.org
> Subject: Re: [ofa-general] Intermittent: ib0: multicast join failed
> 
> On 16:45 Thu 18 Sep     , Roger Spellman wrote:
> > I have many nodes, each with a Mellanox MT25204.  When I reboot some
> > nodes, they occasionally get the following error:
> >
> > ib0: multicast join failed
> 
> What is the software stack? Which version?
> 
> > Rebooting the system almost always solves this problem.
> >
> > What causes this?
> 
> Which SM are you using? If it is OpenSM, you can see in the log
> (/var/log/opensm.log) why the join failed.
> 
> > Is there a way to solve this without rebooting?
> 
> Hard to say - the reason for the failure is unknown. It could be the port's
> low speed/width or something else; I can't tell without more details.
> 
> Sasha


