[ofa-general] Re: multicast join failed for...

Hal Rosenstock halr at voltaire.com
Fri Apr 13 05:41:41 PDT 2007


On Wed, 2007-04-11 at 05:49, Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: multicast join failed for...
> > 
> > On Mon, 2007-04-09 at 18:47, Egor Tur wrote:
> > > Hi folk.
> > > 
> > > > > ib1: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22 
> > > > > ib0: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22 
> > > > >  
> > > > > And in osm.log: 
> > > > > Apr 09 21:33:50 658439 [42003960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, 
> > > > > __validate_port_caps, or JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB), 
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID 
> > > > >  
> > > > OpenSM ERR 1B12 means that the rate or MTU of the port was incompatible
> > > > with the MC group. You could turn on -V with OpenSM and see more log
> > > > messages as to what is going wrong from the SM's perspective.
> > > 
> > > Ok. This from osm.log with -V :
> > >  
> > > Apr 10 00:56:06 390007 [44007960] -> __osm_sa_mad_ctrl_process: [                                                              
> > > Apr 10 00:56:06 390016 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD         
> > > Apr 10 00:56:06 390027 [44007960] -> __osm_sa_mad_ctrl_process: ]                                                              
> > > Apr 10 00:56:06 390033 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]                                                         
> > > Apr 10 00:56:06 390046 [41001960] -> osm_mcmr_rcv_process: [                                                                   
> > > Apr 10 00:56:06 390054 [41001960] -> __osm_mcmr_rcv_join_mgrp: [                                                               
> > > Apr 10 00:56:06 390060 [41001960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record                                         
> > > Apr 10 00:56:06 390065 [41001960] -> MCMember Record dump:                                                                     
> > >                                 MGID....................0xff12601bffff0000 : 0x0000000000000001                                
> > >                                 PortGid.................0xfe80000000000000 : 0x001708ffffd1509a                                
> > >                                 qkey....................0xB1B                                                                  
> > >                                 mlid....................0x0                                                                    
> > >                                 mtu.....................0x84                                                                   
> > >                                 TClass..................0x0                                                                    
> > >                                 pkey....................0xFFFF                                                                 
> > >                                 rate....................0x83                                                                   
> > >                                 pkt_life................0x0                                                                    
> > >                                 SLFlowLabelHopLimit.....0x0                                                                    
> > >                                 ScopeState..............0x1                                                                    
> > >                                 ProxyJoin...............0x0                                                                    
> > > Apr 10 00:56:06 390084 [41001960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3
> > 
> > Rate 6 is 20 Gb/sec whereas 3 is 10 Gb/sec. So the port is 4x DDR (rate
> > 6) and the group is 4x SDR. The request asks for a rate exactly equal
> > to the group's rate, so it fails.
> 
> 
> BTW, the only reason I know for IPoIB to request a specific rate
> is if the broadcast multicast group has that rate. Roland, is that right?
> 
> So, how come the broadcast multicast group has rate DDR, but a specific
> group has lower rate?

Why does this IPoIB client think that the broadcast group is 4x DDR (20
Gbps) when the SM thinks it is 4x SDR (10 Gbps)? How could that happen?
Is this a porting issue somehow?
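
For reference, the rate and mtu bytes in the MCMember Record dump above
each pack a 2-bit selector (top bits) and a 6-bit value (low bits). Below
is a minimal sketch of pulling them apart, assuming the usual InfiniBand
encoding of those fields. The function and table names are made up for
illustration and are not OpenSM code. Under that assumption, the dumped
rate 0x83 and mtu 0x84 decode as shown by the two calls at the end.

```python
# Sketch only: helper names and lookup tables are illustrative assumptions,
# based on the InfiniBand MCMemberRecord encoding (2-bit selector, 6-bit value).

# Selector meaning (top two bits of the rate/mtu byte).
SELECTORS = {
    0: "greater than",
    1: "less than",
    2: "exactly",
    3: "largest available",
}

# Encoded rate value -> Gb/sec (3 is 4x SDR, 6 is 4x DDR).
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}

# Encoded MTU value -> bytes.
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def split_field(byte):
    """Return (selector, value) from a packed rate or mtu byte."""
    return (byte >> 6) & 0x3, byte & 0x3F


def describe_rate(byte):
    """Return (selector name, rate in Gb/sec or None if unknown)."""
    selector, value = split_field(byte)
    return SELECTORS[selector], RATE_GBPS.get(value)


def describe_mtu(byte):
    """Return (selector name, MTU in bytes or None if unknown)."""
    selector, value = split_field(byte)
    return SELECTORS[selector], MTU_BYTES.get(value)


if __name__ == "__main__":
    print("rate 0x83:", describe_rate(0x83))
    print("mtu  0x84:", describe_mtu(0x84))
```

Running the same helpers on the group's record as seen by the SM would
show which side is carrying the unexpected rate.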

-- Hal



