[ofa-general] multicast join failed for...

Hal Rosenstock halr at voltaire.com
Tue Apr 10 06:03:32 PDT 2007


Hi again Egor,

On Mon, 2007-04-09 at 19:20, Hal Rosenstock wrote:
> On Mon, 2007-04-09 at 18:47, Egor Tur wrote:
> > Hi folks.
> > 
> > > > ib1: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22 
> > > > ib0: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22 
> > > >  
> > > > And in osm.log: 
> > > > Apr 09 21:33:50 658439 [42003960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, 
> > > > __validate_port_caps, or JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB), 
> > > > sending IB_SA_MAD_STATUS_REQ_INVALID 
> > > >  
> > > OpenSM ERR 1B12 means that the rate or MTU of the port was incompatible
> > > with the MC group. You could run OpenSM with -V to get more log
> > > messages about what is going wrong from the SM's perspective.
> > 
> > OK. This is from osm.log with -V:
> >  
> > Apr 10 00:56:06 390007 [44007960] -> __osm_sa_mad_ctrl_process: [                                                              
> > Apr 10 00:56:06 390016 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD         
> > Apr 10 00:56:06 390027 [44007960] -> __osm_sa_mad_ctrl_process: ]                                                              
> > Apr 10 00:56:06 390033 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]                                                         
> > Apr 10 00:56:06 390046 [41001960] -> osm_mcmr_rcv_process: [                                                                   
> > Apr 10 00:56:06 390054 [41001960] -> __osm_mcmr_rcv_join_mgrp: [                                                               
> > Apr 10 00:56:06 390060 [41001960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record                                         
> > Apr 10 00:56:06 390065 [41001960] -> MCMember Record dump:                                                                     
> >                                 MGID....................0xff12601bffff0000 : 0x0000000000000001                                
> >                                 PortGid.................0xfe80000000000000 : 0x001708ffffd1509a                                
> >                                 qkey....................0xB1B                                                                  
> >                                 mlid....................0x0                                                                    
> >                                 mtu.....................0x84                                                                   
> >                                 TClass..................0x0                                                                    
> >                                 pkey....................0xFFFF                                                                 
> >                                 rate....................0x83                                                                   
> >                                 pkt_life................0x0                                                                    
> >                                 SLFlowLabelHopLimit.....0x0                                                                    
> >                                 ScopeState..............0x1                                                                    
> >                                 ProxyJoin...............0x0                                                                    
> > Apr 10 00:56:06 390084 [41001960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3
> 
> Rate 6 is 20 Gb/sec whereas rate 3 is 10 Gb/sec. So the port is 4x DDR
> (rate 6) and the group is 4x SDR (rate 3). The request requires the
> rates to be equal, so it fails.
> 
> Are all your ports DDR, or do you have a mix? If all are DDR, you can
> configure the default partition to use this rate.

To elaborate a little more on this, the configuration would be done in
the /etc/osm-partitions.conf file with a single line as follows:

Default=0x7fff,ipoib,rate=6:ALL=full;
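
For reference, here is a minimal Python sketch of how the rate values
discussed above (6 versus 3) and the rate and mtu bytes in the MCMember
Record dump can be read. It assumes the dump prints the raw byte, with a
2-bit selector in the top bits and the 6-bit encoded value in the low
bits, as the InfiniBand MCMemberRecord layout defines. The function and
table names are made up for illustration and are not part of OpenSM.

```python
# Encoded values as defined by the InfiniBand spec (subset for illustration).
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
SELECTOR = {0: "greater than", 1: "less than", 2: "exactly", 3: "largest available"}


def split_selector_byte(value):
    """Split a byte into (2-bit selector, 6-bit encoded value)."""
    return (value >> 6) & 0x3, value & 0x3F


def describe_rate(value):
    """Return (selector name, encoded rate, rate in Gb/sec or None)."""
    selector, rate = split_selector_byte(value)
    return SELECTOR[selector], rate, RATE_GBPS.get(rate)


def describe_mtu(value):
    """Return (selector name, encoded MTU, MTU in bytes or None)."""
    selector, mtu = split_selector_byte(value)
    return SELECTOR[selector], mtu, MTU_BYTES.get(mtu)


if __name__ == "__main__":
    # Bytes taken from the dump in this thread: rate 0x83, mtu 0x84.
    print("rate 0x83:", describe_rate(0x83))
    print("mtu  0x84:", describe_mtu(0x84))
```

Under that assumption, both fields in the dump carry the "exactly"
selector, which fits the "is not equal to" wording of the log message.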
                           
> > Apr 10 00:56:06 390090 [41001960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or 
> > JoinState = 0 failed from port 0x001708ffffd1509a (HP Lion Cub DDR 128MB), sending IB_SA_MAD_STATUS_REQ_INVALID                
> > Apr 10 00:56:07 921941 [44007960] -> __osm_sa_mad_ctrl_process: [                                                              
> > Apr 10 00:56:07 921947 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD         
> > Apr 10 00:56:07 921955 [44007960] -> __osm_sa_mad_ctrl_process: ]                                                              
> > Apr 10 00:56:07 921960 [42804960] -> osm_mcmr_rcv_process: [                                                                   
> > Apr 10 00:56:07 921961 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]                                                         
> > Apr 10 00:56:07 921978 [42804960] -> __osm_mcmr_rcv_join_mgrp: [                                                               
> > Apr 10 00:56:07 921994 [42804960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record                                         
> > Apr 10 00:56:07 922000 [42804960] -> MCMember Record dump:                                                                     
> >                                 MGID....................0xff12601bffff0000 : 0x0000000000000001                                
> >                                 PortGid.................0xfe80000000000000 : 0x001708ffffd15099                                
> >                                 qkey....................0xB1B                                                                  
> >                                 mlid....................0x0                                                                    
> >                                 mtu.....................0x84                                                                   
> >                                 TClass..................0x0                                                                    
> >                                 pkey....................0xFFFF                                                                 
> >                                 rate....................0x83                                                                   
> >                                 pkt_life................0x0                                                                    
> >                                 SLFlowLabelHopLimit.....0x0                                                                    
> >                                 ScopeState..............0x1                                                                    
> >                                 ProxyJoin...............0x0                                                                    
> > Apr 10 00:56:07 922013 [42804960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3  
> 
> Same is true for both ports.
> 
> > Apr 10 00:56:07 922019 [42804960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or 
> > JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB), sending IB_SA_MAD_STATUS_REQ_INVALID                
> > > 
> > > What is the GUID for ib0 and ib1? That should tell us which port is
> > > having the issue joining the group.
> > 
> > # ibstat   
> > CA 'mthca0'
> >         CA type: MT25208 (MT23108 compat mode)
> >         Number of ports: 2
> >         Firmware version: 4.7.400
> >         Hardware version: a0
> >         Node GUID: 0x001708ffffd15098
> >         System image GUID: 0x001708ffffd1509b
> >         Port 1:
> >                 State: Active
> >                 Physical state: LinkUp
> >                 Rate: 20
> >                 Base lid: 9
> >                 LMC: 0
> >                 SM lid: 1
> >                 Capability mask: 0x02510a68
> >                 Port GUID: 0x001708ffffd15099
> >         Port 2:
> >                 State: Active
> >                 Physical state: LinkUp
> >                 Rate: 20
> >                 Base lid: 11
> >                 LMC: 0
> >                 SM lid: 1
> >                 Capability mask: 0x02510a68
> >                 Port GUID: 0x001708ffffd1509a
> > 
> > # ibstatus 
> > Infiniband device 'mthca0' port 1 status:
> >         default gid:     fe80:0000:0000:0000:0017:08ff:ffd1:5099
> >         base lid:        0x9
> >         sm lid:          0x1
> >         state:           4: ACTIVE
> >         phys state:      5: LinkUp
> >         rate:            20 Gb/sec (4X DDR)
> > 
> > Infiniband device 'mthca0' port 2 status:
> >         default gid:     fe80:0000:0000:0000:0017:08ff:ffd1:509a
> >         base lid:        0xb
> >         sm lid:          0x1
> >         state:           4: ACTIVE
> >         phys state:      5: LinkUp
> >         rate:            20 Gb/sec (4X DDR)
> > 
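
The port GUIDs from ibstat are the low 64 bits of the default GIDs shown
by ibstatus, so they can be compared directly with the port GUIDs in the
ERR 1B12 lines. A rough Python sketch of that comparison follows; the
helper name is hypothetical and the values are copied from the output
above. It does not by itself say which of ib0 and ib1 sits on which port.

```python
def port_guid_from_gid(gid):
    """Return the low 64 bits of a colon-separated 128-bit GID as 0x-prefixed hex."""
    groups = gid.strip().split(":")
    if len(groups) != 8:
        raise ValueError("expected 8 colon-separated 16-bit groups: %r" % gid)
    return "0x" + "".join(group.zfill(4) for group in groups[4:]).lower()


# Default GIDs per HCA port, as reported by ibstatus in this thread.
default_gids = {
    1: "fe80:0000:0000:0000:0017:08ff:ffd1:5099",
    2: "fe80:0000:0000:0000:0017:08ff:ffd1:509a",
}

# Port GUIDs named in the ERR 1B12 lines of osm.log in this thread.
failing_guids = {"0x001708ffffd15099", "0x001708ffffd1509a"}

failing_ports = [
    port
    for port, gid in sorted(default_gids.items())
    if port_guid_from_gid(gid) in failing_guids
]

if __name__ == "__main__":
    print("HCA ports with failed joins:", failing_ports)
```

With the values in this thread, both ports of mthca0 show up as failing.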
> > > 
> > > Are you using IPv6 ?
> > 
> > I don't use IPv6, but the kernel was compiled with IPv6 support.
> > Also, when I build the InfiniBand kernel modules with --with-ipoib-cm,
> > the ib_ipoib module depends on ipv6.
> > # modinfo ib_ipoib
> > filename:       /lib/modules/2.6.18/updates/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
> > author:         Roland Dreier
> > description:    IP-over-InfiniBand net driver
> > license:        Dual BSD/GPL
> > vermagic:       2.6.18 SMP mod_unload gcc-4.1
> > depends:        ib_cm,ipv6,ib_core,ib_sa,ib_sa,ib_core
> > 
> > If the modules are built with --without-ipoib-cm, then ib_ipoib doesn't depend on ipv6.
> > But the messages in the log remain the same.

Are you using IPoIB (for IPv4)? If so, is that working?

-- Hal

> > 
> > Thanx.
> > 
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > 
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



