[ofa-general] multicast join failed for...
Hal Rosenstock
halr at voltaire.com
Mon Apr 9 16:20:37 PDT 2007
On Mon, 2007-04-09 at 18:47, Egor Tur wrote:
> Hi folk.
>
> > > ib1: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > > ib0: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > >
> > > And in osm.log:
> > > Apr 09 21:33:50 658439 [42003960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields,
> > > __validate_port_caps, or JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB),
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > >
> > OpenSM ERR 1B12 means that the rate or MTU of the port was incompatible
> > with the MC group. You could turn on -V with OpenSM and see more log
> > messages showing what is going wrong from the SM's perspective.
>
> Ok. This from osm.log with -V :
>
> Apr 10 00:56:06 390007 [44007960] -> __osm_sa_mad_ctrl_process: [
> Apr 10 00:56:06 390016 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
> Apr 10 00:56:06 390027 [44007960] -> __osm_sa_mad_ctrl_process: ]
> Apr 10 00:56:06 390033 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Apr 10 00:56:06 390046 [41001960] -> osm_mcmr_rcv_process: [
> Apr 10 00:56:06 390054 [41001960] -> __osm_mcmr_rcv_join_mgrp: [
> Apr 10 00:56:06 390060 [41001960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record
> Apr 10 00:56:06 390065 [41001960] -> MCMember Record dump:
> MGID....................0xff12601bffff0000 : 0x0000000000000001
> PortGid.................0xfe80000000000000 : 0x001708ffffd1509a
> qkey....................0xB1B
> mlid....................0x0
> mtu.....................0x84
> TClass..................0x0
> pkey....................0xFFFF
> rate....................0x83
> pkt_life................0x0
> SLFlowLabelHopLimit.....0x0
> ScopeState..............0x1
> ProxyJoin...............0x0
> Apr 10 00:56:06 390084 [41001960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3
Rate 6 is 20 Gb/sec whereas 3 is 10 Gb/sec. So the port is 4x DDR (rate
6) and the group is 4x SDR (rate 3). The join request requires the rate to
be exactly equal, so it fails.
Are all your ports DDR, or do you have a mix? If all are DDR, you can
configure the default partition to use the DDR rate.
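
For reference, here is a rough sketch of how the rate and mtu bytes in the
record dump above could be decoded. It assumes they follow the usual
MCMemberRecord packing (a 2-bit selector in the top bits, a 6-bit code in
the low bits). The table and function names are made up for illustration
and are not taken from OpenSM.

```python
# Illustrative decoder for the packed rate/mtu bytes in an MCMemberRecord dump.
# Assumption: top 2 bits are the selector, low 6 bits are the rate or MTU code.
SELECTORS = {0: "greater than", 1: "less than", 2: "exactly", 3: "largest available"}
RATES_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def split_field(byte):
    """Split a packed byte into (selector, code)."""
    return byte >> 6, byte & 0x3F


def describe_rate(byte):
    """Render a packed rate byte such as 0x83 in readable form."""
    sel, code = split_field(byte)
    return f"{SELECTORS[sel]} {RATES_GBPS[code]} Gb/sec"


def describe_mtu(byte):
    """Render a packed mtu byte such as 0x84 in readable form."""
    sel, code = split_field(byte)
    return f"{SELECTORS[sel]} {MTU_BYTES[code]} bytes"


print(describe_rate(0x83))  # rate byte from the dump
print(describe_mtu(0x84))   # mtu byte from the dump
```

With the values from the dump, the selector is 2 ("exactly") in both fields,
which matches the "is not equal to" wording in the validation message.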
> Apr 10 00:56:06 390090 [41001960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or
> JoinState = 0 failed from port 0x001708ffffd1509a (HP Lion Cub DDR 128MB), sending IB_SA_MAD_STATUS_REQ_INVALID
> Apr 10 00:56:07 921941 [44007960] -> __osm_sa_mad_ctrl_process: [
> Apr 10 00:56:07 921947 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
> Apr 10 00:56:07 921955 [44007960] -> __osm_sa_mad_ctrl_process: ]
> Apr 10 00:56:07 921960 [42804960] -> osm_mcmr_rcv_process: [
> Apr 10 00:56:07 921961 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Apr 10 00:56:07 921978 [42804960] -> __osm_mcmr_rcv_join_mgrp: [
> Apr 10 00:56:07 921994 [42804960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record
> Apr 10 00:56:07 922000 [42804960] -> MCMember Record dump:
> MGID....................0xff12601bffff0000 : 0x0000000000000001
> PortGid.................0xfe80000000000000 : 0x001708ffffd15099
> qkey....................0xB1B
> mlid....................0x0
> mtu.....................0x84
> TClass..................0x0
> pkey....................0xFFFF
> rate....................0x83
> pkt_life................0x0
> SLFlowLabelHopLimit.....0x0
> ScopeState..............0x1
> ProxyJoin...............0x0
> Apr 10 00:56:07 922013 [42804960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3
Same is true for both ports.
> Apr 10 00:56:07 922019 [42804960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or
> JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB), sending IB_SA_MAD_STATUS_REQ_INVALID
> >
> > What are the GUIDs for ib0 and ib1? That should tell us which port is
> > having the issue joining the group.
>
> # ibstat
> CA 'mthca0'
> CA type: MT25208 (MT23108 compat mode)
> Number of ports: 2
> Firmware version: 4.7.400
> Hardware version: a0
> Node GUID: 0x001708ffffd15098
> System image GUID: 0x001708ffffd1509b
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 20
> Base lid: 9
> LMC: 0
> SM lid: 1
> Capability mask: 0x02510a68
> Port GUID: 0x001708ffffd15099
> Port 2:
> State: Active
> Physical state: LinkUp
> Rate: 20
> Base lid: 11
> LMC: 0
> SM lid: 1
> Capability mask: 0x02510a68
> Port GUID: 0x001708ffffd1509a
>
> # ibstatus
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0017:08ff:ffd1:5099
> base lid: 0x9
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 20 Gb/sec (4X DDR)
>
> Infiniband device 'mthca0' port 2 status:
> default gid: fe80:0000:0000:0000:0017:08ff:ffd1:509a
> base lid: 0xb
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 20 Gb/sec (4X DDR)
>
> >
> > Are you using IPv6?
>
> I don't use IPv6, but the kernel was compiled with IPv6 support.
> Also, when I built the InfiniBand kernel modules with --with-ipoib-cm, the
> ib_ipoib module depends on ipv6.
> # modinfo ib_ipoib
> filename: /lib/modules/2.6.18/updates/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
> author: Roland Dreier
> description: IP-over-InfiniBand net driver
> license: Dual BSD/GPL
> vermagic: 2.6.18 SMP mod_unload gcc-4.1
> depends: ib_cm,ipv6,ib_core,ib_sa,ib_sa,ib_core
>
> If the modules are built with --without-ipoib-cm, then ib_ipoib doesn't depend on ipv6.
> But the messages in the log remain the same.
>
> Thanx.