[ofa-general] multicast join failed for...
Sasha Khapyorsky
sashak at voltaire.com
Mon Apr 9 16:28:53 PDT 2007
Hi Egor,
On Tue, 2007-04-10 at 01:47 +0300, Egor Tur wrote:
> Hi folks.
>
> > > ib1: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > > ib0: multicast join failed for ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > >
> > > And in osm.log:
> > > Apr 09 21:33:50 658439 [42003960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields,
> > > __validate_port_caps, or JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB),
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > >
> > OpenSM ERR 1B12 means that the rate or MTU of the port was incompatible
> > with the MC group. You could turn on -V with OpenSM and see more log
> > messages about what is going wrong from the SM's perspective.
>
> OK. This is from osm.log with -V:
>
> Apr 10 00:56:06 390007 [44007960] -> __osm_sa_mad_ctrl_process: [
> Apr 10 00:56:06 390016 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
> Apr 10 00:56:06 390027 [44007960] -> __osm_sa_mad_ctrl_process: ]
> Apr 10 00:56:06 390033 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Apr 10 00:56:06 390046 [41001960] -> osm_mcmr_rcv_process: [
> Apr 10 00:56:06 390054 [41001960] -> __osm_mcmr_rcv_join_mgrp: [
> Apr 10 00:56:06 390060 [41001960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record
> Apr 10 00:56:06 390065 [41001960] -> MCMember Record dump:
> MGID....................0xff12601bffff0000 : 0x0000000000000001
> PortGid.................0xfe80000000000000 : 0x001708ffffd1509a
> qkey....................0xB1B
> mlid....................0x0
> mtu.....................0x84
> TClass..................0x0
> pkey....................0xFFFF
> rate....................0x83
> pkt_life................0x0
> SLFlowLabelHopLimit.....0x0
> ScopeState..............0x1
> ProxyJoin...............0x0
> Apr 10 00:56:06 390084 [41001960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3
This multicast group was created with rate 6 (20Gb/s), but the join
request from the port asks for rate 3 (10Gb/s) with exact rate matching
(rate 0x83 in the record dump above). This is the reason for the failure.
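To make the decoding of the dumped fields concrete, here is a minimal sketch in Python. It assumes the usual InfiniBand SA encoding for the `rate` and `mtu` bytes of an MCMemberRecord (top 2 bits are the selector, low 6 bits are the value). The helper names `decode_selector_field` and `join_allowed` are hypothetical and are not OpenSM functions; the comparison is a simplified model of the check, not the actual OpenSM code.

```python
# Sketch only: decode the selector/value bytes shown in the MCMember Record dump.
# Assumption: top 2 bits = selector, low 6 bits = enumerated value.

SELECTORS = {0: "greater than", 1: "less than", 2: "exactly", 3: "largest available"}

# InfiniBand rate enumeration (value -> Gb/s), limited to the commonly seen codes.
RATES_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}

# InfiniBand MTU enumeration (value -> bytes).
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def decode_selector_field(byte):
    """Split a rate/mtu byte into (selector, value)."""
    return (byte >> 6) & 0x3, byte & 0x3F


def join_allowed(group_value, selector, requested):
    """Simplified model of the comparison; hypothetical helper, not OpenSM code."""
    if selector == 0:
        return group_value > requested
    if selector == 1:
        return group_value < requested
    if selector == 2:
        return group_value == requested
    return True  # assumption: "largest available" imposes no comparison


if __name__ == "__main__":
    rate_sel, rate_val = decode_selector_field(0x83)  # rate field from the dump
    mtu_sel, mtu_val = decode_selector_field(0x84)    # mtu field from the dump
    print("rate:", SELECTORS[rate_sel], RATES_GBPS[rate_val], "Gb/s")
    print("mtu: ", SELECTORS[mtu_sel], MTU_BYTES[mtu_val], "bytes")
    # Group rate is 6 (20 Gb/s); the request demands exactly 3 (10 Gb/s).
    print("join allowed:", join_allowed(6, rate_sel, rate_val))
```

Under these assumptions, 0x83 decodes to "exactly rate 3" and 0x84 to "exactly MTU 4", so a group created with rate 6 fails the exact-match comparison.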
Sasha
> Apr 10 00:56:06 390090 [41001960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or
> JoinState = 0 failed from port 0x001708ffffd1509a (HP Lion Cub DDR 128MB), sending IB_SA_MAD_STATUS_REQ_INVALID
> Apr 10 00:56:07 921941 [44007960] -> __osm_sa_mad_ctrl_process: [
> Apr 10 00:56:07 921947 [44007960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
> Apr 10 00:56:07 921955 [44007960] -> __osm_sa_mad_ctrl_process: ]
> Apr 10 00:56:07 921960 [42804960] -> osm_mcmr_rcv_process: [
> Apr 10 00:56:07 921961 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Apr 10 00:56:07 921978 [42804960] -> __osm_mcmr_rcv_join_mgrp: [
> Apr 10 00:56:07 921994 [42804960] -> __osm_mcmr_rcv_join_mgrp: Dump of incoming record
> Apr 10 00:56:07 922000 [42804960] -> MCMember Record dump:
> MGID....................0xff12601bffff0000 : 0x0000000000000001
> PortGid.................0xfe80000000000000 : 0x001708ffffd15099
> qkey....................0xB1B
> mlid....................0x0
> mtu.....................0x84
> TClass..................0x0
> pkey....................0xFFFF
> rate....................0x83
> pkt_life................0x0
> SLFlowLabelHopLimit.....0x0
> ScopeState..............0x1
> ProxyJoin...............0x0
> Apr 10 00:56:07 922013 [42804960] -> __validate_more_comp_fields: Requested RATE 6 is not equal to 3
> Apr 10 00:56:07 922019 [42804960] -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or
> JoinState = 0 failed from port 0x001708ffffd15099 (HP Lion Cub DDR 128MB), sending IB_SA_MAD_STATUS_REQ_INVALID
> >
> > What is the GUID for ib0 and ib1? That should tell us which port is
> > having the issue joining the group.
>
> # ibstat
> CA 'mthca0'
> CA type: MT25208 (MT23108 compat mode)
> Number of ports: 2
> Firmware version: 4.7.400
> Hardware version: a0
> Node GUID: 0x001708ffffd15098
> System image GUID: 0x001708ffffd1509b
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 20
> Base lid: 9
> LMC: 0
> SM lid: 1
> Capability mask: 0x02510a68
> Port GUID: 0x001708ffffd15099
> Port 2:
> State: Active
> Physical state: LinkUp
> Rate: 20
> Base lid: 11
> LMC: 0
> SM lid: 1
> Capability mask: 0x02510a68
> Port GUID: 0x001708ffffd1509a
>
> # ibstatus
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0017:08ff:ffd1:5099
> base lid: 0x9
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 20 Gb/sec (4X DDR)
>
> Infiniband device 'mthca0' port 2 status:
> default gid: fe80:0000:0000:0000:0017:08ff:ffd1:509a
> base lid: 0xb
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 20 Gb/sec (4X DDR)
>
> >
> > Are you using IPv6 ?
>
> I don't use IPv6, but the kernel was compiled with IPv6 support.
> Also, when I build the kernel modules for InfiniBand with --with-ipoib-cm, the
> module ib_ipoib depends on ipv6.
> # modinfo ib_ipoib
> filename: /lib/modules/2.6.18/updates/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
> author: Roland Dreier
> description: IP-over-InfiniBand net driver
> license: Dual BSD/GPL
> vermagic: 2.6.18 SMP mod_unload gcc-4.1
> depends: ib_cm,ipv6,ib_core,ib_sa,ib_sa,ib_core
>
> If the modules are built with --without-ipoib-cm, then ib_ipoib doesn't depend on ipv6.
> But the messages in the log remain the same.
>
> Thanx.