[ofw] OpenSM 3.3.11 and osmtest interaction.

Hal Rosenstock hal.rosenstock at gmail.com
Thu Sep 29 05:48:41 PDT 2011


Hi Stan,
On Wed, Sep 28, 2011 at 7:26 PM, Smith, Stan <stan.smith at intel.com> wrote:

> Hello,
>  While porting OpenSM 3.3.11 to Windows, the following MC osmtest failure
> occurred.
>
> 'osmtest -f m -M1' kept failing [ERR 0210] because OpenSM 3.3.11 failed the
> MC group create when PKey == 0.
>
> Specifically, the ib_pkey_is_invalid() call @ line 1026 in osm_sa_mcmember.c
> returned TRUE.
> It turns out that p_recvd_mcmember_rec->pkey == 0 in opensm, as it was set in
> osmtest
>  [osmt_multicast.c in the call to osmt_init_mc_memory() @ line 1427].
>
> The Windows fix was to set 'mc_req_rec.pkey = IB_DEFAULT_PKEY' prior to
> calling osmt_send_mcast_request().
> The fix needed to be applied in a few places; now all osmtests are
> passing.
>
> Thoughts on the failures?
>


Commit f7f1ead1b4e9bba741a0d1312513839504cab1e3 introduced an additional pkey
check into osm_sa_mcmember_record.c:mcmr_rcv_join_mgrp.


A subsequent commit, ffdcdec8a6557088b23e273c5d605465501d2d24, fixed only
some of the pkeys in the multicast flow of osmt_multicast.c.
I don't have a good explanation for why only some of the cases were
changed/fixed. I'm sure I ran the multicast flow.
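
For reference, the kind of check that rejects these requests can be sketched in a few lines. This is a self-contained approximation, not the OpenSM source: pkey_base(), pkey_is_invalid() and the SKETCH_ constants are names made up for illustration, values are in host byte order, and the assumption is that a P_Key counts as invalid when its low 15 bits (the base, without the membership bit) are zero.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; values are in host byte order. */
#define SKETCH_PKEY_BASE_MASK 0x7FFFu /* low 15 bits: P_Key base */
#define SKETCH_DEFAULT_PKEY   0xFFFFu /* default partition, full membership */

/* Strip the membership bit (bit 15) and keep only the P_Key base. */
static uint16_t pkey_base(uint16_t pkey)
{
	return (uint16_t)(pkey & SKETCH_PKEY_BASE_MASK);
}

/* Assumed rule: a P_Key whose base is zero is treated as invalid. */
static int pkey_is_invalid(uint16_t pkey)
{
	return pkey_base(pkey) == 0;
}
```

Under that assumption, a request record that was zeroed and never given a pkey fails the check, while one set to the default P_Key passes, which matches the behaviour Stan describes.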


>
> Sean Hefty ran an OFED for Linux test using the head of the opensm src tree:
>
> osmtest -f m -M1
>
> Sep 28 15:43:19 736239 [6E1F3700] 0x02 -> osmt_run_mcast_flow: Checking Create given MGID=0 valid Set several options :
>                First above min RATE, Second less than max RATE
>                Third above min MTU, Fourth less than max MTU
>                Fifth exact MTU & RATE feasible, Sixth exact RATE feasible
>                Seventh exact MTU feasible (o15.0.1.4)...
> Sep 28 15:43:19 737661 [6E1F3700] 0x02 -> osmt_run_mcast_flow: Validating resulting MGID (o15.0.1.5)...
> Sep 28 15:43:19 737720 [6E1F3700] 0x02 -> osmt_run_mcast_flow: Checking Create given MGID=0 (o15.0.1.4)...
> Sep 28 15:43:19 738032 [6D9F0710] 0x01 -> __osmv_sa_mad_rcv_cb: ERR 5501: Remote error:0x0200
> Sep 28 15:43:19 738054 [6D9F0710] 0x01 -> osmtest_query_res_cb: ERR 0003: Error on query (IB_REMOTE_ERROR)
> Sep 28 15:43:19 738082 [6E1F3700] 0x01 -> osmt_send_mcast_request: ERR 0224: ib_query failed (IB_REMOTE_ERROR)
> Sep 28 15:43:19 738110 [6E1F3700] 0x01 -> osmt_send_mcast_request: Remote error = IB_SA_MAD_STATUS_REQ_INVALID
> Sep 28 15:43:19 738134 [6E1F3700] 0x01 -> osmt_run_mcast_flow: ERR 0210: Failed to create MCG for MGID=0 - got IB_REMOTE_ERROR/IB_SA_MAD_STATUS_REQ_INVALID
> Sep 28 15:43:19 738162 [6E1F3700] 0x01 -> osmtest_run: ERR 0152: Multicast Flow failed: (IB_REMOTE_ERROR)
> OSMTEST: TEST "Multicast" FAIL
>

Yes, I see the same thing.
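
The "Remote error:0x0200" line and the IB_SA_MAD_STATUS_REQ_INVALID line in the log above are the same status reported two ways. A rough sketch of that decoding follows; sa_status_name() is a hypothetical helper (not an osmtest or OpenSM function), the status is taken in host byte order, the returned strings are abbreviated labels, and it assumes the SA class-specific status code sits in the upper byte of the 16-bit MAD status.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: map the class-specific byte of an
 * SA MAD status (host byte order) to a short label. */
static const char *sa_status_name(uint16_t status)
{
	switch (status & 0xFF00u) {
	case 0x0000: return "SUCCESS";
	case 0x0100: return "NO_RESOURCES";
	case 0x0200: return "REQ_INVALID";
	case 0x0300: return "NO_RECORDS";
	case 0x0400: return "TOO_MANY_RECORDS";
	case 0x0500: return "INVALID_GID";
	case 0x0600: return "INSUF_COMPS";
	case 0x0700: return "DENIED";
	default:     return "UNKNOWN";
	}
}
```

With that mapping, the 0x0200 seen here decodes to REQ_INVALID, which is consistent with the SA rejecting a join request that carries an invalid pkey.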


>
> This is not a patch, only reference points for what I did to fix the issue
> on Windows.
>
> --- F:/OSM/opensm-3.3.11/osmtest/osmt_multicast.c       Wed Sep 28 16:16:25 2011
> +++ F:/openIB-windows-svn/latest/gen1/trunk/ulp/opensm/userX/osmtest/osmt_multicast.c   Wed Sep 28 14:23:57 2011
> @@ -768,8 +768,8 @@
>            IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE;
>
>        OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, EXPECTING_ERRORS_START "\n");
> -       status = osmt_send_mcast_request(p_osmt, 1, &mc_req_rec, comp_mask,
> -                                        sa_mad);
> +
> +       status = osmt_send_mcast_request(p_osmt, 1, &mc_req_rec, comp_mask, sa_mad);
>        OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, EXPECTING_ERRORS_END "\n");
>
>        if (((ib_net16_t) (sa_mad->status & IB_SMP_STATUS_MASK)) !=
> @@ -1429,6 +1429,7 @@
>        /* no MGID */
>        memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t));
>        /* Request Join */
> +       mc_req_rec.pkey = IB_DEFAULT_PKEY;
>        ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER);
>
>        mc_req_rec.pkt_life = 0 | IB_PATH_SELECTOR_GREATER_THAN << 6;
> @@ -1455,6 +1456,7 @@
>        /* o15.0.1.6: */
>        /* - Create a new MCG with valid requested MGID. */
>        osmt_init_mc_query_rec(p_osmt, &mc_req_rec);
> +       mc_req_rec.pkey = IB_DEFAULT_PKEY;
>        mc_req_rec.mgid = good_mgid;
>
>        OSM_LOG(&p_osmt->log, OSM_LOG_INFO,
> @@ -2221,6 +2223,7 @@
>                "\t\twith unrealistic MTU greater than 4096 (o15.0.1.8)...\n");
>
>        /* First create new mgrp */
> +       mc_req_rec.pkey = IB_DEFAULT_PKEY;
>        ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER);
>        mc_req_rec.mtu = IB_MTU_LEN_1024 | IB_PATH_SELECTOR_EXACTLY << 6;
>        memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t));
> @@ -2308,6 +2311,7 @@
>        }
>
>        if (remote_port_guid != 0x0) {
> +               mc_req_rec.pkey = IB_DEFAULT_PKEY;
>                ib_member_set_join_state(&mc_req_rec,
>                                         IB_MC_REC_STATE_FULL_MEMBER);
>                memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t));
>
>
> Thanks,
>

Thanks; patch to follow shortly.

-- Hal


>
> Stan.