<div dir="ltr"><div>The option I mentioned exists only in MLNX OpenSM (not in upstream OpenSM), so it is a no-op on upstream OpenSM. It was intended to bypass the routine that was reporting the failure (<font color="#500050">osm_hm_set_by_physp</font>).</div><div><br></div><div>Anyhow, glad this got you past the roadblock.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 15, 2018 at 5:17 PM, Tim Miller <span dir="ltr"><<a href="mailto:btmiller@helix.nih.gov" target="_blank">btmiller@helix.nih.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Hal,<br>
<br>
Thanks - oddly enough, when I switched to ibsim 0.7 compiled from git and opensm from the latest vanilla <a href="http://openfabrics.org" target="_blank" rel="noreferrer">openfabrics.org</a> release, everything seems to work fine (opensm reports "SUBNET UP", and the simulated switches appear to have valid LFTs). Could this default have changed between the two versions of opensm?<br>
<br>
Regards,<br>
Tim<span><br>
<br>
On 02/15/2018 05:13 PM, Hal Rosenstock wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>
In opensm.conf, try setting<br>
<br>
hm_unhealthy_ports_checks FALSE<br>
<br>
it defaults to TRUE<br>
<br>
<br></span><div><div class="h5">
On Thu, Feb 15, 2018 at 4:22 PM, Tim Miller <<a href="mailto:btmiller@helix.nih.gov" target="_blank">btmiller@helix.nih.gov</a>> wrote:<br>
<br>
    Hi Hal,<br>
<br>
    I'd have to see about sending you the whole netdiscover file. I<br>
    could probably strip down the topology for debugging. We're<br>
    running MOFED 4.2-1.0.0.<br>
<br>
    Here's the result of a dump of that switch GUID from the<br>
    simulator console - it appears that it's considering port 0 to be<br>
    the SMA port:<br>
<br>
    sim> dump "S-7cfe900300b49890"<br>
    # Net status - Thu Feb 15 16:04:23 2018<br>
<br>
    Switch 36 "S-7cfe900300b49890"  nodeguid 7cfe900300b49890 sysimgguid 7cfe900300b49890<br>
    #       linearcap 30720 FDBtop 0 portchange 1<br>
    7cfe900300b49890        [0]     "Sma Port"[0]    lid 1309 lmc 0 smlid 0  4x  2.5G Active/LinkUp<br>
    7cfe900300b49890        [1]     "H-24be05ffffa8e4c0"[1]   4x FDR10 Init/LinkUp<br>
    7cfe900300b49890        [2]     "H-24be05ffffa8f410"[1]   4x FDR10 Init/LinkUp<br>
    7cfe900300b49890        [3]     "H-24be05ffffa82580"[1]   4x FDR10 Init/LinkUp<br>
    ...<br>
    7cfe900300b49890        [36]    "H-24be05ffffa85500"[1]   4x FDR10 Init/LinkUp<br>
    #  dumped 1 nodes<br>
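<br>
    A minimal sketch for tallying port states from a dump like the one above.<br>
    It assumes every port line ends in a "State/LinkUp" token, as in this excerpt;<br>
    the names STATE_RE and link_state_counts are made up for illustration and are<br>
    not part of ibsim. A port in Init has not yet been brought to Active by the SM.<br>
<br>
```python
import re
from collections import Counter

# Logical port state that precedes "/LinkUp", e.g. "Init" or "Active".
# Assumption: the dump layout matches the excerpt quoted in this thread.
STATE_RE = re.compile(r"\b(\w+)/LinkUp\b")


def link_state_counts(dump_text):
    """Count ports per logical state among ports whose physical state is LinkUp."""
    return Counter(STATE_RE.findall(dump_text))


sample = '''
7cfe900300b49890 [0] "Sma Port"[0] lid 1309 lmc 0 smlid 0 4x 2.5G Active/LinkUp
7cfe900300b49890 [1] "H-24be05ffffa8e4c0"[1] 4x FDR10 Init/LinkUp
7cfe900300b49890 [2] "H-24be05ffffa8f410"[1] 4x FDR10 Init/LinkUp
'''

counts = link_state_counts(sample)
for state, n in sorted(counts.items()):
    print(state, n)
```
<br>
    For the three sample lines this reports one Active port and two Init ports.<br>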
<br>
    Regards,<br>
    Tim<br>
<br>
    On 02/15/2018 04:01 PM, Hal Rosenstock wrote:<br>
<br>
        Are you sure you're using MLNX ibsim with MLNX OpenSM ? It<br>
        looks like MLNX ibsim supports 0xff17 to me so the message<br>
        "process_packet: no one to handle pkt: class 0x81, attr<br>
        0xff17" shouldn't come out.<br>
<br>
        Can you send me the ibnetdiscover file that you use as input<br>
        to ibsim? Maybe the real problem is:<br>
        osm_hm_set_by_physp: Remote port of 0x7cfe900300b49890[0] couldn't be found<br>
        That looks like a lookup of the remote port for switch port 0, which<br>
        looks odd to me, as switch port 0 has no peer port and it shouldn't<br>
        be looking for one. Which MLNX OpenSM version are you running?<br>
<br>
<br>
        On Thu, Feb 15, 2018 at 3:50 PM, Tim Miller<br>
        <<a href="mailto:btmiller@helix.nih.gov" target="_blank">btmiller@helix.nih.gov</a>> wrote:<br></div></div><span>
<br></span><div><div class="h5">
            Hi Hal,<br>
<br>
            Thanks for looking into this. You're indeed correct that I'm using<br>
            an MLNX OFED ibsim (and opensm for that matter). I could try<br>
            running both from a vanilla OpenFabrics release and see if I have<br>
            any better luck; let me go ahead and try that...<br>
<br>
            Regards,<br>
            Tim<br>
<br>
            On 02/15/2018 03:39 PM, Hal Rosenstock wrote:<br>
<br>
                Just checked. MLNX OFED ibsim supports 0xff17 attribute.<br>
<br>
                On Thu, Feb 15, 2018 at 3:36 PM, Hal Rosenstock<br>
                <<a href="mailto:hal.rosenstock@gmail.com" target="_blank">hal.rosenstock@gmail.com</a>> wrote:<br>
<br>
                    Hi Tim,<br>
<br>
                    Attribute ID 0xff17 is in the vendor-specific range for SM<br>
                    attributes and is not supported by (at least) the upstream ibsim.<br>
<br>
                    I think you are using MLNX OpenSM, rather than upstream or OFED<br>
                    OpenSM, with the upstream ibsim. I'm not sure whether MLNX ibsim<br>
                    supports the additional vendor-specific SM attributes.<br>
<br>
                    Can you work with some upstream or OFED OpenSM, or only with<br>
                    MLNX OpenSM? If not, I'll try to find out whether the MLNX OFED ibsim<br>
                    supports the additional attributes needed to run MLNX OpenSM.<br>
<br>
                    -- Hal<br>
<br>
<br>
                    On Thu, Feb 15, 2018 at 11:50 AM, Tim Miller<br>
                    <<a href="mailto:btmiller@helix.nih.gov" target="_blank">btmiller@helix.nih.gov</a>> wrote:<br>
<br>
                        I am attempting to use ibsim to test some possible<br>
                        configuration changes in our routing, but I am running into<br>
                        some difficulties. I can get the simulator started, but opensm<br>
                        fails to discover the fabric in the simulated environment. It<br>
                        discovers the switch to which the host running opensm is<br>
                        connected, but it can't discover any further than that. In the<br>
                        opensm log, I see:<br>
<br>
                        Feb 14 16:31:50 047307 [AD332700] 0x04 -> ni_rcv_process_new: Discovered new Switch node,<br>
                          GUID 0x7cfe900300b49890, TID 0x1239<br>
                        Feb 14 16:31:50 047821 [AD533700] 0x04 -> nd_rcv_process_nd: Node 0x7cfe900300b49890<br>
                          Description = SwitchIB Mellanox Technologies<br>
                        Feb 14 16:31:50 047847 [B5974700] 0x01 -> log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) -- dropping<br>
                          Method 0x1, Attr 0xFF17, TID 0x123b<br>
                        Feb 14 16:31:50 047866 [B5974700] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,1, Return path  = 0,0<br>
                        Feb 14 16:31:50 047893 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(GeneralInfo), attr_mod 0x4, TID 0x123b<br>
                        Feb 14 16:31:50 047913 [B5974700] 0x04 -> osm_hm_set_by_physp: Remote port of 0x7cfe900300b49890[0] couldn't be found<br>
                        Feb 14 16:31:50 047921 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?<br>
                        Feb 14 16:31:50 047927 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: Error during initialization: got General Info time out from node 0x7cfe900300b49890<br>
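<br>
                        A minimal sketch for finding which attribute IDs time out in a log like the excerpt above.<br>
                        It assumes the "ERR 3120: Timeout while getting attribute ..." wording shown here;<br>
                        TIMEOUT_RE and timed_out_attributes are hypothetical names, not part of OpenSM.<br>
<br>
```python
import re
from collections import Counter

# Captures the attribute ID from entries such as
# "ERR 3120: Timeout while getting attribute 0xFF17 (GeneralInfo)".
# Assumption: the message wording matches the log excerpt in this thread.
TIMEOUT_RE = re.compile(
    r"ERR 3120: Timeout while getting attribute (0x[0-9A-Fa-f]+)"
)


def timed_out_attributes(log_text):
    """Count timeouts per attribute ID (lower-cased) in OpenSM log text."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(m.group(1).lower() for m in TIMEOUT_RE.finditer(flat))


sample = """
Feb 14 16:31:50 047921 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120: Timeout
while getting attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?
"""

for attr, n in timed_out_attributes(sample).items():
    print(attr, n)
```
<br>
                        An ID at or above 0xff00 falls in the vendor-specific range, which is the case discussed in this thread.<br>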
<br>
                        And in the simulator console, I see messages of the form:<br>
<br>
                        ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr 0xff17<br>
<br>
                        Looking at the output of the "dump" command from within the<br>
                        console, it shows that all ports are in Init/LinkUp, except<br>
                        for the SMA port, which is in state Active/LinkUp.<br>
<br>
                        Does anyone have any idea what I might be doing wrong here?<br>
<br>
                        Thanks,<br>
                        Tim<br>
<br>
                        --         Tim Miller<br>
                        NIH HPC systems staff<br>
        <a href="tel:301-827-5261" target="_blank" value="+13018275261">301-827-5261</a><br></div></div><div><div class="h5">
        <a href="https://hpc.nih.gov" target="_blank" rel="noreferrer">https://hpc.nih.gov</a><br>
<br>
                        ______________________________<wbr>_________________<br>
                        Users mailing list<br>
        <a href="mailto:Users@lists.openfabrics.org" target="_blank">Users@lists.openfabrics.org</a><br>
        <a href="http://lists.openfabrics.org/mailman/listinfo/users" target="_blank" rel="noreferrer">http://lists.openfabrics.org/mailman/listinfo/users</a><br>
<br>
<br>
<br>
<br>
            --     Tim Miller<br>
            NIH HPC systems staff<br>
        <a href="tel:301-827-5261" target="_blank" value="+13018275261">301-827-5261</a><br>
        <a href="https://hpc.nih.gov" target="_blank" rel="noreferrer">https://hpc.nih.gov</a><br>
<br>
<br>
<br>
    --     Tim Miller<br>
    NIH HPC systems staff<br>
    <a href="tel:301-827-5261" target="_blank" value="+13018275261">301-827-5261</a><br>
    <a href="https://hpc.nih.gov" target="_blank" rel="noreferrer">https://hpc.nih.gov</a><br>
<br>
<br>
</div></div></blockquote><div class="HOEnZb"><div class="h5">
<br>
-- <br>
Tim Miller<br>
NIH HPC systems staff<br>
<a href="tel:301-827-5261" target="_blank" value="+13018275261">301-827-5261</a><br>
<a href="https://hpc.nih.gov" target="_blank" rel="noreferrer">https://hpc.nih.gov</a><br>
<br>
</div></div></blockquote></div><br></div>