[Users] Troubles with ibsim

Tim Miller btmiller at helix.nih.gov
Thu Feb 15 13:22:43 PST 2018


Hi Hal,

I'd have to see about sending you the whole ibnetdiscover file. I could 
probably strip down the topology for debugging. We're running 
MOFED 4.2-1.0.0.

Here's the result of a dump of that switch GUID from the simulator 
console - it appears that it's considering port 0 to be the SMA port:

sim> dump "S-7cfe900300b49890"
# Net status - Thu Feb 15 16:04:23 2018

Switch 36 "S-7cfe900300b49890"  nodeguid 7cfe900300b49890 sysimgguid 7cfe900300b49890
#       linearcap 30720 FDBtop 0 portchange 1
7cfe900300b49890        [0]     "Sma Port"[0]    lid 1309 lmc 0 smlid 0  4x  2.5G Active/LinkUp
7cfe900300b49890        [1]     "H-24be05ffffa8e4c0"[1]   4x FDR10 Init/LinkUp
7cfe900300b49890        [2]     "H-24be05ffffa8f410"[1]   4x FDR10 Init/LinkUp
7cfe900300b49890        [3]     "H-24be05ffffa82580"[1]   4x FDR10 Init/LinkUp
...
7cfe900300b49890        [36]    "H-24be05ffffa85500"[1]   4x FDR10 Init/LinkUp
#  dumped 1 nodes
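
For a larger dump it may be easier to summarize the port states with a 
short script than to read them by eye. The following is only a minimal 
sketch, assuming the port lines keep the layout shown in the dump above; 
the names PORT_RE, port_states and not_active are made up for this 
example and are not part of ibsim. The pattern tolerates port lines that 
the mail client wrapped onto a second line.

```python
import re

# Matches one port line of an ibsim "dump": node GUID, [port], "peer"[peer port],
# free-form link details, then <logical state>/<physical state>.
# DOTALL lets the details run across a wrapped line.
PORT_RE = re.compile(
    r'^[0-9a-f]{16}\s+\[(\d+)\]\s+"([^"]*)"\[\d+\].*?(\w+)/(\w+)',
    re.MULTILINE | re.DOTALL,
)

# Sample text copied from the dump above; port 1 is deliberately left wrapped.
SAMPLE = """\
7cfe900300b49890        [0]     "Sma Port"[0]    lid 1309 lmc 0 smlid 0  4x  2.5G Active/LinkUp
7cfe900300b49890        [1]     "H-24be05ffffa8e4c0"[1]   4x FDR10
Init/LinkUp
7cfe900300b49890        [2]     "H-24be05ffffa8f410"[1]   4x FDR10 Init/LinkUp
"""


def port_states(dump_text):
    """Return {port number: (peer name, logical state, physical state)}."""
    return {
        int(m.group(1)): (m.group(2), m.group(3), m.group(4))
        for m in PORT_RE.finditer(dump_text)
    }


def not_active(ports):
    """Return the sorted port numbers whose logical state is not Active."""
    return sorted(p for p, (_, state, _) in ports.items() if state != "Active")


if __name__ == "__main__":
    ports = port_states(SAMPLE)
    for num in sorted(ports):
        print(num, *ports[num])
    print("ports not Active:", not_active(ports))
```

Ports that stay in Init are ones the SM has not yet brought up, which 
matches opensm stopping its sweep at this switch.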

Regards,
Tim

On 02/15/2018 04:01 PM, Hal Rosenstock wrote:
> Are you sure you're using MLNX ibsim with MLNX OpenSM ? It looks like 
> MLNX ibsim supports 0xff17 to me so the message "process_packet: no 
> one to handle pkt: class 0x81, attr 0xff17" shouldn't come out.
>
> Can you send me your ibnetdiscover file that is used as input to ibsim 
> ? Maybe the real problem is:
> osm_hm_set_by_physp: Remote port of  0x7cfe900300b49890[0] couldn't be 
> found
> That looks like remote port to some switch port 0 which looks odd to 
> me as switch port 0 has no peer port and it shouldn't be looking for 
> one. Which MLNX OpenSM version ?
>
>
> On Thu, Feb 15, 2018 at 3:50 PM, Tim Miller <btmiller at helix.nih.gov> wrote:
>
>     Hi Hal,
>
>     Thanks for looking into this. You're indeed correct that I'm using
>     an MLNX OFED ibsim (and opensm for that matter). I could try
>     running both from a vanilla OpenFabrics release and see if I have
>     any better luck; let me go ahead and try that...
>
>     Regards,
>     Tim
>
>     On 02/15/2018 03:39 PM, Hal Rosenstock wrote:
>
>         Just checked. MLNX OFED ibsim supports 0xff17 attribute.
>
>         On Thu, Feb 15, 2018 at 3:36 PM, Hal Rosenstock
>         <hal.rosenstock at gmail.com> wrote:
>
>             Hi Tim,
>
>             Attribute ID 0xff17 is in the vendor specific range for SM
>             attributes and not supported with (at least) the upstream
>             ibsim.
>
>             I think you are using MLNX OpenSM rather than upstream or OFED
>             OpenSM with the upstream ibsim. I'm not sure if MLNX ibsim
>             supports the additional vendor specific SM attributes or not.
>
>             Can you work with some upstream or OFED OpenSM or only
>             MLNX OpenSM? If not, I'll try to find out whether the MLNX OFED
>             ibsim supports the additional attributes for running MLNX OpenSM.
>
>             -- Hal
>
>
>             On Thu, Feb 15, 2018 at 11:50 AM, Tim Miller
>             <btmiller at helix.nih.gov> wrote:
>
>                 I am attempting to use ibsim to test some possible
>                 configuration changes in our routing, but I am running into
>                 some difficulties. I can get the simulator started, but opensm
>                 fails to discover the fabric in the simulated environment. It
>                 discovers the switch to which the host running opensm is
>                 connected, but it can't discover any further than that. In the
>                 opensm log, I see:
>
>                 Feb 14 16:31:50 047307 [AD332700] 0x04 -> ni_rcv_process_new: Discovered new Switch node,
>                   GUID 0x7cfe900300b49890, TID 0x1239
>                 Feb 14 16:31:50 047821 [AD533700] 0x04 -> nd_rcv_process_nd: Node 0x7cfe900300b49890
>                   Description = SwitchIB Mellanox Technologies
>                 Feb 14 16:31:50 047847 [B5974700] 0x01 -> log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) -- dropping
>                                         Method 0x1, Attr 0xFF17, TID 0x123b
>                 Feb 14 16:31:50 047866 [B5974700] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,1, Return path  = 0,0
>                 Feb 14 16:31:50 047893 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(GeneralInfo), attr_mod 0x4, TID 0x123b
>                 Feb 14 16:31:50 047913 [B5974700] 0x04 -> osm_hm_set_by_physp: Remote port of 0x7cfe900300b49890[0] couldn't be found
>                 Feb 14 16:31:50 047921 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?
>                 Feb 14 16:31:50 047927 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: Error during initialization: got General Info time out from node 0x7cfe900300b49890
>
>                 And in the simulator console, I see messages of the form:
>
>                 ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr 0xff17
>
>                 Looking at the output of the "dump" command from within the
>                 console, it shows that all ports are in Init/LinkUp, except
>                 for the SMA port, which is in state Active/LinkUp.
>
>                 Does anyone have any idea what I might be doing wrong here?
>
>                 Thanks,
>                 Tim
>
>                 --
>                 Tim Miller
>                 NIH HPC systems staff
>                 301-827-5261
>                 https://hpc.nih.gov
>
>
>
>
>
>     -- 
>     Tim Miller
>     NIH HPC systems staff
>     301-827-5261
>     https://hpc.nih.gov
>
>
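
When comparing runs (for example MLNX OpenSM against an upstream 
OpenSM), it can help to tally the ERR entries in the opensm log rather 
than read them line by line. This is a rough sketch, assuming each error 
appears on one line in the form "-> function: ERR code:" as in the 
excerpt quoted above; ERR_RE and count_errors are names invented for 
this example, and the sample lines are re-joined from that excerpt.

```python
import re
from collections import Counter

# Captures the reporting function and the 4-hex-digit code of an opensm ERR entry.
ERR_RE = re.compile(r'->\s+(\w+):\s+ERR\s+([0-9A-Fa-f]{4}):')

# Sample lines re-joined from the quoted log excerpt.
SAMPLE_LOG = """\
Feb 14 16:31:50 047847 [B5974700] 0x01 -> log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) -- dropping
Feb 14 16:31:50 047866 [B5974700] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,1, Return path  = 0,0
Feb 14 16:31:50 047893 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(GeneralInfo), attr_mod 0x4, TID 0x123b
Feb 14 16:31:50 047921 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?
"""


def count_errors(log_text):
    """Return a Counter keyed by (error code, reporting function)."""
    return Counter((m.group(2), m.group(1)) for m in ERR_RE.finditer(log_text))


if __name__ == "__main__":
    for (code, func), n in sorted(count_errors(SAMPLE_LOG).items()):
        print(f"ERR {code}  {func}  x{n}")
```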

-- 
Tim Miller
NIH HPC systems staff
301-827-5261
https://hpc.nih.gov



