[Users] Troubles with ibsim

Tim Miller btmiller at helix.nih.gov
Thu Feb 15 08:50:12 PST 2018


I am attempting to use ibsim to test some possible configuration changes 
in our routing, but I am running into some difficulties. I can get the 
simulator started, but opensm fails to discover the fabric in the 
simulated environment. It discovers the switch to which the host running 
opensm is connected, but it can't discover any further than that. In the 
opensm log, I see:

Feb 14 16:31:50 047307 [AD332700] 0x04 -> ni_rcv_process_new: Discovered new Switch node,
                                 GUID 0x7cfe900300b49890, TID 0x1239
Feb 14 16:31:50 047821 [AD533700] 0x04 -> nd_rcv_process_nd: Node 0x7cfe900300b49890
                                 Description = SwitchIB Mellanox Technologies
Feb 14 16:31:50 047847 [B5974700] 0x01 -> log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) -- dropping
                         Method 0x1, Attr 0xFF17, TID 0x123b
Feb 14 16:31:50 047866 [B5974700] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,1, Return path  = 0,0
Feb 14 16:31:50 047893 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(GeneralInfo), attr_mod 0x4, TID 0x123b
Feb 14 16:31:50 047913 [B5974700] 0x04 -> osm_hm_set_by_physp: Remote port of  0x7cfe900300b49890[0] couldn't be found
Feb 14 16:31:50 047921 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?
Feb 14 16:31:50 047927 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: Error during initialization: got General Info time out from node 0x7cfe900300b49890

In the simulator console, I see messages of the form:

ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr 0xff17
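
The two excerpts appear to refer to the same attribute: opensm times out on attribute 0xFF17 (GeneralInfo), and the simulator reports that nothing handles attr 0xff17. A minimal Python sketch for cross-checking this on longer logs is below. It is not part of opensm or ibsim; the function name attr_ids and the sample strings are illustrative only, and it assumes attribute IDs always appear as "Attr 0x..", "attr 0x.." or "attribute 0x..", as in the lines quoted in this message.

```python
import re

# Short samples copied from the excerpts in this message (hypothetical inputs;
# in practice, read the real opensm log and a capture of the ibsim console).
OPENSM_LOG = """\
log_send_error: ERR 5411: DR
SMP Send completed with error (IB_TIMEOUT) -- dropping
                         Method 0x1, Attr 0xFF17, TID 0x123b
sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT): SubnGet(GeneralInfo),
attr_mod 0x4, TID 0x123b
sm_mad_ctrl_send_err_cb: ERR
3120: Timeout while getting attribute 0xFF17 (GeneralInfo); Possible
mis-set mkey?
"""

IBSIM_LOG = """\
ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr
0xff17
"""

# Matches "Attr 0x..", "attr 0x.." and "attribute 0x..", but not "attr_mod 0x..",
# because the keyword must be followed directly by whitespace.
ATTR_RE = re.compile(r"att(?:r|ribute)\s+(0x[0-9a-fA-F]+)", re.IGNORECASE)


def attr_ids(text):
    """Return the set of attribute IDs (as integers) mentioned in a log text."""
    # Collapse all whitespace first so entries wrapped across lines still match.
    flat = " ".join(text.split())
    return {int(value, 16) for value in ATTR_RE.findall(flat)}


if __name__ == "__main__":
    # Attributes that opensm asked for and that the simulator did not handle.
    common = attr_ids(OPENSM_LOG) & attr_ids(IBSIM_LOG)
    for attr in sorted(common):
        print(f"attribute 0x{attr:04x} appears in both logs")
```

Comparing as integers avoids the upper/lower-case difference between the two tools' hex output.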

Looking at the output of the "dump" command from within the console, it 
shows that all ports are in Init/LinkUp, except for the SMA port, which 
is in state Active/LinkUp.

Does anyone have any idea what I might be doing wrong here?

Thanks,
Tim

-- 
Tim Miller
NIH HPC systems staff
301-827-5261
https://hpc.nih.gov



