[Users] Troubles with ibsim
Tim Miller
btmiller at helix.nih.gov
Thu Feb 15 12:50:45 PST 2018
Hi Hal,
Thanks for looking into this. You're indeed correct that I'm using an
MLNX OFED ibsim (and opensm for that matter). I could try running both
from a vanilla OpenFabrics release and see if I have any better luck;
let me go ahead and try that...
Regards,
Tim
On 02/15/2018 03:39 PM, Hal Rosenstock wrote:
> Just checked. MLNX OFED ibsim supports the 0xff17 attribute.
>
> On Thu, Feb 15, 2018 at 3:36 PM, Hal Rosenstock
> <hal.rosenstock at gmail.com> wrote:
>
> Hi Tim,
>
> Attribute ID 0xff17 is in the vendor-specific range for SM
> attributes and is not supported by (at least) the upstream ibsim.
>
> I think you are running MLNX OpenSM (rather than upstream or OFED
> OpenSM) against the upstream ibsim. I'm not sure whether MLNX ibsim
> supports the additional vendor-specific SM attributes.
>
> Can you work with an upstream or OFED OpenSM, or only with MLNX
> OpenSM? If only MLNX OpenSM, I'll try to find out whether the MLNX
> OFED ibsim supports the additional attributes needed to run MLNX OpenSM.
>
> -- Hal
>
>
> On Thu, Feb 15, 2018 at 11:50 AM, Tim Miller
> <btmiller at helix.nih.gov> wrote:
>
> I am attempting to use ibsim to test some possible
> configuration changes in our routing, but I am running into
> some difficulties. I can get the simulator started, but opensm
> fails to discover the fabric in the simulated environment. It
> discovers the switch to which the host running opensm is
> connected, but it can't discover any further than that. In the
> opensm log, I see:
>
> Feb 14 16:31:50 047307 [AD332700] 0x04 -> ni_rcv_process_new: Discovered new Switch node,
>     GUID 0x7cfe900300b49890, TID 0x1239
> Feb 14 16:31:50 047821 [AD533700] 0x04 -> nd_rcv_process_nd: Node 0x7cfe900300b49890
>     Description = SwitchIB Mellanox Technologies
> Feb 14 16:31:50 047847 [B5974700] 0x01 -> log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) -- dropping
>     Method 0x1, Attr 0xFF17, TID 0x123b
> Feb 14 16:31:50 047866 [B5974700] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,1, Return path = 0,0
> Feb 14 16:31:50 047893 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(GeneralInfo), attr_mod 0x4, TID 0x123b
> Feb 14 16:31:50 047913 [B5974700] 0x04 -> osm_hm_set_by_physp: Remote port of 0x7cfe900300b49890[0] couldn't be found
> Feb 14 16:31:50 047921 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?
> Feb 14 16:31:50 047927 [B5974700] 0x01 -> sm_mad_ctrl_send_err_cb: Error during initialization: got General Info time out from node 0x7cfe900300b49890
>
> And in the simulator console, I see messages of the form:
>
> ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr 0xff17
>
> The output of the "dump" command in the simulator console shows
> all ports in Init/LinkUp except the SMA port, which is in
> Active/LinkUp.
>
> Does anyone have any idea what I might be doing wrong here?
>
> Thanks,
> Tim
>
> --
> Tim Miller
> NIH HPC systems staff
> 301-827-5261
> https://hpc.nih.gov
>
> _______________________________________________
> Users mailing list
> Users at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/users
>
>
>
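To tie the simulator warning to Hal's diagnosis, here is a small Python sketch for triaging ibsim console output. It is not part of ibsim or OpenSM. It assumes the warning text appears exactly as quoted in this thread ("no one to handle pkt: class 0x.., attr 0x...") and relies on the InfiniBand Architecture conventions that management class 0x81 is directed-route subnet management (0x01 is LID-routed) and that SM attribute IDs 0xFF00-0xFFFF are vendor-specific. The helper names (`unhandled_attrs`, `describe`, `is_vendor_sm_attr`) are hypothetical.

```python
import re

# Management class values per the InfiniBand Architecture spec.
MGMT_CLASS_NAMES = {
    0x01: "SubnMgt (LID-routed)",
    0x81: "SubnMgt (directed-route)",
}

# Vendor-specific range for Subnet Management attribute IDs.
VENDOR_ATTR_MIN = 0xFF00
VENDOR_ATTR_MAX = 0xFFFF

# Assumed format of the ibsim warning, taken from the line quoted in this thread.
UNHANDLED_RE = re.compile(
    r"no one to handle pkt: class (0x[0-9a-fA-F]+), attr (0x[0-9a-fA-F]+)"
)


def is_vendor_sm_attr(attr_id: int) -> bool:
    """True if an SM attribute ID falls in the vendor-specific range."""
    return VENDOR_ATTR_MIN <= attr_id <= VENDOR_ATTR_MAX


def unhandled_attrs(lines):
    """Return sorted, de-duplicated (mgmt_class, attr_id) pairs ibsim did not handle."""
    seen = set()
    for line in lines:
        m = UNHANDLED_RE.search(line)
        if m:
            seen.add((int(m.group(1), 16), int(m.group(2), 16)))
    return sorted(seen)


def describe(mgmt_class: int, attr_id: int) -> str:
    """Human-readable summary of one unhandled (class, attribute) pair."""
    cls = MGMT_CLASS_NAMES.get(mgmt_class, f"class 0x{mgmt_class:02x}")
    kind = "vendor-specific" if is_vendor_sm_attr(attr_id) else "not vendor-specific"
    return f"{cls}: attr 0x{attr_id:04X} ({kind})"


if __name__ == "__main__":
    # Sample console lines in the format quoted above.
    console = [
        "ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr 0xff17",
        "ibwarn: [32331] process_packet: no one to handle pkt: class 0x81, attr 0xff17",
    ]
    for mgmt_class, attr_id in unhandled_attrs(console):
        print(describe(mgmt_class, attr_id))
```

On the sample lines this reports a single directed-route request for vendor-specific attribute 0xFF17. That matches the opensm side of the log, where SubnGet(GeneralInfo) for Attr 0xFF17 times out: the SM is asking for an attribute the simulator has no handler for, rather than the fabric being mis-cabled.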