[Users] Troubles with ibsim

Hal Rosenstock hal.rosenstock at gmail.com
Thu Feb 15 13:01:05 PST 2018


Are you sure you're using MLNX ibsim with MLNX OpenSM? It looks to me like
MLNX ibsim supports attribute 0xff17, so the message "process_packet: no
one to handle pkt: class 0x81, attr 0xff17" shouldn't appear.
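
To see which attributes the simulator is rejecting, a saved copy of the
ibsim console output can be tallied by class and attribute. The following
is only a minimal Python sketch: the function names are made up for
illustration, the regex is based solely on the one warning quoted in this
thread, and the 0xFF00-0xFFFF vendor-specific range is my reading of the
attribute ID assignment rather than something confirmed here.

```python
import re
from collections import Counter

# Matches the ibsim console warning quoted in this thread; other ibsim
# versions may word the message differently (assumption).
UNHANDLED = re.compile(
    r"no one to\s+handle pkt:\s+class\s+(0x[0-9a-fA-F]+),\s+attr\s+(0x[0-9a-fA-F]+)"
)


def tally_unhandled(console_text):
    """Count (class, attr) pairs that the simulator reported as unhandled."""
    # Collapse whitespace so a warning wrapped across lines still matches.
    flat = " ".join(console_text.split())
    return Counter((int(c, 16), int(a, 16)) for c, a in UNHANDLED.findall(flat))


def is_vendor_specific(attr_id):
    """Assumed vendor-specific SM attribute range: 0xFF00 through 0xFFFF."""
    return 0xFF00 <= attr_id <= 0xFFFF


sample = (
    "ibwarn: [32331] process_packet: no one to handle pkt: class\n"
    "0x81, attr 0xff17\n"
)
counts = tally_unhandled(sample)
for (mgmt_class, attr), n in counts.items():
    tag = "vendor-specific" if is_vendor_specific(attr) else "standard"
    print(f"class {mgmt_class:#04x} attr {attr:#06x} ({tag}): {n} time(s)")
```

If every unhandled attribute falls in the vendor-specific range, that would
point at an OpenSM/ibsim mismatch rather than at the topology file.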

Can you send me the ibnetdiscover file that you use as input to ibsim?
Maybe the real problem is:
osm_hm_set_by_physp: Remote port of 0x7cfe900300b49890[0] couldn't be found
That looks like a lookup of the remote port for some switch port 0, which
looks odd to me: switch port 0 has no peer port, so OpenSM shouldn't be
looking for one. Which MLNX OpenSM version are you using?
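
To check whether this message shows up for other nodes or only for port 0
of that one switch, the OpenSM log can be scanned for it. This is a rough
Python sketch under stated assumptions: the function name is hypothetical
and the pattern is derived only from the single log line above, so it may
need adjusting for other OpenSM versions.

```python
import re

# Pattern taken from the one osm_hm_set_by_physp line quoted in this thread.
MISSING_REMOTE = re.compile(
    r"Remote port of\s+(0x[0-9a-fA-F]+)\[(\d+)\]\s+couldn't be found"
)


def find_missing_remote_ports(log_text):
    """Return (node GUID, port number) pairs for each matching log message."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [(int(guid, 16), int(port)) for guid, port in MISSING_REMOTE.findall(flat)]


sample = (
    "Feb 14 16:31:50 047913 [B5974700] 0x04 -> osm_hm_set_by_physp:\n"
    "Remote port of 0x7cfe900300b49890[0] couldn't be found\n"
)
hits = find_missing_remote_ports(sample)
for guid, port in hits:
    note = " (port 0: no peer port expected)" if port == 0 else ""
    print(f"{guid:#018x} port {port}{note}")
```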


On Thu, Feb 15, 2018 at 3:50 PM, Tim Miller <btmiller at helix.nih.gov> wrote:

> Hi Hal,
>
> Thanks for looking into this. You're indeed correct that I'm using an MLNX
> OFED ibsim (and opensm for that matter). I could try running both from a
> vanilla OpenFabrics release and see if I have any better luck; let me go
> ahead and try that...
>
> Regards,
> Tim
>
> On 02/15/2018 03:39 PM, Hal Rosenstock wrote:
>
>> Just checked. MLNX OFED ibsim supports 0xff17 attribute.
>>
>> On Thu, Feb 15, 2018 at 3:36 PM, Hal Rosenstock
>> <hal.rosenstock at gmail.com> wrote:
>>
>>     Hi Tim,
>>
>>     Attribute ID 0xff17 is in the vendor specific range for SM
>>     attributes and not supported with (at least) the upstream ibsim.
>>
>>     I think you are using MLNX OpenSM rather than upstream or OFED
>>     OpenSM with the upstream ibsim. I'm not sure if MLNX ibsim
>>     supports the additional vendor specific SM attributes or not.
>>
>>     Can you work with an upstream or OFED OpenSM, or only with MLNX
>>     OpenSM? If only MLNX OpenSM, I'll try to find out whether the MLNX
>>     OFED ibsim supports the additional attributes needed to run it.
>>
>>     -- Hal
>>
>>
>>     On Thu, Feb 15, 2018 at 11:50 AM, Tim Miller
>>     <btmiller at helix.nih.gov> wrote:
>>
>>         I am attempting to use ibsim to test some possible
>>         configuration changes in our routing, but I am running into
>>         some difficulties. I can get the simulator started, but opensm
>>         fails to discover the fabric in the simulated environment. It
>>         discovers the switch to which the host running opensm is
>>         connected, but it can't discover any further than that. In the
>>         opensm log, I see:
>>
>>         Feb 14 16:31:50 047307 [AD332700] 0x04 -> ni_rcv_process_new:
>>         Discovered new Switch node,
>>           GUID 0x7cfe900300b49890, TID 0x1239
>>         Feb 14 16:31:50 047821 [AD533700] 0x04 -> nd_rcv_process_nd:
>>         Node 0x7cfe900300b49890
>>           Description = SwitchIB Mellanox Technologies
>>         Feb 14 16:31:50 047847 [B5974700] 0x01 -> log_send_error: ERR
>>         5411: DR SMP Send completed with error (IB_TIMEOUT) -- dropping
>>                                 Method 0x1, Attr 0xFF17, TID 0x123b
>>         Feb 14 16:31:50 047866 [B5974700] 0x01 -> Received SMP on a 1
>>         hop path: Initial path = 0,1, Return path  = 0,0
>>         Feb 14 16:31:50 047893 [B5974700] 0x01 ->
>>         sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error
>>         (IB_TIMEOUT): SubnGet(GeneralInfo), attr_mod 0x4, TID 0x123b
>>         Feb 14 16:31:50 047913 [B5974700] 0x04 -> osm_hm_set_by_physp:
>>         Remote port of 0x7cfe900300b49890[0] couldn't be found
>>         Feb 14 16:31:50 047921 [B5974700] 0x01 ->
>>         sm_mad_ctrl_send_err_cb: ERR 3120: Timeout while getting
>>         attribute 0xFF17 (GeneralInfo); Possible mis-set mkey?
>>         Feb 14 16:31:50 047927 [B5974700] 0x01 ->
>>         sm_mad_ctrl_send_err_cb: Error during initialization: got
>>         General Info time out from node 0x7cfe900300b49890
>>
>>         And in the simulator console, I see messages of the form:
>>
>>         ibwarn: [32331] process_packet: no one to handle pkt: class
>>         0x81, attr 0xff17
>>
>>         Looking at the output of the "dump" command from within the
>>         console, it shows that all ports are in Init/LinkUp, except
>>         for the SMA port, which is in state Active/LinkUp.
>>
>>         Does anyone have any idea what I might be doing wrong here?
>>
>>         Thanks,
>>         Tim
>>
>>         --
>>         Tim Miller
>>         NIH HPC systems staff
>>         301-827-5261
>>         https://hpc.nih.gov
>>
>>
> --
> Tim Miller
> NIH HPC systems staff
> 301-827-5261
> https://hpc.nih.gov
>
>