[Users] IB topology config and polling state

Hal Rosenstock hal.rosenstock at gmail.com
Wed Oct 7 13:24:06 PDT 2015


Yes, somehow the enclosure is not internally connected properly; otherwise
you'd see more topology off the other switch ports. I think it has more
than one switch.
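
One quick way to confirm that from a working host is to walk the embedded
switch's external ports and see which ones ever leave Down/Polling (a rough
sketch, assuming infiniband-diags is installed and the embedded switch is
still the Infiniscale-IV at lid 29):

# for p in $(seq 1 32); do echo "== port $p"; ibportstate -L 29 $p query 2>/dev/null | grep -E 'LinkState|PhysLinkState'; done

iblinkinfo run from the same host gives a similar per-port view; blade-facing
ports that never leave Polling would point at the internal connections.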

On Wed, Oct 7, 2015 at 3:56 PM, German Anders <ganders at despegar.com> wrote:

> Find below some of the output of ibnetdiscover from one of the working hosts:
>
> # ibnetdiscover
> #
> # Topology file: generated on Wed Oct  7 15:51:55 2015
> #
> # Initiated from node e41d2d0300163650 port e41d2d0300163651
>
> vendid=0x2c9
> devid=0xbd36
> sysimgguid=0x2c902004b0918
> switchguid=0x2c902004b0918(2c902004b0918)
> Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox
> Technologies" base port 0 *lid 29* lmc 0
> [1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24
> 4xQDR
>
> vendid=0x2c9
> devid=0xc738
> sysimgguid=0xe41d2d030031e9c0
> switchguid=0xe41d2d030031e9c1(e41d2d030031e9c1)
> Switch    37 "S-e41d2d030031e9c1"        # "MF0;GWIB01:SX6036G/U1"
> enhanced port 0 lid 24 lmc 0
> [9]    "S-0002c902004b0918"[1]        # "Infiniscale-IV Mellanox
> Technologies" lid 29 4xQDR
> [33]    "S-e41d2d030031eb41"[33]        # "MF0;GWIB02:SX6036G/U1" lid 23
> 4xFDR10
> [34]    "S-f45214030073f500"[16]        # "MF0;SWIB02:SX6018/U1" lid 1
> 4xFDR10
> [35]    "S-e41d2d030031eb41"[35]        # "MF0;GWIB02:SX6036G/U1" lid 23
> 4xFDR10
> [36]    "S-e41d2d0300097630"[18]        # "MF0;SWIB01:SX6018/U1" lid 2
> 4xFDR10
> [37]    "H-e41d2d030031e9c2"[1](e41d2d030031e9c2)         #
> "MF0;GWIB01:SX60XX/GW" lid 25 4xFDR
>
> (...)
>
> Clearly the connectivity problem is between the HP IB switch and the blades...
>
>
> *German*
>
> 2015-10-07 16:26 GMT-03:00 German Anders <ganders at despegar.com>:
>
>> for port #1:
>>
>> # ibportstate -L 29 1 query
>> Switch PortInfo:
>> # Port info: Lid 29 port 1
>> LinkState:.......................Active
>> PhysLinkState:...................LinkUp
>> Lid:.............................75
>> SMLid:...........................2328
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedActive:.................10.0 Gbps
>> Peer PortInfo:
>> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,1 port 9
>> LinkState:.......................Active
>> PhysLinkState:...................LinkUp
>> Lid:.............................0
>> SMLid:...........................0
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedActive:.................10.0 Gbps
>> LinkSpeedExtSupported:...........14.0625 Gbps
>> LinkSpeedExtEnabled:.............14.0625 Gbps
>> LinkSpeedExtActive:..............No Extended Speed
>> # MLNX ext Port info: Lid 29 DR path slid 4; dlid 65535; 0,1 port 9
>> StateChangeEnable:...............0x00
>> LinkSpeedSupported:..............0x01
>> LinkSpeedEnabled:................0x01
>> LinkSpeedActive:.................0x00
>>
>>
>> *German*
>>
>> 2015-10-07 16:24 GMT-03:00 German Anders <ganders at despegar.com>:
>>
>>> Yeah, is there any command that I can run in order to change the port
>>> state on the remote switch? I mean, everything looks good, but on the HP
>>> blades I'm still getting:
>>>
>>>
>>> # ibstat
>>> CA 'qib0'
>>>     CA type: InfiniPath_QMH7342
>>>     Number of ports: 2
>>>     Firmware version:
>>>     Hardware version: 2
>>>     Node GUID: 0x0011750000791fec
>>>     System image GUID: 0x0011750000791fec
>>>     Port 1:
>>>         State: *Down*
>>>         Physical state: *Polling*
>>>         Rate: 40
>>>         Base lid: 4660
>>>         LMC: 0
>>>         SM lid: 4660
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fec
>>>         Link layer: InfiniBand
>>>     Port 2:
>>>         State: *Down*
>>>         Physical state: *Polling*
>>>         Rate: 40
>>>         Base lid: 4660
>>>         LMC: 0
>>>         SM lid: 4660
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fed
>>>         Link layer: InfiniBand
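>>>
>>> (As a rough sketch of what forcing a port on the remote switch could look
>>> like with infiniband-diags, assuming lid 29 is still the HP embedded switch
>>> and substituting the port number you actually want to touch; the speed
>>> value is a bitmask where 1=SDR, 2=DDR, 4=QDR:)
>>>
>>> # ibportstate -L 29 1 speed 7     # enable SDR/DDR/QDR on that switch port
>>> # ibportstate -L 29 1 reset       # bounce the link so it renegotiates
>>> # ibportstate -L 29 1 query       # confirm LinkSpeedEnabled/Active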
>>>
>>>
>>> Also, on the working hosts I only see devices from the local network, but
>>> I don't see any of the blades' HCA connections.
>>>
>>>
>>>
>>> *German* <ganders at despegar.com>
>>>
>>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>
>>>> The screenshot looks good :-) The SM brought the link up to Active.
>>>>
>>>> Note that the ibportstate command you gave was for switch port 0 of the
>>>> Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>>
>>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <ganders at despegar.com>
>>>> wrote:
>>>>
>>>>> Yes, find attached a screenshot of the port information (port 9), the one
>>>>> that makes the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from one of
>>>>> the hosts that are connected to one of the SX6018F switches I can see the
>>>>> 'remote' HP IB SW:
>>>>>
>>>>> # *ibnodes*
>>>>>
>>>>> (...)
>>>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox
>>>>> Technologies" base port 0 *lid 29* lmc 0
>>>>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1"
>>>>> enhanced port 0 lid 24 lmc 0
>>>>> (...)
>>>>>
>>>>> # *ibportstate -L 29 query*
>>>>> Switch PortInfo:
>>>>> # Port info: Lid 29 port 0
>>>>> LinkState:.......................Active
>>>>> PhysLinkState:...................LinkUp
>>>>> Lid:.............................29
>>>>> SMLid:...........................2
>>>>> LMC:.............................0
>>>>> LinkWidthSupported:..............1X or 4X
>>>>> LinkWidthEnabled:................1X or 4X
>>>>> LinkWidthActive:.................4X
>>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>>> LinkSpeedActive:.................10.0 Gbps
>>>>> Mkey:............................<not displayed>
>>>>> MkeyLeasePeriod:.................0
>>>>> ProtectBits:.....................0
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *German* <ganders at despegar.com>
>>>>>
>>>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>>
>>>>>> One more thing, hopefully before playing with the low-level phy
>>>>>> settings:
>>>>>>
>>>>>> Are you using known good cables? Do you have FDR cables on the FDR
>>>>>> <-> FDR links? Cable lengths can matter as well.
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>
>>>>>>> Were the ports mapped to the phy profile shut down when you changed
>>>>>>> this?
>>>>>>>
>>>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>>>
>>>>>>> You might want 2 different profiles: one for the interfaces
>>>>>>> connected to other gateway interfaces (which are FDR (and FDR-10) capable),
>>>>>>> and the other for the interfaces connecting to QDR (the older equipment in
>>>>>>> your network). By configuring the Switch-X interfaces to the appropriate
>>>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>>>> should not only come up but will also do so faster than if FDR/FDR10
>>>>>>> are enabled.
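>>>>>>>
>>>>>>> A rough MLNX-OS sketch of that, using the hp-encl-isl profile from your
>>>>>>> earlier output (ideally with FDR/FDR10 and LLR set to disable in it, per
>>>>>>> the above; exact CLI syntax may differ between releases, and the port
>>>>>>> should be administratively down while the profile is (re)mapped):
>>>>>>>
>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 shutdown
>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
>>>>>>> GWIB01 [proxy-ha-group: master] (config) # no interface ib 1/9 shutdown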
>>>>>>>
>>>>>>> I suspect that, due to the Switch-X configuration, the links to
>>>>>>> the switch(es) in the HP enclosures do not negotiate properly (as shown by
>>>>>>> Down rather than LinkUp).
>>>>>>>
>>>>>>> Once you get all your links to INIT, negotiation has occurred and
>>>>>>> then it's time for the SM to bring the links to Active.
>>>>>>>
>>>>>>> Since you have down links, the SM can't do anything about those.
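>>>>>>>
>>>>>>> (Once links do reach Init, a quick sanity check from any host that an SM
>>>>>>> is running and which one is master, using infiniband-diags, is just:)
>>>>>>>
>>>>>>> # sminfo
>>>>>>> # ibstat | grep -E 'State|SM lid'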
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <ganders at despegar.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I know
>>>>>>>> that this kind of switch does not come with an embedded SM, but I don't know
>>>>>>>> how to access any management at all on this particular switch, I mean, for
>>>>>>>> example, to set up speed or anything like that. Is it possible to access it
>>>>>>>> through the chassis?
>>>>>>>>
>>>>>>>>
>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>
>>>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>>>>>
>>>>>>>>> I think so, but when trying to configure the phy-profile on the
>>>>>>>>> interface in order to negotiate at QDR, it failed to map the profile:
>>>>>>>>>
>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>>>> high-speed-ber
>>>>>>>>>
>>>>>>>>>   Profile: high-speed-ber
>>>>>>>>>   --------
>>>>>>>>>   llr support ib-speed
>>>>>>>>>   SDR: disable
>>>>>>>>>   DDR: disable
>>>>>>>>>   QDR: disable
>>>>>>>>>   FDR10: enable-request
>>>>>>>>>   FDR: enable-request
>>>>>>>>>
>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>>>> hp-encl-isl
>>>>>>>>>
>>>>>>>>>   Profile: hp-encl-isl
>>>>>>>>>   --------
>>>>>>>>>   llr support ib-speed
>>>>>>>>>   SDR: disable
>>>>>>>>>   DDR: disable
>>>>>>>>>   QDR: enable
>>>>>>>>>   FDR10: enable-request
>>>>>>>>>   FDR: enable-request
>>>>>>>>>
>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9
>>>>>>>>> phy-profile map hp-encl-isl
>>>>>>>>> *% Cannot map profile hp-encl-isl to port:  1/9*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>>
>>>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>
>>>>>>>>>> The driver ‘qib’ is loading fine, as can be seen from the ibstat
>>>>>>>>>> output. ib_ipath is the driver for an older card.
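>>>>>>>>>>
>>>>>>>>>> (A quick way to double-check which driver is bound on the blade, as a
>>>>>>>>>> sketch:)
>>>>>>>>>>
>>>>>>>>>> # lsmod | grep -E 'ib_qib|ib_ipath'
>>>>>>>>>> # ls /sys/class/infiniband/    # should list qib0 for the QMH7342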
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The problem is that the link is not coming up to Init. Like Hal said,
>>>>>>>>>> the link should transition to “link up” without the SM's involvement.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think you are on to something with the fact that your switch ports
>>>>>>>>>> do not seem to be configured to do QDR.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ira
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>>>> *To:* Weiny, Ira
>>>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>>>
>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, I have that file:
>>>>>>>>>>
>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>
>>>>>>>>>> Also, I've installed libipathverbs:
>>>>>>>>>>
>>>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>>>
>>>>>>>>>> But when I try to load the ib_ipath module, I get the following
>>>>>>>>>> error message:
>>>>>>>>>>
>>>>>>>>>> # modprobe ib_ipath
>>>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource
>>>>>>>>>> busy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *German*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>
>>>>>>>>>> There are a few routing issues in that diagram, but the links
>>>>>>>>>> should come up.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I assume there is some backplane between the blade servers and
>>>>>>>>>> the switch in that chassis?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In libipathverbs there is a serdes tuning script.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Does your libipathverbs include that file? If not, try the latest
>>>>>>>>>> from GitHub.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ira
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German Anders
>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>>>> *To:* Hal Rosenstock
>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Hal,
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply. I've attached a PDF with the topology
>>>>>>>>>> diagram; I don't know if this is the best way to go or if there's another
>>>>>>>>>> way to connect and set up the IB network, so tips and suggestions will be
>>>>>>>>>> very much appreciated. Also, the mezzanine cards are already installed on
>>>>>>>>>> the blade hosts:
>>>>>>>>>>
>>>>>>>>>> # lspci
>>>>>>>>>> (...)
>>>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev
>>>>>>>>>> 02)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *German*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <
>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>
>>>>>>>>>> Hi again German,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Looks like you made some progress from yesterday, as the qib ports
>>>>>>>>>> are now Polling rather than Disabled.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But since they are Down, do you have them cabled to a switch? That
>>>>>>>>>> should bring the links up, and the port state will be Init. That is the
>>>>>>>>>> "starting" point.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You will also then need to be running an SM to bring the ports up to
>>>>>>>>>> Active.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- Hal
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I don't know if this is the right mailing list for this kind of topic,
>>>>>>>>>> but I'm really new to IB. I've just installed two SX6036G gateways
>>>>>>>>>> connected to each other through two ISL ports, and then I've configured
>>>>>>>>>> proxy-arp between both nodes (the SM is disabled on both GWs):
>>>>>>>>>>
>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>>>
>>>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>>>
>>>>>>>>>> Proxy-ARP VIP
>>>>>>>>>> =============
>>>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>>>
>>>>>>>>>> Active nodes:
>>>>>>>>>> ID                   State                IP
>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>> GWIB01               master               10.xx.xx.xx1
>>>>>>>>>> GWIB02               standby              10.xx.xx.xx2
>>>>>>>>>>
>>>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*), one
>>>>>>>>>> connected to GWIB01 and the other connected to GWIB02. The SM is configured
>>>>>>>>>> locally on both the SWIB01 and SWIB02 switches. So far so good; after this
>>>>>>>>>> config I set up a commodity server with a Mellanox FDR IB adapter connected
>>>>>>>>>> to the SWIB01 and SWIB02 switches, configured the drivers, etc., and got it
>>>>>>>>>> up and running fine.
>>>>>>>>>>
>>>>>>>>>> Finally, I've set up an HP enclosure with an internal IB switch (then
>>>>>>>>>> connected port 1 of the internal switch to GWIB01 - the link is up but the
>>>>>>>>>> LLR status is inactive), installed one of the blades, and I see the following:
>>>>>>>>>>
>>>>>>>>>> # ibstat
>>>>>>>>>> CA 'qib0'
>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>     Firmware version:
>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>     Port 1:
>>>>>>>>>>         State: Down
>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>         Rate: 40
>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>         LMC: 0
>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>     Port 2:
>>>>>>>>>>         State: Down
>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>         Rate: 40
>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>         LMC: 0
>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>
>>>>>>>>>> So I was wondering if maybe the SM is not being recognized on the
>>>>>>>>>> blade system and that's why it's not getting past the Polling state; is that
>>>>>>>>>> possible? Or maybe it's not possible to connect an ISL between the GW and the
>>>>>>>>>> HP internal switch so that the SM is available, or maybe the inactive LLR is
>>>>>>>>>> causing this - any ideas? I thought about connecting the ISL of the HP IB SW
>>>>>>>>>> to SWIB01 or SWIB02 instead of the GWs, but I don't have any available
>>>>>>>>>> ports.
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *German*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list
>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

