[Users] IB topology config and polling state

Hal Rosenstock hal.rosenstock at gmail.com
Wed Oct 7 13:21:59 PDT 2015


What are those HCAs cabled to, or is it internal to the blade enclosure?

On Wed, Oct 7, 2015 at 3:24 PM, German Anders <ganders at despegar.com> wrote:

> Yeah, is there any command that I can run in order to change the port
> state on the remote switch? I mean, everything looks good, but on the HP
> blades I'm still getting:
>
>
> # ibstat
> CA 'qib0'
>     CA type: InfiniPath_QMH7342
>     Number of ports: 2
>     Firmware version:
>     Hardware version: 2
>     Node GUID: 0x0011750000791fec
>     System image GUID: 0x0011750000791fec
>     Port 1:
>         State: *Down*
>         Physical state: *Polling*
>         Rate: 40
>         Base lid: 4660
>         LMC: 0
>         SM lid: 4660
>         Capability mask: 0x0761086a
>         Port GUID: 0x0011750000791fec
>         Link layer: InfiniBand
>     Port 2:
>         State: *Down*
>         Physical state: *Polling*
>         Rate: 40
>         Base lid: 4660
>         LMC: 0
>         SM lid: 4660
>         Capability mask: 0x0761086a
>         Port GUID: 0x0011750000791fed
>         Link layer: InfiniBand
>
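> As for a command to drive the remote port's state: ibportstate (from
> infiniband-diags) can operate on a remote switch port addressed by LID. A
> minimal sketch, assuming the HP internal switch is the one at LID 29 and
> that the blade-facing port turns out to be port 1 (the port number here is
> a hypothetical placeholder):
>
> # ibportstate -L 29 1 disable
> # ibportstate -L 29 1 enable
>
> Bouncing the port this way (disable, then enable) forces it to retrain
> from Polling.
>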
>
> Also, on working hosts I only see devices from the local network, but I
> don't see any of the blades' HCA connections.
>
>
>
> *German* <ganders at despegar.com>
>
> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>
>> The screenshot looks good :-) The SM brought the link up to Active.
>>
>> Note that the ibportstate command you gave was for switch port 0 of the
>> Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
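>>
>> Port 0 on a switch is the management port and is always up; the external
>> ports are numbered from 1. A sketch of querying a physical port instead,
>> with the port number assumed rather than known:
>>
>> # ibportstate -L 29 1 query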
>>
>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <ganders at despegar.com>
>> wrote:
>>
>>> Yes, attached is a screenshot of the port information for port #9, the
>>> one that makes the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from
>>> one of the hosts connected to one of the SX6018F switches I can see the
>>> 'remote' HP IB SW:
>>>
>>> # *ibnodes*
>>>
>>> (...)
>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox
>>> Technologies" base port 0 *lid 29 *lmc 0
>>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced
>>> port 0 lid 24 lmc 0
>>> (...)
>>>
>>> # *ibportstate -L 29 query*
>>> Switch PortInfo:
>>> # Port info: Lid 29 port 0
>>> LinkState:.......................Active
>>> PhysLinkState:...................LinkUp
>>> Lid:.............................29
>>> SMLid:...........................2
>>> LMC:.............................0
>>> LinkWidthSupported:..............1X or 4X
>>> LinkWidthEnabled:................1X or 4X
>>> LinkWidthActive:.................4X
>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedActive:.................10.0 Gbps
>>> Mkey:............................<not displayed>
>>> MkeyLeasePeriod:.................0
>>> ProtectBits:.....................0
>>>
>>>
>>>
>>>
>>>
>>> *German* <ganders at despegar.com>
>>>
>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>
>>>> One more thing, hopefully before playing with the low-level phy settings:
>>>>
>>>> Are you using known-good cables? Do you have FDR cables on the FDR <->
>>>> FDR links? Cable length can matter as well.
>>>>
>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>> hal.rosenstock at gmail.com> wrote:
>>>>
>>>>> Were the ports mapped to the phy profile shut down when you changed
>>>>> this?
>>>>>
>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>
>>>>> You might want two different profiles: one for the interfaces connected
>>>>> to other gateway interfaces (which are FDR and FDR10 capable), and the
>>>>> other for the interfaces connecting to QDR (the older equipment in your
>>>>> network). By configuring the SwitchX interfaces to the appropriate
>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>> should not only come up, but it will also come up faster than if
>>>>> FDR/FDR10 are enabled.
>>>>>
>>>>> I suspect that, due to the SwitchX configuration, the links to the
>>>>> switch(es) in the HP enclosures do not negotiate properly (as shown by
>>>>> Down rather than LinkUp).
>>>>>
>>>>> Once you get all your links to Init, negotiation has occurred, and then
>>>>> it's time for the SM to bring the links to Active.
>>>>>
>>>>> Since you have Down links, the SM can't do anything about those.
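>>>>>
>>>>> On the failed profile mapping quoted below, a minimal sketch of the
>>>>> order of operations, assuming MLNX-OS accepts the usual one-line
>>>>> shutdown / no shutdown forms on IB interfaces (only the map command is
>>>>> taken verbatim from this thread):
>>>>>
>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 shutdown
>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
>>>>> GWIB01 [proxy-ha-group: master] (config) # no interface ib 1/9 shutdown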
>>>>>
>>>>>
>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <ganders at despegar.com>
>>>>> wrote:
>>>>>
>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I know
>>>>>> this kind of switch does not come with an embedded SM, but I don't know
>>>>>> how to access any management at all on this particular switch, for
>>>>>> example to set the speed or anything like that. Is it possible to access
>>>>>> it through the chassis?
>>>>>>
>>>>>>
>>>>>> *German* <ganders at despegar.com>
>>>>>>
>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>>>
>>>>>>> I think so, but when trying to configure the phy-profile on the
>>>>>>> interface in order to negotiate at QDR, it failed to map the profile:
>>>>>>>
>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>> high-speed-ber
>>>>>>>
>>>>>>>   Profile: high-speed-ber
>>>>>>>   --------
>>>>>>>   llr support ib-speed
>>>>>>>   SDR: disable
>>>>>>>   DDR: disable
>>>>>>>   QDR: disable
>>>>>>>   FDR10: enable-request
>>>>>>>   FDR: enable-request
>>>>>>>
>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>> hp-encl-isl
>>>>>>>
>>>>>>>   Profile: hp-encl-isl
>>>>>>>   --------
>>>>>>>   llr support ib-speed
>>>>>>>   SDR: disable
>>>>>>>   DDR: disable
>>>>>>>   QDR: enable
>>>>>>>   FDR10: enable-request
>>>>>>>   FDR: enable-request
>>>>>>>
>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9
>>>>>>> phy-profile map hp-encl-isl
>>>>>>> *% Cannot map profile hp-encl-isl to port:  1/9*
>>>>>>>
>>>>>>>
>>>>>>> *German* <ganders at despegar.com>
>>>>>>>
>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>
>>>>>>>> The driver 'qib' is loading fine, as can be seen from the ibstat
>>>>>>>> output. ib_ipath is the driver for an older card.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem is that the link is not coming up to Init. As Hal said,
>>>>>>>> the link should transition to "LinkUp" without the SM's involvement.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I think you are on to something: it seems your switch ports are not
>>>>>>>> configured to do QDR.
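>>>>>>>>
>>>>>>>> To double-check which driver owns the HCA on the blade, a quick
>>>>>>>> sketch (module and sysfs names assume a stock kernel):
>>>>>>>>
>>>>>>>> # lsmod | grep -E 'ib_qib|ib_ipath'
>>>>>>>> # ls /sys/class/infiniband/
>>>>>>>> qib0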
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ira
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>> *To:* Weiny, Ira
>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>
>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, I have that file:
>>>>>>>>
>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>
>>>>>>>> I've also installed libipathverbs:
>>>>>>>>
>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>
>>>>>>>> But when I try to load the ib_ipath module, I get the following
>>>>>>>> error message:
>>>>>>>>
>>>>>>>> # modprobe ib_ipath
>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource
>>>>>>>> busy
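>>>>>>>>
>>>>>>>> One way to confirm that ib_qib, not ib_ipath, is bound to the card
>>>>>>>> (the PCI address is the one from the lspci output further down the
>>>>>>>> thread; the "Kernel driver in use" line is the expected output, not a
>>>>>>>> capture):
>>>>>>>>
>>>>>>>> # lspci -k -s 41:00.0
>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>         Kernel driver in use: ib_qib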
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *German*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>
>>>>>>>> There are a few routing issues in that diagram, but the links should
>>>>>>>> come up.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I assume there is some backplane between the blade servers and the
>>>>>>>> switch in that chassis?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In libipathverbs there is a SerDes tuning script.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Does your libipathverbs include that file? If not, try the latest
>>>>>>>> from GitHub.
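>>>>>>>>
>>>>>>>> On a Debian-based system (apt-get is used elsewhere in this thread),
>>>>>>>> a quick way to check which package shipped the script and that it is
>>>>>>>> in place:
>>>>>>>>
>>>>>>>> # dpkg -S truescale-serdes.cmds
>>>>>>>> # ls -l /usr/sbin/truescale-serdes.cmds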
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ira
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German Anders
>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>> *To:* Hal Rosenstock
>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Hal,
>>>>>>>>
>>>>>>>> Thanks for the reply. I've attached a PDF with the topology diagram;
>>>>>>>> I don't know if this is the best way to go or if there's a better way
>>>>>>>> to connect and set up the IB network, so tips and suggestions would be
>>>>>>>> very much appreciated. Also, the mezzanine cards are already installed
>>>>>>>> on the blade hosts:
>>>>>>>>
>>>>>>>> # lspci
>>>>>>>> (...)
>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> *German*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com
>>>>>>>> >:
>>>>>>>>
>>>>>>>> Hi again German,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Looks like you made some progress since yesterday, as the qib ports
>>>>>>>> are now Polling rather than Disabled.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> But since they are Down: do you have them cabled to a switch? That
>>>>>>>> should bring the links up, and the port state will then be Init. That
>>>>>>>> is the "starting" point.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> You will then also need to be running an SM to bring the ports up to
>>>>>>>> Active.
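>>>>>>>>
>>>>>>>> A simple way to watch the Down -> Init -> Armed -> Active progression
>>>>>>>> once the cabling is in place (ibstat takes an optional CA name and
>>>>>>>> port number):
>>>>>>>>
>>>>>>>> # watch -n 1 'ibstat qib0 1 | grep -E "State|Physical"'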
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Hal
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I don't know if this is the right mailing list for this kind of
>>>>>>>> topic, but I'm really new to IB. I've just installed two SX6036G
>>>>>>>> gateways connected to each other through two ISL ports, then
>>>>>>>> configured proxy-arp between both nodes (the SM is disabled on both
>>>>>>>> gateways):
>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>
>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>
>>>>>>>> Proxy-ARP VIP
>>>>>>>> =============
>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>
>>>>>>>> Active nodes:
>>>>>>>> ID                   State                IP
>>>>>>>> --------------------------------------------------------------
>>>>>>>> GWIB01               master               10.xx.xx.xx1
>>>>>>>> GWIB02               standby              10.xx.xx.xx2
>>>>>>>>
>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*), one
>>>>>>>> connected to GWIB01 and the other connected to GWIB02. The SM is
>>>>>>>> configured locally on both the SWIB01 & SWIB02 switches. So far so
>>>>>>>> good. After this config, I connected a commodity server with a
>>>>>>>> Mellanox FDR IB adapter to the SWIB01 & SWIB02 switches, configured
>>>>>>>> the drivers, etc., and got it up & running fine.
>>>>>>>>
>>>>>>>> Finally, I set up an HP enclosure with an internal IB switch (then
>>>>>>>> connected port 1 of the internal switch to GWIB01 - the link is up but
>>>>>>>> the LLR status is inactive), installed one of the blades, and I see
>>>>>>>> the following:
>>>>>>>>
>>>>>>>> # ibstat
>>>>>>>> CA 'qib0'
>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>     Number of ports: 2
>>>>>>>>     Firmware version:
>>>>>>>>     Hardware version: 2
>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>     Port 1:
>>>>>>>>         State: Down
>>>>>>>>         Physical state: Polling
>>>>>>>>         Rate: 40
>>>>>>>>         Base lid: 4660
>>>>>>>>         LMC: 0
>>>>>>>>         SM lid: 4660
>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>         Link layer: InfiniBand
>>>>>>>>     Port 2:
>>>>>>>>         State: Down
>>>>>>>>         Physical state: Polling
>>>>>>>>         Rate: 40
>>>>>>>>         Base lid: 4660
>>>>>>>>         LMC: 0
>>>>>>>>         SM lid: 4660
>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>         Link layer: InfiniBand
>>>>>>>>
>>>>>>>> So I was wondering if maybe the SM is not being recognized on the
>>>>>>>> blade system, and that's why it's not getting past the Polling state;
>>>>>>>> is that possible? Or maybe it's not possible to connect an ISL between
>>>>>>>> the GW and the HP internal SW so that the SM is available, or maybe
>>>>>>>> the inactive LLR is causing this. Any ideas? I thought about
>>>>>>>> connecting the ISL of the HP IB SW to SWIB01 or SWIB02 instead of the
>>>>>>>> GWs, but I don't have any available ports.
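>>>>>>>>
>>>>>>>> One note on the SM theory: while the physical state is still Polling,
>>>>>>>> no SM traffic can flow at all, since link training happens below the
>>>>>>>> SM; so the SM cannot be what is keeping the ports in Polling. Once a
>>>>>>>> port does reach Init or Active, sminfo (from infiniband-diags) will
>>>>>>>> report which SM the port sees:
>>>>>>>>
>>>>>>>> # sminfo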
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> *German*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users at lists.openfabrics.org
>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

