[Users] IB topology config and polling state

German Anders ganders at despegar.com
Wed Oct 7 12:56:06 PDT 2015


Here is part of the ibnetdiscover output from one of the working hosts:

# ibnetdiscover
#
# Topology file: generated on Wed Oct  7 15:51:55 2015
#
# Initiated from node e41d2d0300163650 port e41d2d0300163651

vendid=0x2c9
devid=0xbd36
sysimgguid=0x2c902004b0918
switchguid=0x2c902004b0918(2c902004b0918)
Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox Technologies" base port 0 *lid 29* lmc 0
[1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24 4xQDR

vendid=0x2c9
devid=0xc738
sysimgguid=0xe41d2d030031e9c0
switchguid=0xe41d2d030031e9c1(e41d2d030031e9c1)
Switch    37 "S-e41d2d030031e9c1"        # "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
[9]    "S-0002c902004b0918"[1]        # "Infiniscale-IV Mellanox Technologies" lid 29 4xQDR
[33]    "S-e41d2d030031eb41"[33]        # "MF0;GWIB02:SX6036G/U1" lid 23 4xFDR10
[34]    "S-f45214030073f500"[16]        # "MF0;SWIB02:SX6018/U1" lid 1 4xFDR10
[35]    "S-e41d2d030031eb41"[35]        # "MF0;GWIB02:SX6036G/U1" lid 23 4xFDR10
[36]    "S-e41d2d0300097630"[18]        # "MF0;SWIB01:SX6018/U1" lid 2 4xFDR10
[37]    "H-e41d2d030031e9c2"[1](e41d2d030031e9c2)         # "MF0;GWIB01:SX60XX/GW" lid 25 4xFDR

(...)

Clearly the connectivity problem is between the HP IB switch and the blades...
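
For anyone who wants to check link rates across a larger topology dump, here is a minimal Python sketch that pulls the per-port entries out of ibnetdiscover output. It assumes each port entry sits on one line, as ibnetdiscover prints it, and the helper name `parse_links` and the regular expression are my own illustration, not part of infiniband-diags. The sample lines are copied from the output above.

```python
import re

# Port entries look like:
#   [9]    "S-0002c902004b0918"[1]    # "node name" lid 29 4xQDR
# This pattern is an assumption based on the sample output in this thread.
PORT_LINE = re.compile(
    r'^\[(?P<port>\d+)\]\s+"(?P<peer>[SH]-[0-9a-f]+)"\[(?P<peer_port>\d+)\]'
    r'.*#\s+"(?P<name>[^"]*)"\s+lid\s+(?P<lid>\d+)\s+(?P<rate>\S+)\s*$'
)


def parse_links(text):
    """Return one dict per port entry found in ibnetdiscover text."""
    links = []
    for line in text.splitlines():
        m = PORT_LINE.match(line.strip())
        if m:
            links.append({
                "port": int(m["port"]),
                "peer": m["peer"],
                "peer_port": int(m["peer_port"]),
                "name": m["name"],
                "lid": int(m["lid"]),
                "rate": m["rate"],
            })
    return links


SAMPLE = '''\
[9]    "S-0002c902004b0918"[1]        # "Infiniscale-IV Mellanox Technologies" lid 29 4xQDR
[33]    "S-e41d2d030031eb41"[33]        # "MF0;GWIB02:SX6036G/U1" lid 23 4xFDR10
[37]    "H-e41d2d030031e9c2"[1](e41d2d030031e9c2)         # "MF0;GWIB01:SX60XX/GW" lid 25 4xFDR
'''

links = parse_links(SAMPLE)
for link in links:
    # Print local port, peer node name and negotiated width/speed.
    print(link["port"], link["name"], link["rate"])
```

On the sample, port 9 of GWIB01 reports 4xQDR towards the switch in the HP enclosure, which is consistent with the ISL itself being up.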


*German*

2015-10-07 16:26 GMT-03:00 German Anders <ganders at despegar.com>:

> for port #1:
>
> # ibportstate -L 29 1 query
> Switch PortInfo:
> # Port info: Lid 29 port 1
> LinkState:.......................Active
> PhysLinkState:...................LinkUp
> Lid:.............................75
> SMLid:...........................2328
> LMC:.............................0
> LinkWidthSupported:..............1X or 4X
> LinkWidthEnabled:................1X or 4X
> LinkWidthActive:.................4X
> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedActive:.................10.0 Gbps
> Peer PortInfo:
> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,1 port 9
> LinkState:.......................Active
> PhysLinkState:...................LinkUp
> Lid:.............................0
> SMLid:...........................0
> LMC:.............................0
> LinkWidthSupported:..............1X or 4X
> LinkWidthEnabled:................1X or 4X
> LinkWidthActive:.................4X
> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
> LinkSpeedActive:.................10.0 Gbps
> LinkSpeedExtSupported:...........14.0625 Gbps
> LinkSpeedExtEnabled:.............14.0625 Gbps
> LinkSpeedExtActive:..............No Extended Speed
> # MLNX ext Port info: Lid 29 DR path slid 4; dlid 65535; 0,1 port 9
> StateChangeEnable:...............0x00
> LinkSpeedSupported:..............0x01
> LinkSpeedEnabled:................0x01
> LinkSpeedActive:.................0x00
>
>
> *German*
>
> 2015-10-07 16:24 GMT-03:00 German Anders <ganders at despegar.com>:
>
>> Yeah, is there any command that I can run to change the port
>> state on the remote switch? Everything looks good, but on the HP
>> blades I'm still getting:
>>
>>
>> # ibstat
>> CA 'qib0'
>>     CA type: InfiniPath_QMH7342
>>     Number of ports: 2
>>     Firmware version:
>>     Hardware version: 2
>>     Node GUID: 0x0011750000791fec
>>     System image GUID: 0x0011750000791fec
>>     Port 1:
>>         State: *Down*
>>         Physical state: *Polling*
>>         Rate: 40
>>         Base lid: 4660
>>         LMC: 0
>>         SM lid: 4660
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fec
>>         Link layer: InfiniBand
>>     Port 2:
>>         State: *Down*
>>         Physical state: *Polling*
>>         Rate: 40
>>         Base lid: 4660
>>         LMC: 0
>>         SM lid: 4660
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fed
>>         Link layer: InfiniBand
>>
>>
>> Also, on the working hosts I only see devices from the local network;
>> I don't see any of the blades' HCA connections.
>>
>>
>>
>> *German* <ganders at despegar.com>
>>
>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>>> The screen shot looks good :-) SM brought the link up to active.
>>>
>>> Note that the ibportstate command you gave was for switch port 0 of the
>>> Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>
>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>>> Yes, find attached a screenshot of the port information for port 9, the one
>>>> that forms the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from one of
>>>> the hosts connected to one of the SX6018F switches I can see the 'remote'
>>>> HP IB switch:
>>>>
>>>> # *ibnodes*
>>>>
>>>> (...)
>>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
>>>> (...)
>>>>
>>>> # *ibportstate -L 29 query*
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 0
>>>> LinkState:.......................Active
>>>> PhysLinkState:...................LinkUp
>>>> Lid:.............................29
>>>> SMLid:...........................2
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................10.0 Gbps
>>>> Mkey:............................<not displayed>
>>>> MkeyLeasePeriod:.................0
>>>> ProtectBits:.....................0
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *German* <ganders at despegar.com>
>>>>
>>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>
>>>>> One more thing hopefully before playing with the low level phy
>>>>> settings:
>>>>>
>>>>> Are you using known good cables ? Do you have FDR cables on the FDR
>>>>> <-> FDR links ? Cable lengths can matter as well.
>>>>>
>>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>
>>>>>> Were the ports mapped to the phy profile shutdown when you changed
>>>>>> this ?
>>>>>>
>>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>>
>>>>>> You might want 2 different profiles: one for the interfaces connected
>>>>>> to other gateway interfaces (which are FDR and FDR-10 capable) and the
>>>>>> other for the interfaces connecting to QDR (the older equipment in your
>>>>>> network). By configuring the Switch-X interfaces to the appropriate
>>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>>> should not only come up but also do so faster than if FDR/FDR10
>>>>>> are enabled.
>>>>>>
>>>>>> I suspect that, due to the Switch-X configuration, the links to
>>>>>> the switch(es) in the HP enclosures do not negotiate properly (as shown by
>>>>>> Down rather than LinkUp).
>>>>>>
>>>>>> Once you get all your links to INIT, negotiation has occurred and
>>>>>> then it's time for SM to bring links to active.
>>>>>>
>>>>>> Since you have down links, the SM can't do anything about those.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <ganders at despegar.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I know that
>>>>>>> this kind of switch does not come with an embedded SM, but I don't know how to
>>>>>>> access any management interface on this particular switch, for example to
>>>>>>> set the speed or anything like that. Is it possible to access it through the
>>>>>>> chassis?
>>>>>>>
>>>>>>>
>>>>>>> *German* <ganders at despegar.com>
>>>>>>>
>>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>>>>
>>>>>>>> I think so, but when I tried to configure the phy-profile on the
>>>>>>>> interface so that it negotiates at QDR, it failed to map the profile:
>>>>>>>>
>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile high-speed-ber
>>>>>>>>
>>>>>>>>   Profile: high-speed-ber
>>>>>>>>   --------
>>>>>>>>   llr support ib-speed
>>>>>>>>   SDR: disable
>>>>>>>>   DDR: disable
>>>>>>>>   QDR: disable
>>>>>>>>   FDR10: enable-request
>>>>>>>>   FDR: enable-request
>>>>>>>>
>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl
>>>>>>>>
>>>>>>>>   Profile: hp-encl-isl
>>>>>>>>   --------
>>>>>>>>   llr support ib-speed
>>>>>>>>   SDR: disable
>>>>>>>>   DDR: disable
>>>>>>>>   QDR: enable
>>>>>>>>   FDR10: enable-request
>>>>>>>>   FDR: enable-request
>>>>>>>>
>>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
>>>>>>>> *% Cannot map profile hp-encl-isl to port:  1/9*
>>>>>>>>
>>>>>>>>
>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>
>>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>
>>>>>>>>> The ‘qib’ driver is loading fine, as can be seen from the ibstat
>>>>>>>>> output.  The ib_ipath driver is for an older card.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The problem is that the link is not coming up to Init.  As Hal said,
>>>>>>>>> the link should transition to “link up” without the SM's involvement.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think you are on to something with the fact that it seems like
>>>>>>>>> your switch ports are not configured to do QDR.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ira
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>>> *To:* Weiny, Ira
>>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>>
>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, I have that file:
>>>>>>>>>
>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>
>>>>>>>>> I've also installed libipathverbs:
>>>>>>>>>
>>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>>
>>>>>>>>> But when I try to load the ib_ipath module, I get the
>>>>>>>>> following error message:
>>>>>>>>>
>>>>>>>>> # modprobe ib_ipath
>>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *German*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>
>>>>>>>>> There are a few issues for routing in that diagram but the links
>>>>>>>>> should come up.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I assume there is some backplane between the blade servers and the
>>>>>>>>> switch in that chassis?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In ipathverbs there is a serdes tuning script.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does your libipathverbs include that file?  If not try the latest
>>>>>>>>> from github.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ira
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German Anders
>>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>>> *To:* Hal Rosenstock
>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Hal,
>>>>>>>>>
>>>>>>>>> Thanks for the reply. I've attached a PDF with the topology diagram.
>>>>>>>>> I don't know if this is the best way to go or if there's another way to
>>>>>>>>> connect and set up the IB network; tips and suggestions would be very much
>>>>>>>>> appreciated. Also, the mezzanine cards are already installed on the blade
>>>>>>>>> hosts:
>>>>>>>>>
>>>>>>>>> # lspci
>>>>>>>>> (...)
>>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *German*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <
>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>
>>>>>>>>> Hi again German,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looks like you made some progress from yesterday as the qib ports
>>>>>>>>> are now Polling rather than Disabled.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But since they are Down, do you have them cabled to a switch ?
>>>>>>>>> That should bring the links up and the port state will be Init. That is the
>>>>>>>>> "starting" point.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You will also then need to be running SM to bring the ports up to
>>>>>>>>> Active.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- Hal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I don't know if this is the right mailing list for this kind of topic,
>>>>>>>>> but I'm really new to IB and I've just installed two SX6036G gateways
>>>>>>>>> connected to each other through two ISL ports. Then I configured
>>>>>>>>> proxy-arp between both nodes (the SM is disabled on both gateways):
>>>>>>>>>
>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>>
>>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>>
>>>>>>>>> Proxy-ARP VIP
>>>>>>>>> =============
>>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>>
>>>>>>>>> Active nodes:
>>>>>>>>> ID                   State                IP
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> GWIB01               master               10.xx.xx.xx1
>>>>>>>>> GWIB02               standby              10.xx.xx.xx2
>>>>>>>>>
>>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*), one
>>>>>>>>> connected to GWIB01 and the other connected to GWIB02. The SM is configured
>>>>>>>>> locally on both the SWIB01 and SWIB02 switches. So far so good. After this,
>>>>>>>>> I connected a commodity server with a MLNX IB ADPT FDR to the SWIB01 and SWIB02
>>>>>>>>> switches, configured the drivers, etc., and got it up and running fine.
>>>>>>>>>
>>>>>>>>> Finally, I set up an HP enclosure with an internal IB switch (and
>>>>>>>>> connected port 1 of the internal switch to GWIB01; the link is up but the LLR status is
>>>>>>>>> inactive), installed one of the blades, and I see the following:
>>>>>>>>>
>>>>>>>>> # ibstat
>>>>>>>>> CA 'qib0'
>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>     Number of ports: 2
>>>>>>>>>     Firmware version:
>>>>>>>>>     Hardware version: 2
>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>     Port 1:
>>>>>>>>>         State: Down
>>>>>>>>>         Physical state: Polling
>>>>>>>>>         Rate: 40
>>>>>>>>>         Base lid: 4660
>>>>>>>>>         LMC: 0
>>>>>>>>>         SM lid: 4660
>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>     Port 2:
>>>>>>>>>         State: Down
>>>>>>>>>         Physical state: Polling
>>>>>>>>>         Rate: 40
>>>>>>>>>         Base lid: 4660
>>>>>>>>>         LMC: 0
>>>>>>>>>         SM lid: 4660
>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>
>>>>>>>>> So I was wondering whether the SM is not being recognized on the
>>>>>>>>> blade system and that's why the ports are not getting past the Polling state. Is that
>>>>>>>>> possible? Or maybe it is not possible to run an ISL between the GW and the
>>>>>>>>> HP internal switch so that the SM is reachable, or maybe the inactive LLR is
>>>>>>>>> causing this. Any ideas? I thought about connecting the ISL
>>>>>>>>> of the HP IB switch to SWIB01 or SWIB02 instead of the GWs, but I don't
>>>>>>>>> have any available ports there.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *German*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
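
As discussed in the quoted thread, a port has to finish physical link training (Polling to LinkUp, which puts the port in Init) before the SM can bring it to Active. The Python sketch below applies that rule to ibstat output to tell "link never trained" apart from "waiting for the SM". The function names `parse_ibstat_ports` and `diagnose` and the message strings are hypothetical, and the parser only assumes the `Port N:` / `Key: value` layout seen in the ibstat output in this thread.

```python
def parse_ibstat_ports(text):
    """Collect the 'Key: value' fields listed under each 'Port N:' heading."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Port ") and line.endswith(":"):
            # Heading such as "Port 1:" ("Port GUID: ..." does not end in ':').
            current = int(line[len("Port "):-1])
            ports[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            ports[current][key.strip()] = value.strip()
    return ports


def diagnose(port):
    """Map a port's logical/physical state to a rough next step."""
    if port.get("Physical state") != "LinkUp":
        # The SM cannot act on a link that has not trained.
        return "physical link not trained: check cabling, backplane and speed settings"
    if port.get("State") == "Active":
        return "link is active"
    return "link trained, waiting for the SM to activate it"


SAMPLE = """\
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Port 1:
        State: Down
        Physical state: Polling
        Port GUID: 0x0011750000791fec
    Port 2:
        State: Down
        Physical state: Polling
        Port GUID: 0x0011750000791fed
"""

ports = parse_ibstat_ports(SAMPLE)
for number, fields in sorted(ports.items()):
    print(number, diagnose(fields))
```

Both blade ports fall into the first case, so the SM is not the issue yet; the physical link between the HP switch and the mezzanine cards is.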