[Users] IB topology config and polling state

Hal Rosenstock hal.rosenstock at gmail.com
Tue Oct 13 05:13:16 PDT 2015


Hi German,

Are the cards in the correct bays and slots?

Do you have the HP Onboard Administrator tool? What does it say about
internal connectivity?

-- Hal



On Tue, Oct 13, 2015 at 7:44 AM, German Anders <ganders at despegar.com> wrote:

> Hi Ira,
>
> I have some HP documentation, but it's quite short and doesn't describe
> any 'config' or 'impl' steps to get the internal switch up and
> running. The version of the switch that came with the enclosure doesn't
> have any management module at all, so it depends on external management.
> Over the weekend I found a way to upgrade the firmware of the HP switch
> with the following command (mlxburn -d lid-0x001D -fw fw-IS4.mlx), then ran
> (flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D dc
> /home/ceph/HPIBSW.INI) and found the following inside that file:
>
> [PS_INFO]
> Name = 489184-B21
> Description = HP BLc 4X QDR IB Switch
>
> [ADAPTER]
> PSID = HP_0100000009
>
> (...)
>
> [IB_TO_HW_MAP]
> PORT1=14
> PORT2=15
> PORT3=16
> PORT4=17
> PORT5=18
> PORT6=12
> PORT7=11
> PORT8=10
> PORT9=9
> PORT10=8
> PORT11=7
> PORT12=6
> PORT13=5
> PORT14=4
> PORT15=3
> PORT16=2
> PORT17=20
> PORT18=22
> PORT19=24
> PORT20=26
> PORT21=28
> PORT22=30
> PORT23=35
> PORT24=33
> PORT25=21
> PORT26=23
> PORT27=25
> PORT28=27
> PORT29=29
> PORT30=36
> PORT31=34
> PORT32=32
> PORT33=1
> PORT34=13
> PORT35=19
> PORT36=31
>
> [unused_ports]
> hw_port1_not_in_use=1
> hw_port13_not_in_use=1
> hw_port19_not_in_use=1
> hw_port31_not_in_use=1
>
> (...)
>
> I don't know if maybe there's some issue with the port mapping. Has anyone
> used this kind of switch?
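>
> (For reference, after burning, the image can be re-queried in-band with
> flint's query operation, using the same device path as above:
>
> # flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D q
>
> which should report the new FW version and PSID.)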
>
> The summary of the problem is correct: the connectivity between the IB
> network (MLNX switches/gw) and the HP IB switch is working, since I was able
> to upgrade the firmware of the switch and get information about it. But the
> connection between the mezzanine cards of the blades and the internal IB
> switch of the enclosure is not working at all. Note that if I go to the OA
> administration of the enclosure I can see the 'green' port mapping for each
> of the blades and the interconnect switch, so I'm guessing that part
> should be working.
>
> Regarding the questions:
>
> 1)      What type of switch is in the HP chassis?
>
>
> *QLogic HP BLc 4X QDR IB Switch*
>
> *PSID = HP_0100000009*
>
> *Image type:   FS2*
>
> *FW ver:         7.4.3000*
>
> *Device ID:     48438*
> *GUID:            0002c902004b0918*
>
> 2)      Do you have console access or http access to that switch?
>
> *No, since it didn't have any management module mezzanine card inside the
> switch; it only comes with an i2c port. But I can access it through the
> mlxburn and flint tools from one host that's connected to the IB network
> (outside the enclosure).*
>
> 3)      Does that switch have an SM in it?
>
> *No*
>
> 4)      What version of the kernel are you running with the qib cards?
>
> a.       I assume you are using the qib driver in that kernel.
>
> *Ubuntu 14.04.3 LTS - kernel 3.18.20-031820-generic*
>
>
>
> At some point Hal spoke of “LLR being a Mellanox thing”. Was that to solve
> the problem of connecting the “HP switch” to the Mellanox switch?
>
>
>
> *No; LLR is only supported between MLNX devices. The ISLs are up and
> working, since it's possible for me to query the switch.*
>
>
>
> I would like it if you could verify that
>
>
>
> /usr/sbin/truescale-serdes.cmds
>
>
>
> is being run?
>
>
> *When trying to run the command:*
>
>
>
> *# /usr/sbin/truescale-serdes.cmds*
> */usr/sbin/truescale-serdes.cmds: 100: /usr/sbin/truescale-serdes.cmds:
> Syntax error: "(" unexpected (expecting "}")*
>
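> (That error pattern is what dash, Ubuntu's default /bin/sh, prints when it
> hits bash-only syntax. If the script really is bash syntax, one workaround
> to test, as a sketch, is to invoke it under bash explicitly:
>
> # bash /usr/sbin/truescale-serdes.cmds
>
> whether that's appropriate depends on the script itself.)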
>
>
> Also what version of libipathverbs do you have?
>
>
>
>
> *# rpm -qa | grep libipathverbs*
> *libipathverbs-1.3-1.x86_64*
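>
> (Side note: that host runs Ubuntu, where the package database is normally
> queried with dpkg rather than rpm, e.g.:
>
> # dpkg -l | grep ipathverbs
>
> so the rpm output above may not reflect what apt actually installed.)
>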
> Thanks in advance,
>
> Cheers,
>
>
>
> *German*
> 2015-10-13 2:14 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>
>> German,
>>
>>
>>
>> Do you have any documentation on the HP blade system?  And the switch
>> which is in that system?
>>
>>
>>
>> I have to admit I have not followed everything in this thread regarding
>> your configuration, but it seems like you have some Mellanox switches
>> connected into an HP chassis which has both a switch and blades with qib
>> (TrueScale) cards.
>>
>>
>>
>> The connection from the Mellanox switch to the “HP chassis switch” is
>> LinkUp (Active), but the connections to the individual qib HCAs have not
>> even reached LinkUp.
>>
>>
>>
>> Is that a correct summary of the problem?
>>
>>
>>
>> If so, here are some questions:
>>
>>
>>
>> 1)      What type of switch is in the HP chassis?
>>
>> 2)      Do you have console access or http access to that switch?
>>
>> 3)      Does that switch have an SM in it?
>>
>> 4)      What version of the kernel are you running with the qib cards?
>>
>> a.       I assume you are using the qib driver in that kernel.
>>
>>
>>
>> At some point Hal spoke of “LLR being a Mellanox thing”. Was that to
>> solve the problem of connecting the “HP switch” to the Mellanox switch?
>>
>>
>>
>> I would like it if you could verify that
>>
>>
>>
>> /usr/sbin/truescale-serdes.cmds
>>
>>
>>
>> is being run?
>>
>>
>>
>> Also what version of libipathverbs do you have?
>>
>>
>>
>> Ira
>>
>>
>>
>> *From:* users-bounces at lists.openfabrics.org [mailto:
>> users-bounces at lists.openfabrics.org] *On Behalf Of *Weiny, Ira
>> *Sent:* Wednesday, October 07, 2015 1:31 PM
>> *To:* Hal Rosenstock; German Anders
>>
>> *Cc:* users at lists.openfabrics.org
>> *Subject:* Re: [Users] IB topology config and polling state
>>
>>
>>
>> Agree with Hal here.
>>
>>
>>
>> I’m not familiar with those blades/switches.  I’ll ask around.
>>
>>
>>
>> Ira
>>
>>
>>
>> *From:* Hal Rosenstock [mailto:hal.rosenstock at gmail.com
>> <hal.rosenstock at gmail.com>]
>> *Sent:* Wednesday, October 07, 2015 1:26 PM
>> *To:* German Anders
>> *Cc:* Weiny, Ira; users at lists.openfabrics.org
>> *Subject:* Re: [Users] IB topology config and polling state
>>
>>
>>
>> That's the gateway to the switch in the enclosure. It's the internal
>> connectivity in the blade enclosure that's (physically) broken.
>>
>>
>>
>> On Wed, Oct 7, 2015 at 4:24 PM, German Anders <ganders at despegar.com>
>> wrote:
>>
>> Cabled. The blade enclosure switch is:
>>
>> vendid=0x2c9
>> devid=0xbd36
>> sysimgguid=0x2c902004b0918
>> switchguid=0x2c902004b0918(2c902004b0918)
>> Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox
>> Technologies" base port 0 *lid 29* lmc 0
>> [1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24
>> 4xQDR
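>>
>> (That snippet comes from an ibnetdiscover topology dump; rerunning it from
>> a host on the fabric, e.g.
>>
>> # ibnetdiscover
>>
>> shows every discovered node and link, so any blade HCA that came up would
>> appear there.)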
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 17:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>> What are those HCAs cabled to, or is it internal to the blade enclosure?
>>
>>
>>
>> On Wed, Oct 7, 2015 at 3:24 PM, German Anders <ganders at despegar.com>
>> wrote:
>>
>> Yeah. Is there any command I can run to change the port
>> state on the remote switch? I mean, everything looks good, but on the HP
>> blades I'm still getting:
>>
>>
>>
>> # ibstat
>> CA 'qib0'
>>     CA type: InfiniPath_QMH7342
>>     Number of ports: 2
>>     Firmware version:
>>     Hardware version: 2
>>     Node GUID: 0x0011750000791fec
>>     System image GUID: 0x0011750000791fec
>>     Port 1:
>>         State: *Down*
>>         Physical state: *Polling*
>>         Rate: 40
>>         Base lid: 4660
>>         LMC: 0
>>         SM lid: 4660
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fec
>>         Link layer: InfiniBand
>>     Port 2:
>>         State: *Down*
>>         Physical state: *Polling*
>>         Rate: 40
>>         Base lid: 4660
>>         LMC: 0
>>         SM lid: 4660
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fed
>>         Link layer: InfiniBand
>>
>> Also, on working hosts I only see devices from the local network; I don't
>> see any of the blades' HCA connections.
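>>
>> (To poke an individual port on a remote switch, ibportstate takes the
>> switch LID and the physical port number; a sketch, assuming LID 29 and
>> external port 9 from the earlier output:
>>
>> # ibportstate -L 29 9 query
>> # ibportstate -L 29 9 enable
>>
>> query shows that port's own Link/PhysLink state, and enable re-arms a
>> disabled port.)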
>>
>>
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>> The screenshot looks good :-) The SM brought the link up to Active.
>>
>>
>>
>> Note that the ibportstate command you gave was for switch port 0 of the
>> Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
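>>
>> (To see the state of every port on that switch at once, rather than just
>> port 0, iblinkinfo from any fabric host is handy, e.g.:
>>
>> # iblinkinfo | grep -A 40 0x0002c902004b0918
>>
>> which lists each switch port with its LinkUp/Polling/Down state.)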
>>
>>
>>
>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <ganders at despegar.com>
>> wrote:
>>
>> Yes, attached is a screenshot of the port information (# 9), the one
>> that makes the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from one of
>> the hosts connected to one of the SX6018F switches I can see the 'remote'
>> HP IB SW:
>>
>> # *ibnodes*
>>
>> (...)
>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox
>> Technologies" base port 0 *lid 29* lmc 0
>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced
>> port 0 lid 24 lmc 0
>> (...)
>>
>> # *ibportstate -L 29 query*
>> Switch PortInfo:
>> # Port info: Lid 29 port 0
>> LinkState:.......................Active
>> PhysLinkState:...................LinkUp
>> Lid:.............................29
>> SMLid:...........................2
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedActive:.................10.0 Gbps
>> Mkey:............................<not displayed>
>> MkeyLeasePeriod:.................0
>> ProtectBits:.....................0
>>
>>
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>> One more thing, hopefully before playing with the low-level phy settings:
>>
>>
>>
>> Are you using known good cables? Do you have FDR cables on the FDR <->
>> FDR links? Cable lengths can matter as well.
>>
>>
>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <hal.rosenstock at gmail.com>
>> wrote:
>>
>> Were the ports mapped to the phy profile shut down when you changed this?
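>>
>> If not, the usual sequence is to take the port down first; a sketch,
>> assuming MLNX-OS accepts the same one-line form used elsewhere in this
>> thread:
>>
>> GWIB01 (config) # interface ib 1/9 shutdown
>> GWIB01 (config) # interface ib 1/9 phy-profile map hp-encl-isl
>> GWIB01 (config) # no interface ib 1/9 shutdown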
>>
>>
>>
>> LLR is a proprietary Mellanox mechanism.
>>
>>
>>
>> You might want 2 different profiles: one for the interfaces connected to
>> other gateway interfaces (which are FDR (and FDR-10) capable) and the other
>> for the interfaces connecting to QDR (the older equipment in your network).
>> By configuring the SwitchX interfaces to the appropriate possible speeds
>> and disabling the proprietary mechanisms there, the link should not only
>> come up but will also do so faster than if FDR/FDR10 were enabled.
>>
>>
>>
>> I suspect that, due to the SwitchX configuration, the links to
>> the switch(es) in the HP enclosures do not negotiate properly (as shown by
>> Down rather than LinkUp).
>>
>>
>>
>> Once you get all your links to Init, negotiation has occurred; then
>> it's time for the SM to bring the links to Active.
>>
>>
>>
>> Since you have down links, the SM can't do anything about those.
>>
>>
>>
>>
>>
>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <ganders at despegar.com>
>> wrote:
>>
>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I know that
>> this kind of switch does not come with an embedded SM, but I don't know how
>> to access any management at all on this particular switch, for example to
>> set speeds or anything like that. Is it possible to access it through the
>> chassis?
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com>:
>>
>> I think so, but when trying to configure the phy-profile on the
>> interface in order to negotiate at QDR, it failed to map the profile:
>>
>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>> high-speed-ber
>>
>>   Profile: high-speed-ber
>>   --------
>>   llr support ib-speed
>>   SDR: disable
>>   DDR: disable
>>   QDR: disable
>>   FDR10: enable-request
>>   FDR: enable-request
>>
>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl
>>
>>   Profile: hp-encl-isl
>>   --------
>>   llr support ib-speed
>>   SDR: disable
>>   DDR: disable
>>   QDR: enable
>>   FDR10: enable-request
>>   FDR: enable-request
>>
>> GWIB01 [proxy-ha-group: master] (config) #
>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile
>> map hp-encl-isl
>> *% Cannot map profile hp-encl-isl to port:  1/9*
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>
>> The ‘qib’ driver is loading fine, as can be seen from the ibstat output.
>> ib_ipath is for older cards.
>>
>>
>>
>> The problem is that the link is not coming up to Init. As Hal said, the link
>> should transition to “link up” without the SM's involvement.
>>
>>
>>
>> I think you are on to something: it seems like your
>> switch ports are not configured to do QDR.
>>
>>
>>
>> Ira
>>
>>
>>
>>
>>
>> *From:* German Anders [mailto:ganders at despegar.com]
>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>> *To:* Weiny, Ira
>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>
>>
>> *Subject:* Re: [Users] IB topology config and polling state
>>
>>
>>
>> Yes, I have that file:
>>
>> /usr/sbin/truescale-serdes.cmds
>>
>> I've also installed libipathverbs:
>>
>> # apt-get install libipathverbs-dev
>>
>> But when I try to load the ib_ipath module I get the following error
>> msg:
>>
>> # modprobe ib_ipath
>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
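>>
>> (A quick way to confirm which driver has actually claimed the HCA, as a
>> sketch:
>>
>> # lsmod | grep -E 'ib_qib|ib_ipath'
>> # ls /sys/class/infiniband/
>>
>> If qib already owns the device, which the 'qib0' in ibstat suggests,
>> ib_ipath isn't needed at all.)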
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>
>> There are a few routing issues in that diagram, but the links should
>> come up.
>>
>>
>>
>> I assume there is some backplane between the blade servers and the switch
>> in that chassis?
>>
>>
>>
>> Have you gotten libipathverbs installed?
>>
>>
>>
>> In libipathverbs there is a SerDes tuning script.
>>
>>
>>
>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>
>>
>>
>> Does your libipathverbs include that file?  If not, try the latest from
>> GitHub.
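>>
>> (If it's missing, a build from source would go roughly like this; a sketch
>> assuming the repo follows the usual autotools layout:
>>
>> # git clone https://github.com/01org/libipathverbs
>> # cd libipathverbs
>> # ./autogen.sh && ./configure && make && make install
>>
>> exact steps may differ, so check the repo's README first.)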
>>
>>
>>
>> Ira
>>
>>
>>
>>
>>
>> *From:* users-bounces at lists.openfabrics.org [mailto:
>> users-bounces at lists.openfabrics.org] *On Behalf Of *German Anders
>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>> *To:* Hal Rosenstock
>> *Cc:* users at lists.openfabrics.org
>> *Subject:* Re: [Users] IB topology config and polling state
>>
>>
>>
>> Hi Hal,
>>
>> Thanks for the reply. I've attached a PDF with the topology diagram; I
>> don't know if this is the best way to go or if there's another way to
>> connect and set up the IB network, so tips and suggestions would be very
>> much appreciated. Also, the mezzanine cards are already installed on the
>> blade hosts:
>>
>> # lspci
>> (...)
>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>
>>
>>
>> Thanks in advance,
>>
>> Cheers,
>>
>>
>> *German*
>>
>>
>>
>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>> Hi again German,
>>
>>
>>
>> Looks like you made some progress from yesterday, as the qib ports are now
>> Polling rather than Disabled.
>>
>>
>>
>> But since they are Down, do you have them cabled to a switch? That
>> should bring the links up, and the port state will be Init. That is the
>> "starting" point.
>>
>>
>>
>> You will then also need to be running an SM to bring the ports up to Active.
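>>
>> A quick way to confirm an SM is actually reachable, from any host whose
>> port is already Active:
>>
>> # sminfo
>>
>> which reports the master SM's LID, GUID, priority, and state.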
>>
>>
>>
>> -- Hal
>>
>>
>>
>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <ganders at despegar.com>
>> wrote:
>>
>> Hi all,
>>
>> I don't know if this is the right mailing list for this kind of topic, but
>> I'm really new to IB. I've just installed two SX6036G gateways connected to
>> each other through two ISL ports, then configured proxy-arp between
>> both nodes (the SM is disabled on both GWs):
>>
>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>
>> Load balancing algorithm: ib-base-ip
>> Number of Proxy-Arp interfaces: 1
>>
>> Proxy-ARP VIP
>> =============
>> Pra-group name: proxy-ha-group
>> HA VIP address: 10.xx.xx.xx/xx
>>
>> Active nodes:
>> ID                   State                IP
>> --------------------------------------------------------------
>> GWIB01               master               10.xx.xx.xx1
>> GWIB02               standby              10.xx.xx.xx2
>>
>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*), one connected
>> to GWIB01 and the other connected to GWIB02. The SM is configured locally
>> on both the SWIB01 & SWIB02 switches. So far so good; after this config I
>> connected a commodity server with a MLNX FDR IB adapter to the SWIB01 &
>> SWIB02 switches, configured the drivers, etc., and got it up & running
>> fine.
>>
>> Finally I set up an HP enclosure with an internal IB switch (connecting
>> port 1 of the internal switch to GWIB01; the link is up but LLR status is
>> inactive), installed one of the blades, and I see the following:
>>
>> # ibstat
>> CA 'qib0'
>>     CA type: InfiniPath_QMH7342
>>     Number of ports: 2
>>     Firmware version:
>>     Hardware version: 2
>>     Node GUID: 0x0011750000791fec
>>     System image GUID: 0x0011750000791fec
>>     Port 1:
>>         State: Down
>>         Physical state: Polling
>>         Rate: 40
>>         Base lid: 4660
>>         LMC: 0
>>         SM lid: 4660
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fec
>>         Link layer: InfiniBand
>>     Port 2:
>>         State: Down
>>         Physical state: Polling
>>         Rate: 40
>>         Base lid: 4660
>>         LMC: 0
>>         SM lid: 4660
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fed
>>         Link layer: InfiniBand
>>
>> So I was wondering if maybe the SM is not being recognized on the blade
>> system and that's why it's not getting past the Polling state. Is that
>> possible? Or maybe it's not possible to connect an ISL between the GW and
>> the HP internal switch so that the SM is available, or maybe the inactive
>> LLR is causing this. Any ideas? I thought about connecting the ISL of the
>> HP IB switch to SWIB01 or SWIB02 instead of the GWs, but I don't have any
>> available ports.
>>
>> Thanks in advance,
>>
>> Cheers,
>>
>>
>> *German*
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.openfabrics.org
>> http://lists.openfabrics.org/mailman/listinfo/users
>>


More information about the Users mailing list