[Users] IB topology config and polling state

German Anders ganders at despegar.com
Tue Oct 13 05:28:36 PDT 2015


Hi Hal,

It does not allow me to set up an IP address on the internal switch, so I
can't access it from outside except with the tools I mentioned before. It
also doesn't allow access through a serial connection from inside the
enclosure. I've attached some screenshots showing the connectivity.



*German*

2015-10-13 9:13 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:

> Hi German,
>
> Are the cards in the correct bays and slots ?
>
> Do you have the HP Onboard Administrator tool ? What does it say about
> internal connectivity ?
>
> -- Hal
>
>
>
> On Tue, Oct 13, 2015 at 7:44 AM, German Anders <ganders at despegar.com>
> wrote:
>
>> Hi Ira,
>>
>> I have some HP documentation, but it's quite short and doesn't describe
>> any config or implementation steps to get the internal switch up and
>> running. The switch that came with the enclosure does not have any
>> management module at all, so it depends on external management. Over the
>> weekend I found a way to upgrade the firmware of the HP switch with the
>> following command (mlxburn -d lid-0x001D -fw fw-IS4.mlx); then I ran
>> (flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D dc
>> /home/ceph/HPIBSW.INI) and found the following inside that file:
>>
>> [PS_INFO]
>> Name = 489184-B21
>> Description = HP BLc 4X QDR IB Switch
>>
>> [ADAPTER]
>> PSID = HP_0100000009
>>
>> (...)
>>
>> [IB_TO_HW_MAP]
>> PORT1=14
>> PORT2=15
>> PORT3=16
>> PORT4=17
>> PORT5=18
>> PORT6=12
>> PORT7=11
>> PORT8=10
>> PORT9=9
>> PORT10=8
>> PORT11=7
>> PORT12=6
>> PORT13=5
>> PORT14=4
>> PORT15=3
>> PORT16=2
>> PORT17=20
>> PORT18=22
>> PORT19=24
>>
>> PORT20=26
>> PORT21=28
>> PORT22=30
>> PORT23=35
>> PORT24=33
>> PORT25=21
>> PORT26=23
>> PORT27=25
>> PORT28=27
>> PORT29=29
>> PORT30=36
>> PORT31=34
>> PORT32=32
>> PORT33=1
>> PORT34=13
>> PORT35=19
>> PORT36=31
>>
>> [unused_ports]
>> hw_port1_not_in_use=1
>> hw_port13_not_in_use=1
>> hw_port19_not_in_use=1
>> hw_port31_not_in_use=1
>>
>> (...)
>>
>> I don't know if there's some issue with the port mapping; has anyone
>> used this kind of switch?
>>
>> The summary of the problem is correct: the connectivity between the IB
>> network (MLNX switches/gw) and the HP IB switch is working, since I was
>> able to upgrade the firmware of the switch and get information from it.
>> But the connection between the mezzanine cards of the blades and the
>> internal IB switch in the enclosure is not working at all. Note that if I
>> go to the OA administration of the enclosure I can see 'green' port
>> mappings for each of the blades and the interconnect switch, so I'm
>> guessing that part should be working.
>>
>> Regarding the questions:
>>
>> 1)      What type of switch is in the HP chassis?
>>
>>
>> *QLogic HP BLc 4X QDR IB Switch*
>>
>> *PSID = HP_0100000009*
>>
>> *Image type:   FS2*
>>
>> *FW ver:         7.4.3000*
>>
>> *Device ID:     48438*
>> *GUID:             0002c902004b0918*
>>
>> 2)      Do you have console access or http access to that switch?
>>
>> *No, since it doesn't have any management-module mezzanine card inside
>> the switch; it only comes with an I2C port. But I can access it through
>> the mlxburn and flint tools from a host that's connected to the IB
>> network (outside the enclosure).*
>>
>> 3)      Does that switch have an SM in it?
>>
>> *No*
>>
>> 4)      What version of the kernel are you running with the qib cards?
>>
>> a.       I assume you are using the qib driver in that kernel.
>>
>> *Ubuntu 14.04.3 LTS - kernel 3.18.20-031820-generic*
>>
>>
>>
>> At some point Hal spoke of “LLR being a Mellanox thing”. Was that to
>> solve the problem of connecting the “HP switch” to the Mellanox switch?
>>
>>
>>
>> *No. LLR is only supported between Mellanox devices. The ISLs are up and
>> working, since I'm able to query the switch.*
>>
>>
>>
>> I would like you to verify that
>>
>> /usr/sbin/truescale-serdes.cmds
>>
>> is being run.
>>
>>
>> *When trying to run the command:*
>>
>>
>>
>> *# /usr/sbin/truescale-serdes.cmds*
>> */usr/sbin/truescale-serdes.cmds: 100: /usr/sbin/truescale-serdes.cmds:
>> Syntax error: "(" unexpected (expecting "}")*
>>
>>
>>
>> Also what version of libipathverbs do you have?
>>
>>
>>
>>
>> *# rpm -qa | grep libipathverbs*
>> *libipathverbs-1.3-1.x86_64*
>> Thanks in advance,
>>
>> Cheers,
>>
>>
>>
>> *German*
>> 2015-10-13 2:14 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>
>>> German,
>>>
>>>
>>>
>>> Do you have any documentation on the HP blade system?  And the switch
>>> which is in that system?
>>>
>>>
>>>
>>> I have to admit I have not followed everything in this thread regarding
>>> your configuration but it seems like you have some mellanox switches
>>> connected into an HP chassis which has both a switch and blades with qib
>>> (Truescale) cards.
>>>
>>>
>>>
>>> The connection from the mellanox switch to the “HP chassis switch” is
>>> linkup (active) but the connections to the individual qib HCAs are not even
>>> linkup.
>>>
>>>
>>>
>>> Is that a correct summary of the problem?
>>>
>>>
>>>
>>> If so here are some questions:
>>>
>>>
>>>
>>> 1)      What type of switch is in the HP chassis?
>>>
>>> 2)      Do you have console access or http access to that switch?
>>>
>>> 3)      Does that switch have an SM in it?
>>>
>>> 4)      What version of the kernel are you running with the qib cards?
>>>
>>> a.       I assume you are using the qib driver in that kernel.
>>>
>>>
>>>
>>> At some point Hal spoke of “LLR being a Mellanox thing”. Was that to
>>> solve the problem of connecting the “HP switch” to the Mellanox switch?
>>>
>>>
>>>
>>> I would like you to verify that
>>>
>>> /usr/sbin/truescale-serdes.cmds
>>>
>>> is being run.
>>>
>>>
>>>
>>> Also what version of libipathverbs do you have?
>>>
>>>
>>>
>>> Ira
>>>
>>>
>>>
>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>> users-bounces at lists.openfabrics.org] *On Behalf Of *Weiny, Ira
>>> *Sent:* Wednesday, October 07, 2015 1:31 PM
>>> *To:* Hal Rosenstock; German Anders
>>>
>>> *Cc:* users at lists.openfabrics.org
>>> *Subject:* Re: [Users] IB topology config and polling state
>>>
>>>
>>>
>>> Agree with Hal here.
>>>
>>>
>>>
>>> I’m not familiar with those blades/switches.  I’ll ask around.
>>>
>>>
>>>
>>> Ira
>>>
>>>
>>>
>>> *From:* Hal Rosenstock [mailto:hal.rosenstock at gmail.com
>>> <hal.rosenstock at gmail.com>]
>>> *Sent:* Wednesday, October 07, 2015 1:26 PM
>>> *To:* German Anders
>>> *Cc:* Weiny, Ira; users at lists.openfabrics.org
>>> *Subject:* Re: [Users] IB topology config and polling state
>>>
>>>
>>>
>>> That's the gateway to the switch in the enclosure. It's the internal
>>> connectivity in the blade enclosure that's (physically) broken.
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 4:24 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>> Cabled.
>>>
>>> The blade's switch is:
>>>
>>> vendid=0x2c9
>>> devid=0xbd36
>>> sysimgguid=0x2c902004b0918
>>> switchguid=0x2c902004b0918(2c902004b0918)
>>> Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox
>>> Technologies" base port 0 *lid 29* lmc 0
>>> [1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24
>>> 4xQDR
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 17:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>
>>> What are those HCAs cabled to or is it internal to the blade enclosure ?
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 3:24 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>> Yeah. Is there any command I can run in order to change the port state
>>> on the remote switch? I mean, everything looks good, but on the HP
>>> blades I'm still getting:
>>>
>>>
>>>
>>> # ibstat
>>> CA 'qib0'
>>>     CA type: InfiniPath_QMH7342
>>>     Number of ports: 2
>>>     Firmware version:
>>>     Hardware version: 2
>>>     Node GUID: 0x0011750000791fec
>>>     System image GUID: 0x0011750000791fec
>>>     Port 1:
>>>         State: *Down*
>>>         Physical state: *Polling*
>>>         Rate: 40
>>>         Base lid: 4660
>>>         LMC: 0
>>>         SM lid: 4660
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fec
>>>         Link layer: InfiniBand
>>>     Port 2:
>>>         State: *Down*
>>>         Physical state: *Polling*
>>>         Rate: 40
>>>         Base lid: 4660
>>>         LMC: 0
>>>         SM lid: 4660
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fed
>>>         Link layer: InfiniBand
>>>
>>> Also, on working hosts I only see devices from the local network; I
>>> don't see any of the blades' HCA connections.
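One detail worth noting in that ibstat output: both the base LID and the SM LID read 4660 on ports that are Down, and 4660 is just 0x1234, which looks like a placeholder rather than a LID ever assigned by an SM. Decoding it:

```shell
# 4660 in the ibstat output is 0x1234, a filler/default LID value,
# consistent with ports that never trained up or saw an SM.
printf 'base lid %d = 0x%x\n' 4660 4660
```

That is another hint the problem is at the physical/link-training layer, before the SM is ever involved.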
>>>
>>>
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>
>>> The screen shot looks good :-) SM brought the link up to active.
>>>
>>>
>>>
>>> Note that the ibportstate command you gave was for switch port 0 of the
>>> Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>> Yes, attached is a screenshot of the port information (#9), the port
>>> that forms the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from one
>>> of the hosts connected to one of the SX6018F switches I can see the
>>> 'remote' HP IB switch:
>>>
>>> # *ibnodes*
>>>
>>> (...)
>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox
>>> Technologies" base port 0 *lid 29* lmc 0
>>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced
>>> port 0 lid 24 lmc 0
>>> (...)
>>>
>>> # *ibportstate -L 29 query*
>>> Switch PortInfo:
>>> # Port info: Lid 29 port 0
>>> LinkState:.......................Active
>>> PhysLinkState:...................LinkUp
>>> Lid:.............................29
>>> SMLid:...........................2
>>> LMC:.............................0
>>> LinkWidthSupported:..............1X or 4X
>>> LinkWidthEnabled:................1X or 4X
>>> LinkWidthActive:.................4X
>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedActive:.................10.0 Gbps
>>> Mkey:............................<not displayed>
>>> MkeyLeasePeriod:.................0
>>> ProtectBits:.....................0
>>>
>>>
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>
>>> One more thing hopefully before playing with the low level phy settings:
>>>
>>>
>>>
>>> Are you using known good cables ? Do you have FDR cables on the FDR <->
>>> FDR links ? Cable lengths can matter as well.
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>> hal.rosenstock at gmail.com> wrote:
>>>
>>> Were the ports mapped to the phy profile shut down when you changed this ?
>>>
>>>
>>>
>>> LLR is a proprietary Mellanox mechanism.
>>>
>>>
>>>
>>> You might want two different profiles: one for the interfaces connected
>>> to other gateway interfaces (which are FDR and FDR-10 capable), and the
>>> other for the interfaces connecting to QDR (the older equipment in your
>>> network). By configuring the SwitchX interfaces to the appropriate
>>> speeds and disabling the proprietary mechanisms there, the links should
>>> not only come up, they should come up faster than if FDR/FDR10 were
>>> enabled.
>>>
>>>
>>>
>>> I suspect that due to the SwitchX configuration, the links to the
>>> switch(es) in the HP enclosures do not negotiate properly (as shown by
>>> Down rather than LinkUp).
>>>
>>>
>>>
>>> Once you get all your links to INIT, negotiation has occurred and then
>>> it's time for SM to bring links to active.
>>>
>>>
>>>
>>> Since you have down links, the SM can't do anything about those.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I know
>>> this kind of switch does not come with an embedded SM, but I don't know
>>> how to access any management at all on this particular switch, for
>>> example to set up the speed or anything like that. Is it possible to
>>> access it through the chassis?
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com>:
>>>
>>> I think so, but when trying to configure the phy-profile on the
>>> interface in order to negotiate QDR, it failed to map the profile:
>>>
>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>> high-speed-ber
>>>
>>>   Profile: high-speed-ber
>>>   --------
>>>   llr support ib-speed
>>>   SDR: disable
>>>   DDR: disable
>>>   QDR: disable
>>>   FDR10: enable-request
>>>   FDR: enable-request
>>>
>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl
>>>
>>>   Profile: hp-encl-isl
>>>   --------
>>>   llr support ib-speed
>>>   SDR: disable
>>>   DDR: disable
>>>   QDR: enable
>>>   FDR10: enable-request
>>>   FDR: enable-request
>>>
>>> GWIB01 [proxy-ha-group: master] (config) #
>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile
>>> map hp-encl-isl
>>> *% Cannot map profile hp-encl-isl to port:  1/9*
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>
>>> The ‘qib’ driver is loading fine, as can be seen from the ibstat
>>> output. ib_ipath is the driver for the older cards.
>>>
>>>
>>>
>>> The problem is that the link is not coming up to Init. Like Hal said,
>>> the link should transition to “link up” without the SM's involvement.
>>>
>>>
>>>
>>> I think you are on to something with the fact that it seems like your
>>> switch ports are not configured to do QDR.
>>>
>>>
>>>
>>> Ira
>>>
>>>
>>>
>>>
>>>
>>> *From:* German Anders [mailto:ganders at despegar.com]
>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>> *To:* Weiny, Ira
>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>
>>>
>>> *Subject:* Re: [Users] IB topology config and polling state
>>>
>>>
>>>
>>> Yes, I have that file:
>>>
>>> /usr/sbin/truescale-serdes.cmds
>>>
>>> I've also installed libipathverbs:
>>>
>>> # apt-get install libipathverbs-dev
>>>
>>> But when I try to load the ib_ipath module I get the following error
>>> message:
>>>
>>> # modprobe ib_ipath
>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>
>>> There are a few issues for routing in that diagram but the links should
>>> come up.
>>>
>>>
>>>
>>> I assume there is some backplane between the blade servers and the
>>> switch in that chassis?
>>>
>>>
>>>
>>> Have you gotten libipathverbs installed?
>>>
>>>
>>>
>>> In ipathverbs there is a serdes tuning script.
>>>
>>>
>>>
>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>
>>>
>>>
>>> Does your libipathverbs include that file?  If not, try the latest from
>>> GitHub.
>>>
>>>
>>>
>>> Ira
>>>
>>>
>>>
>>>
>>>
>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German Anders
>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>> *To:* Hal Rosenstock
>>> *Cc:* users at lists.openfabrics.org
>>> *Subject:* Re: [Users] IB topology config and polling state
>>>
>>>
>>>
>>> Hi Hal,
>>>
>>> Thanks for the reply. I've attached a PDF with the topology diagram. I
>>> don't know if this is the best way to go or if there's a better way to
>>> connect and set up the IB network; tips and suggestions would be very
>>> appreciated. The mezzanine cards are already installed on the blade
>>> hosts:
>>>
>>> # lspci
>>> (...)
>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>
>>>
>>>
>>> Thanks in advance,
>>>
>>> Cheers,
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>
>>> Hi again German,
>>>
>>>
>>>
>>> Looks like you made some progress from yesterday as the qib ports are
>>> now Polling rather than Disabled.
>>>
>>>
>>>
>>> But since they are Down, do you have them cabled to a switch ? That
>>> should bring the links up and the port state will be Init. That is the
>>> "starting" point.
>>>
>>>
>>>
>>> You will also then need to be running SM to bring the ports up to Active.
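The link-bringup sequence described here (Polling, then LinkUp/Init once training succeeds, then Active once the SM programs the port) can be checked mechanically from ibstat output. A small sketch, using a fabricated sample heredoc in place of a real `ibstat` run:

```shell
# Classify ports from ibstat-style output: a port must reach LinkUp before
# the SM can take it to Active. The heredoc stands in for real `ibstat` output.
result=$(awk '
    /^ *Port [0-9]+:/    { port = $2; sub(/:$/, "", port) }
    /^ *State:/          { state = $2 }
    /^ *Physical state:/ {
        if ($3 != "LinkUp")         printf "port %s phys=%s: link never trained\n", port, $3
        else if (state != "Active") printf "port %s %s: waiting on the SM\n", port, state
        else                        printf "port %s OK\n", port
    }' <<'EOF'
    Port 1:
        State: Down
        Physical state: Polling
    Port 2:
        State: Active
        Physical state: LinkUp
EOF
)
printf '%s\n' "$result"
```

For the blade ports in this thread the first branch fires (Physical state still Polling), which is why no amount of SM configuration can help yet.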
>>>
>>>
>>>
>>> -- Hal
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I don't know if this is the right mailing list for this kind of topic,
>>> but I'm really new to IB. I've just installed two SX6036G gateways
>>> connected to each other through two ISL ports, then configured proxy-arp
>>> between both nodes (the SM is disabled on both gateways):
>>>
>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>
>>> Load balancing algorithm: ib-base-ip
>>> Number of Proxy-Arp interfaces: 1
>>>
>>> Proxy-ARP VIP
>>> =============
>>> Pra-group name: proxy-ha-group
>>> HA VIP address: 10.xx.xx.xx/xx
>>>
>>> Active nodes:
>>> ID                   State                IP
>>> --------------------------------------------------------------
>>> GWIB01               master               10.xx.xx.xx1
>>> GWIB02               standby              10.xx.xx.xx2
>>>
>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*), one
>>> connected to GWIB01 and the other connected to GWIB02. The SM is
>>> configured locally on both SWIB01 & SWIB02 switches. So far so good:
>>> after this config I set up a commodity server with a Mellanox FDR IB
>>> adapter connected to the SWIB01 & SWIB02 switches, configured the
>>> drivers, etc., and got it up & running fine.
>>>
>>> Finally, I set up an HP enclosure with an internal IB switch (connecting
>>> port 1 of the internal switch to GWIB01 - the link is up but the LLR
>>> status is inactive), installed one of the blades, and I see the
>>> following:
>>>
>>> # ibstat
>>> CA 'qib0'
>>>     CA type: InfiniPath_QMH7342
>>>     Number of ports: 2
>>>     Firmware version:
>>>     Hardware version: 2
>>>     Node GUID: 0x0011750000791fec
>>>     System image GUID: 0x0011750000791fec
>>>     Port 1:
>>>         State: Down
>>>         Physical state: Polling
>>>         Rate: 40
>>>         Base lid: 4660
>>>         LMC: 0
>>>         SM lid: 4660
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fec
>>>         Link layer: InfiniBand
>>>     Port 2:
>>>         State: Down
>>>         Physical state: Polling
>>>         Rate: 40
>>>         Base lid: 4660
>>>         LMC: 0
>>>         SM lid: 4660
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fed
>>>         Link layer: InfiniBand
>>>
>>> So I was wondering: maybe the SM is not being recognized on the blade
>>> system and that's why the ports are not getting past the Polling state.
>>> Is that possible? Or maybe it's not possible to connect an ISL between
>>> the gateway and the HP internal switch so that the SM is available, or
>>> maybe the inactive LLR is causing this. Any ideas? I thought about
>>> connecting the ISL of the HP IB switch to SWIB01 or SWIB02 instead of
>>> the gateways, but I don't have any available ports.
>>>
>>> Thanks in advance,
>>>
>>> Cheers,
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.openfabrics.org
>>> http://lists.openfabrics.org/mailman/listinfo/users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HPOA_4.png
Type: image/png
Size: 137843 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/users/attachments/20151013/c9bc5efb/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HPOA_3.png
Type: image/png
Size: 142383 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/users/attachments/20151013/c9bc5efb/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HPOA_2.png
Type: image/png
Size: 125867 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/users/attachments/20151013/c9bc5efb/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HPOA_1.png
Type: image/png
Size: 76855 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/users/attachments/20151013/c9bc5efb/attachment-0003.png>

