[Users] IB topology config and polling state

Weiny, Ira ira.weiny at intel.com
Mon Oct 12 22:14:27 PDT 2015


German,

Do you have any documentation on the HP blade system?  And the switch which is in that system?

I have to admit I have not followed everything in this thread regarding your configuration, but it seems you have some Mellanox switches connected to an HP chassis that contains both a switch and blades with qib (TrueScale) cards.

The connection from the Mellanox switch to the “HP chassis switch” is LinkUp (Active), but the connections to the individual qib HCAs are not even LinkUp.

Is that a correct summary of the problem?

If so here are some questions:


1) What type of switch is in the HP chassis?

2) Do you have console access or HTTP access to that switch?

3) Does that switch have an SM in it?

4) What version of the kernel are you running with the qib cards?

   a. I assume you are using the qib driver in that kernel.

At some point Hal spoke of “LLR being a Mellanox thing”. Was that to solve the problem of connecting the “HP switch” to the Mellanox switch?

Could you verify that the script

/usr/sbin/truescale-serdes.cmds

is being run?

Also, what version of libipathverbs do you have?
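
The checks requested above could be gathered with something like the following. This is only a rough sketch: the helper names are made up for illustration, and it assumes (not confirmed in this thread) that the serdes script is hooked in through an `install` line under /etc/modprobe.d and that the loaded module shows up as /sys/module/ib_qib.

```python
import os
import platform
from pathlib import Path

# Path quoted in this thread.
SERDES_SCRIPT = "/usr/sbin/truescale-serdes.cmds"


def find_serdes_hooks(modprobe_dir="/etc/modprobe.d", needle="truescale-serdes.cmds"):
    """Return (file, line) pairs in modprobe_dir that mention the serdes script.

    Assumption: the script is invoked from a modprobe config file. Commented
    lines are ignored.
    """
    hits = []
    root = Path(modprobe_dir)
    if not root.is_dir():
        return hits
    for conf in sorted(root.glob("*.conf")):
        try:
            lines = conf.read_text(errors="replace").splitlines()
        except OSError:
            continue
        for line in lines:
            stripped = line.strip()
            if needle in stripped and not stripped.startswith("#"):
                hits.append((str(conf), stripped))
    return hits


def collect_report(modprobe_dir="/etc/modprobe.d"):
    """Collect the facts asked for: kernel, driver loaded, script present/hooked."""
    return {
        "kernel": platform.release(),
        # Assumption: the qib driver appears as module 'ib_qib' in sysfs.
        "qib_loaded": os.path.isdir("/sys/module/ib_qib"),
        "serdes_script_present": os.path.isfile(SERDES_SCRIPT),
        "serdes_script_executable": os.access(SERDES_SCRIPT, os.X_OK),
        "modprobe_hooks": find_serdes_hooks(modprobe_dir),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

An empty `modprobe_hooks` list would only mean no such hook was found in that directory; the script could still be started some other way.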

Ira

From: users-bounces at lists.openfabrics.org [mailto:users-bounces at lists.openfabrics.org] On Behalf Of Weiny, Ira
Sent: Wednesday, October 07, 2015 1:31 PM
To: Hal Rosenstock; German Anders
Cc: users at lists.openfabrics.org
Subject: Re: [Users] IB topology config and polling state

Agree with Hal here.

I’m not familiar with those blades/switches.  I’ll ask around.

Ira

From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
Sent: Wednesday, October 07, 2015 1:26 PM
To: German Anders
Cc: Weiny, Ira; users at lists.openfabrics.org
Subject: Re: [Users] IB topology config and polling state

That's the gateway to the switch in the enclosure. It's the internal connectivity in the blade enclosure that's (physically) broken.

On Wed, Oct 7, 2015 at 4:24 PM, German Anders <ganders at despegar.com> wrote:
Cabled.
For the blade it's:

vendid=0x2c9
devid=0xbd36
sysimgguid=0x2c902004b0918
switchguid=0x2c902004b0918(2c902004b0918)
Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox Technologies" base port 0 lid 29 lmc 0
[1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24 4xQDR

German

2015-10-07 17:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
What are those HCAs cabled to, or is it internal to the blade enclosure?

On Wed, Oct 7, 2015 at 3:24 PM, German Anders <ganders at despegar.com> wrote:
Yeah, is there any command that I can run in order to change the port state on the remote switch? I mean, everything looks good, but on the HP blades I'm still getting:


# ibstat
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Firmware version:
    Hardware version: 2
    Node GUID: 0x0011750000791fec
    System image GUID: 0x0011750000791fec
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
        Base lid: 4660
        LMC: 0
        SM lid: 4660
        Capability mask: 0x0761086a
        Port GUID: 0x0011750000791fec
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 40
        Base lid: 4660
        LMC: 0
        SM lid: 4660
        Capability mask: 0x0761086a
        Port GUID: 0x0011750000791fed
        Link layer: InfiniBand
Also, on working hosts I only see devices from the local network; I don't see any of the blades' HCA connections.
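
For checking many blades at once, the `ibstat` output above can be parsed mechanically. The following is a minimal sketch, assuming the output layout shown in this thread; the function names are illustrative and are not part of any InfiniBand tool.

```python
import re

# Trimmed copy of the ibstat output quoted in this thread.
SAMPLE = """\
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
        Port GUID: 0x0011750000791fec
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 40
        Port GUID: 0x0011750000791fed
"""


def parse_ibstat(text):
    """Parse ibstat text into {ca_name: {port_number: {field: value}}}."""
    cas = {}
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '([^']+)'", line)
        if m:
            ca = cas.setdefault(m.group(1), {})
            port = None
            continue
        # "Port 1:" starts a port section; "Port GUID: ..." is a field.
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = ca.setdefault(int(m.group(1)), {})
            continue
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return cas


def ports_not_linked(cas):
    """List (ca, port, state, physical state) for ports that are not LinkUp."""
    return [
        (name, num, fields.get("State"), fields.get("Physical state"))
        for name, ports in cas.items()
        for num, fields in sorted(ports.items())
        if fields.get("Physical state") != "LinkUp"
    ]


if __name__ == "__main__":
    for entry in ports_not_linked(parse_ibstat(SAMPLE)):
        print(entry)
```

On a live host the text could come from running `ibstat` and capturing its output instead of using `SAMPLE`.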


German

2015-10-07 16:21 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
The screenshot looks good :-) The SM brought the link up to Active.

Note that the ibportstate command you gave was for switch port 0 of the Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.

On Wed, Oct 7, 2015 at 3:06 PM, German Anders <ganders at despegar.com> wrote:
Yes, find attached a screenshot of the port information for port 9, the one that makes the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from one of the hosts connected to one of the SX6018F switches I can see the 'remote' HP IB switch:
# ibnodes

(...)
Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox Technologies" base port 0 lid 29 lmc 0
Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
(...)

# ibportstate -L 29 query
Switch PortInfo:
# Port info: Lid 29 port 0
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................29
SMLid:...........................2
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
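
As a rough illustration, the dotted `ibportstate` listing above can be turned into a dictionary to check which speeds a port has enabled. This sketch assumes the `Field:....value` layout shown here and that 10.0 Gbps per lane corresponds to QDR; the helper names are hypothetical. Note that the sample is the port 0 output quoted above, not an external port.

```python
import re

# Subset of the ibportstate output quoted in this thread (switch port 0).
SAMPLE = """\
Switch PortInfo:
# Port info: Lid 29 port 0
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkWidthActive:.................4X
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
Mkey:............................<not displayed>
"""

# A field line is "Name:" followed by one or more filler dots and the value.
FIELD = re.compile(r"^([A-Za-z][^:]*):\.+(.+)$")


def parse_portinfo(text):
    """Return {field: value} for every dotted 'Field:....value' line."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def enabled_speeds(fields):
    """Split the LinkSpeedEnabled value into its individual speeds."""
    raw = fields.get("LinkSpeedEnabled", "")
    return [s.strip() for s in raw.split(" or ") if s.strip()]


def qdr_enabled(fields):
    """Assumption: a 10.0 Gbps lane speed means QDR is enabled."""
    return "10.0 Gbps" in enabled_speeds(fields)


if __name__ == "__main__":
    info = parse_portinfo(SAMPLE)
    print(info["LinkState"], info["PhysLinkState"], enabled_speeds(info))
```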


German

2015-10-07 16:00 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
One more thing, hopefully, before playing with the low-level phy settings:

Are you using known good cables? Do you have FDR cables on the FDR <-> FDR links? Cable lengths can matter as well.

On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
Were the ports mapped to the phy profile shut down when you changed this?

LLR is a proprietary Mellanox mechanism.

You might want two different profiles: one for the interfaces connected to other gateway interfaces (which are FDR and FDR-10 capable) and the other for the interfaces connecting to QDR (the older equipment in your network). By configuring the Switch-X interfaces to the appropriate possible speeds and disabling the proprietary mechanisms there, the link should not only come up but also do so faster than if FDR/FDR10 are enabled.

I suspect that, due to the Switch-X configuration, the links to the switch(es) in the HP enclosures do not negotiate properly (as shown by Down rather than LinkUp).

Once you get all your links to INIT, negotiation has occurred and then it's time for SM to bring links to active.

Since you have down links, the SM can't do anything about those.
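
The ordering described here (physical negotiation first, then the SM) can be written down as a small decision function. This is only a sketch: the returned labels are invented for illustration, and the state spellings are assumed to match what `ibstat` and `ibportstate` print.

```python
def diagnose(state, physical_state):
    """Map a port's logical and physical state to the next thing to look at.

    Returns one of the illustrative labels:
      "enable-port"       - the port is administratively disabled
      "fix-physical-link" - link has not trained; the SM cannot help yet
      "needs-sm"          - link is up; a subnet manager must activate it
      "ok"                - port is Active
      "unknown"           - anything not covered above
    """
    logical = state.strip().lower()
    phys = physical_state.strip().lower()

    if phys == "disabled":
        return "enable-port"
    if phys != "linkup":
        # Polling, Sleep, training, and so on: negotiation has not completed,
        # so look at cabling, backplane, and speed settings on both ends.
        return "fix-physical-link"
    # Physical link is up from here on.
    if logical.startswith("init") or logical == "armed":
        return "needs-sm"
    if logical == "active":
        return "ok"
    return "unknown"


if __name__ == "__main__":
    # The qib ports in this thread: Down / Polling.
    print(diagnose("Down", "Polling"))
    # The Mellanox switch port 0 in this thread: Active / LinkUp.
    print(diagnose("Active", "LinkUp"))
```

For the blades in this thread the function lands on "fix-physical-link", which matches the point that the SM cannot do anything for links that are still down.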


On Wed, Oct 7, 2015 at 12:44 PM, German Anders <ganders at despegar.com> wrote:
Has anyone had any experience with the HP BLc 4X QDR IB Switch? I know that this kind of switch does not come with an embedded SM, but I don't know how to access any management interface on this particular switch, for example to set the speed or anything like that. Is it possible to access it through the chassis?

German

2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com>:
I think so, but when I tried to configure the phy-profile on the interface in order to negotiate at QDR, it failed to map the profile:

GWIB01 [proxy-ha-group: master] (config) # show phy-profile high-speed-ber

  Profile: high-speed-ber
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: disable
  FDR10: enable-request
  FDR: enable-request

GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl

  Profile: hp-encl-isl
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: enable
  FDR10: enable-request
  FDR: enable-request

GWIB01 [proxy-ha-group: master] (config) #
GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
% Cannot map profile hp-encl-isl to port:  1/9
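
To see exactly how the two profiles differ, the `show phy-profile` output above can be compared field by field. This is a minimal sketch that assumes the output layout shown in this thread; the helper names are hypothetical and nothing here talks to the switch.

```python
# Copy of the two profiles quoted in this thread.
SAMPLE = """\
GWIB01 [proxy-ha-group: master] (config) # show phy-profile high-speed-ber

  Profile: high-speed-ber
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: disable
  FDR10: enable-request
  FDR: enable-request

GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl

  Profile: hp-encl-isl
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: enable
  FDR10: enable-request
  FDR: enable-request
"""

SPEEDS = ("SDR", "DDR", "QDR", "FDR10", "FDR")


def parse_profiles(text):
    """Return {profile_name: {speed: setting}} from show phy-profile output."""
    profiles = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Profile":
            current = profiles.setdefault(value, {})
        elif key in SPEEDS and current is not None:
            # Prompt lines also contain ':' but never match a speed name.
            current[key] = value
    return profiles


def diff_profiles(a, b):
    """Return {speed: (setting_in_a, setting_in_b)} where the two differ."""
    return {s: (a.get(s), b.get(s)) for s in SPEEDS if a.get(s) != b.get(s)}


if __name__ == "__main__":
    p = parse_profiles(SAMPLE)
    print(diff_profiles(p["high-speed-ber"], p["hp-encl-isl"]))
```

With the profiles as quoted, the only difference is the QDR setting; hp-encl-isl still requests FDR10 and FDR.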

German

2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
The ‘qib’ driver is loading fine, as can be seen from the ibstat output. ib_ipath is the driver for an older card.

The problem is that the link is not coming up to Init. Like Hal said, the link should transition to “link up” without the SM's involvement.

I think you are on to something with the fact that it seems like your switch ports are not configured to do QDR.

Ira


From: German Anders [mailto:ganders at despegar.com]
Sent: Wednesday, October 07, 2015 9:05 AM
To: Weiny, Ira
Cc: Hal Rosenstock; users at lists.openfabrics.org

Subject: Re: [Users] IB topology config and polling state

Yes, I have that file:

/usr/sbin/truescale-serdes.cmds
I've also installed libipathverbs:
# apt-get install libipathverbs-dev
But when I try to load the ib_ipath module I get the following error message:

# modprobe ib_ipath
modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy

German

2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
There are a few issues for routing in that diagram but the links should come up.

I assume there is some backplane between the blade servers and the switch in that chassis?

Have you gotten libipathverbs installed?

libipathverbs includes a serdes tuning script:

https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds

Does your libipathverbs include that file? If not, try the latest from GitHub.

Ira


From: users-bounces at lists.openfabrics.org [mailto:users-bounces at lists.openfabrics.org] On Behalf Of German Anders
Sent: Wednesday, October 07, 2015 8:41 AM
To: Hal Rosenstock
Cc: users at lists.openfabrics.org
Subject: Re: [Users] IB topology config and polling state

Hi Hal,
Thanks for the reply. I've attached a PDF with the topology diagram. I don't know if this is the best way to go or if there's another way to connect and set up the IB network; tips and suggestions will be much appreciated. Also, the mezzanine cards are already installed on the blade hosts:
# lspci
(...)
41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)

Thanks in advance,
Cheers,

German

2015-10-07 11:47 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
Hi again German,

Looks like you made some progress from yesterday as the qib ports are now Polling rather than Disabled.

But since they are Down, do you have them cabled to a switch? That should bring the links up, and the port state will then be Init. That is the "starting" point.

You will also then need to be running an SM to bring the ports up to Active.

-- Hal

On Wed, Oct 7, 2015 at 10:37 AM, German Anders <ganders at despegar.com> wrote:
Hi all,
I don't know if this is the right mailing list for this kind of topic, but I'm really new to IB. I've just installed two SX6036G gateways connected to each other through two ISL ports, and then configured proxy-arp between both nodes (the SM is disabled on both gateways):

GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha

Load balancing algorithm: ib-base-ip
Number of Proxy-Arp interfaces: 1

Proxy-ARP VIP
=============
Pra-group name: proxy-ha-group
HA VIP address: 10.xx.xx.xx/xx

Active nodes:
ID                   State                IP
--------------------------------------------------------------
GWIB01               master               10.xx.xx.xx1
GWIB02               standby              10.xx.xx.xx2
Then I set up two SX6018F switches (SWIB01 and SWIB02), one connected to GWIB01 and the other connected to GWIB02. The SM is configured locally on both SWIB01 and SWIB02. So far so good. After this I connected a commodity server with a Mellanox FDR IB adapter to SWIB01 and SWIB02, configured the drivers, etc., and got it up and running fine.

Finally, I set up an HP enclosure with an internal IB switch (and connected port 1 of the internal switch to GWIB01; the link is up but the LLR status is inactive), installed one of the blades, and I see the following:

# ibstat
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Firmware version:
    Hardware version: 2
    Node GUID: 0x0011750000791fec
    System image GUID: 0x0011750000791fec
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
        Base lid: 4660
        LMC: 0
        SM lid: 4660
        Capability mask: 0x0761086a
        Port GUID: 0x0011750000791fec
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 40
        Base lid: 4660
        LMC: 0
        SM lid: 4660
        Capability mask: 0x0761086a
        Port GUID: 0x0011750000791fed
        Link layer: InfiniBand
So I was wondering whether the SM is not being recognized on the blade system and that is why the ports are not getting past the Polling state. Is that possible? Or maybe it is not possible to connect an ISL between the gateway and the HP internal switch so that the SM is available, or maybe the inactive LLR is causing this. Any ideas? I thought about connecting the ISL of the HP IB switch to SWIB01 or SWIB02 instead of the gateways, but I don't have any available ports.
Thanks in advance,
Cheers,

German
