[Users] IB topology config and polling state

German Anders ganders at despegar.com
Tue Oct 13 10:58:41 PDT 2015


I see many errors increasing:

# perfquery 29 28
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
PortSelect:......................28
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................20904
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................166
PortRcvData:.....................167
PortXmitPkts:....................3
PortRcvPkts:.....................3
PortXmitWait:....................0

# perfquery 29 28
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
PortSelect:......................28
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........4
LinkDownedCounter:...............0
PortRcvErrors:...................37474
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................260
PortRcvData:.....................262
PortXmitPkts:....................5
PortRcvPkts:.....................5
PortXmitWait:....................0

# perfquery 29 28
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
PortSelect:......................28
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........4
LinkDownedCounter:...............0
PortRcvErrors:...................60240
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................449
PortRcvData:.....................452
PortXmitPkts:....................9
PortRcvPkts:.....................9
PortXmitWait:....................0
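
To put numbers on how fast these counters are moving, successive samples can be diffed. Below is a minimal sketch in Python, assuming the `Name:....value` layout of the perfquery output shown above. The helper names (`parse_perfquery`, `counter_deltas`, `pegged_16bit`) are made up for illustration, and the two samples are abbreviated copies of the first two outputs.

```python
import re

# Abbreviated copies of the first two perfquery samples above.
SAMPLE_1 = """\
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........0
PortRcvErrors:...................20904
PortXmitPkts:....................3
"""
SAMPLE_2 = """\
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........4
PortRcvErrors:...................37474
PortXmitPkts:....................5
"""

# Matches lines such as "PortRcvErrors:...................20904".
LINE_RE = re.compile(r"^(\w+):\.+(\S+)$")


def parse_perfquery(text):
    """Turn perfquery counter lines into a {name: int} dict."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, value = match.groups()
            counters[name] = int(value, 0)  # base 0 also accepts 0x... values
    return counters


def counter_deltas(prev, curr):
    """Return only the counters whose value changed between two samples."""
    return {
        name: curr[name] - prev[name]
        for name in curr
        if name in prev and curr[name] != prev[name]
    }


def pegged_16bit(counters):
    """Names of counters sitting at 65535, the maximum of a 16-bit field."""
    return sorted(name for name, value in counters.items() if value == 0xFFFF)


if __name__ == "__main__":
    before = parse_perfquery(SAMPLE_1)
    after = parse_perfquery(SAMPLE_2)
    print("changed:", counter_deltas(before, after))
    print("possibly pegged:", pegged_16bit(after))
```

A counter that reads 65535 in every sample has probably hit its ceiling, so an unchanged value there does not mean the errors have stopped. Clearing the counters before sampling would give a cleaner picture.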

*after changing the port speed to 7:*

# perfquery 29 28
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
PortSelect:......................28
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........2
LinkDownedCounter:...............0
PortRcvErrors:...................1568
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................0
PortRcvData:.....................0
PortXmitPkts:....................0
PortRcvPkts:.....................0
PortXmitWait:....................0

# perfquery 29 28
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
PortSelect:......................28
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........2
LinkDownedCounter:...............0
PortRcvErrors:...................11356
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................166
PortRcvData:.....................167
PortXmitPkts:....................3
PortRcvPkts:.....................3
PortXmitWait:....................0

# perfquery 29 28
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
PortSelect:......................28
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........3
LinkDownedCounter:...............0
PortRcvErrors:...................44675
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................261
PortRcvData:.....................284
PortXmitPkts:....................5
PortRcvPkts:.....................6
PortXmitWait:....................0
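
For reference, the speed value used with ibportstate is a bitmask, which is why 7 corresponds to "2.5 Gbps or 5.0 Gbps or 10.0 Gbps" while 4 selects 10.0 Gbps alone. Here is a small sketch of the decoding, assuming the usual bit assignment (1 = 2.5 Gbps, 2 = 5.0 Gbps, 4 = 10.0 Gbps). The function names are only for illustration, and the base-speed check encodes my understanding that 2.5 Gbps is expected to be among the enabled speeds, so treat it as an assumption rather than a statement of the spec.

```python
# Assumed bit assignment for the link speed mask.
SPEED_BITS = {1: "2.5 Gbps", 2: "5.0 Gbps", 4: "10.0 Gbps"}


def decode_speed_mask(mask):
    """Render a speed bitmask the way ibportstate prints it."""
    return " or ".join(label for bit, label in SPEED_BITS.items() if mask & bit)


def includes_base_speed(mask):
    """True if the 2.5 Gbps (SDR) bit is set in the mask."""
    return bool(mask & 1)


if __name__ == "__main__":
    for mask in (4, 7):
        print(mask, "->", decode_speed_mask(mask),
              "| includes 2.5 Gbps:", includes_base_speed(mask))
```

With this reading, 4 is QDR-only and leaves out the base speed, which fits the "IBA extension" label in the peer's PortInfo output, while 7 enables SDR, DDR and QDR together.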



*German* <ganders at despegar.com>

2015-10-13 14:26 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:

> Speed 4 is not valid as an enabled speed. It probably is speed 7 (SDR,
> DDR, or QDR). Since qib is QDR (only), espeed shouldn't be used. QDR only
> is nonstandard, which is why it says "IBA extension".
>
> As to why the link comes up and then goes down, there are a number of
> possible reasons. What does perfquery 29 28 say ? Are there some error
> counters increasing there ? If so, which ones ?
>
> On Tue, Oct 13, 2015 at 1:09 PM, German Anders <ganders at despegar.com>
> wrote:
>
>> Ok, some good news! I've found a way to get LinkUp on the blade port...
>> but not for long :S. After making the following changes the ports do
>> come up, but a couple of minutes later they go down
>> again:
>>
>> - Move the Encl IB SW to Bay7
>> - Move the Blade Mezz cards to slot 3
>>
>> Then for ex:
>>
>> Blade IB SW - LID: 29
>> Blade    4    IB-SW-PORT    28
>>
>> 1)
>>
>> # ibportstate -L 29 28
>> Switch PortInfo:
>> # Port info: Lid 29 port 28
>> LinkState:.......................Down
>> PhysLinkState:...................Polling
>> Lid:.............................75
>> SMLid:...........................2328
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedActive:.................2.5 Gbps
>>
>>
>> 2)
>>
>> # ibportstate -L 29 28 disable
>> # ibportstate -L 29 28 speed 4
>> # ibportstate -L 29 28 espeed 4
>> # ibportstate -L 29 28 smlid 2
>> # ibportstate -L 29 28 enable
>>
>> 3)
>>
>> # ibportstate -L 29 28
>> Switch PortInfo:
>> # Port info: Lid 29 port 28
>> LinkState:.......................Active
>> PhysLinkState:...................LinkUp
>> Lid:.............................75
>> SMLid:...........................2328
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedActive:.................10.0 Gbps
>> Peer PortInfo:
>> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,28 port 1
>> LinkState:.......................Active
>> PhysLinkState:...................LinkUp
>> Lid:.............................30
>> SMLid:...........................2
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............10.0 Gbps (IBA extension)
>> LinkSpeedEnabled:................10.0 Gbps (IBA extension)
>> LinkSpeedActive:.................10.0 Gbps
>> Mkey:............................<not displayed>
>> MkeyLeasePeriod:.................0
>> ProtectBits:.....................0
>>
>>
>> # ibstat
>> CA 'qib0'
>>     CA type: InfiniPath_QMH7342
>>     Number of ports: 2
>>     Firmware version:
>>     Hardware version: 2
>>     Node GUID: 0x0011750000791fec
>>     System image GUID: 0x0011750000791fec
>>     Port 1:
>>         State: *Active*
>>         Physical state: *LinkUp*
>>         Rate: 40
>>         Base lid: 30
>>         LMC: 0
>>         SM lid: 2
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fec
>>         Link layer: InfiniBand
>>     Port 2:
>>         State: Down
>>         Physical state: Polling
>>         Rate: 40
>>         Base lid: 65535
>>         LMC: 0
>>         SM lid: 65535
>>         Capability mask: 0x0761086a
>>         Port GUID: 0x0011750000791fed
>>         Link layer: InfiniBand
>>
>>
>> a couple of minutes later...
>>
>> # ibportstate -L 29 28
>> Switch PortInfo:
>> # Port info: Lid 29 port 28
>> LinkState:.......................Down
>> PhysLinkState:...................Polling
>> Lid:.............................75
>> SMLid:...........................2328
>> LMC:.............................0
>> LinkWidthSupported:..............1X or 4X
>> LinkWidthEnabled:................1X or 4X
>> LinkWidthActive:.................4X
>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>> LinkSpeedActive:.................2.5 Gbps
>>
>>
>> *German* <ganders at despegar.com>
>>
>> 2015-10-13 12:02 GMT-03:00 German Anders <ganders at despegar.com>:
>>
>>> Can anyone who's using the HP BLc 4X QDR IB Switch tell me which
>>> interconnect bay the switch is installed in? I have it in bay 3 with mezz 1;
>>> I'll try moving it to bay 7 with mezz 3 and see whether anything changes.
>>> Any useful info will be really appreciated.
>>>
>>> Cheers,
>>>
>>>
>>> *German* <ganders at despegar.com>
>>>
>>> 2015-10-13 10:51 GMT-03:00 German Anders <ganders at despegar.com>:
>>>
>>>> yes, but the switch does not come with an external interface... just
>>>> the i2c port :S
>>>>
>>>>
>>>>
>>>> *German* <ganders at despegar.com>
>>>>
>>>> 2015-10-13 10:49 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>
>>>>> I don't really know but I was wondering about whether your hardware
>>>>> supports this.
>>>>>
>>>>> HPOA_2 shows internal interface to OA is absent as is external
>>>>> Ethernet interface.
>>>>>
>>>>> On Tue, Oct 13, 2015 at 9:31 AM, German Anders <ganders at despegar.com>
>>>>> wrote:
>>>>>
>>>>>> when trying to assign the ip addr to the interconnect bay it seems
>>>>>> that it does not 'apply' the change, and the ip addr is not displayed as
>>>>>> 'configured' so there's no way to enter from outside. ideas?
>>>>>>
>>>>>> *German*
>>>>>>
>>>>>> 2015-10-13 10:15 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>>>
>>>>>>> What about the 3 critical errors ? What are they ?
>>>>>>>
>>>>>>> On Tue, Oct 13, 2015 at 9:13 AM, German Anders <ganders at despegar.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> I've tried that, and in fact I tried to set the IP address like this:
>>>>>>>>
>>>>>>>> SET EBIPA INTERCONNECT XX.XX.XX.XX XXX.XXX.XXX.XXX 3
>>>>>>>> SET EBIPA INTERCONNECT GATEWAY XX.XX.XX.XX 3
>>>>>>>> SET EBIPA INTERCONNECT DOMAIN "xxxxxx.net" 3
>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>> SET EBIPA INTERCONNECT NTP PRIMARY NONE 3
>>>>>>>> SET EBIPA INTERCONNECT NTP SECONDARY NONE 3
>>>>>>>> ENABLE EBIPA INTERCONNECT 3
>>>>>>>>
>>>>>>>> SAVE EBIPA
>>>>>>>>
>>>>>>>> But I'm not getting any response from that IP. I've also tried many
>>>>>>>> different IP addresses with no luck. If I assign the same IP to one of
>>>>>>>> the blades it works fine, but not on the interconnect bay :( Any other ideas?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>
>>>>>>>> 2015-10-13 10:01 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com
>>>>>>>> >:
>>>>>>>>
>>>>>>>>> Looks like there are 3 critical errors in system status. Did you
>>>>>>>>> look at these ?
>>>>>>>>>
>>>>>>>>> I don't know if you've seen this but there is some info on
>>>>>>>>> configuring the management IPs in
>>>>>>>>> http://h10032.www1.hp.com/ctg/Manual/c00814176.pdf
>>>>>>>>>
>>>>>>>>> Have you looked at/tried the command line interface ?
>>>>>>>>>
>>>>>>>>> On Tue, Oct 13, 2015 at 8:28 AM, German Anders <
>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Hal,
>>>>>>>>>>
>>>>>>>>>> It does not allow me to set up an IP address on the internal switch, so I
>>>>>>>>>> can't access it from outside, except with the tools that I mentioned before.
>>>>>>>>>> It also doesn't allow me to access it through a serial connection from inside
>>>>>>>>>> the enclosure. I've attached some screenshots of the connectivity.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *German*
>>>>>>>>>>
>>>>>>>>>> 2015-10-13 9:13 GMT-03:00 Hal Rosenstock <
>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi German,
>>>>>>>>>>>
>>>>>>>>>>> Are the cards in the correct bays and slots ?
>>>>>>>>>>>
>>>>>>>>>>> Do you have the HP Onboard Administrator tool ? What does it
>>>>>>>>>>> say about internal connectivity ?
>>>>>>>>>>>
>>>>>>>>>>> -- Hal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 13, 2015 at 7:44 AM, German Anders <
>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ira,
>>>>>>>>>>>>
>>>>>>>>>>>> I have some HP documentation, but it's quite short and doesn't
>>>>>>>>>>>> describe any 'config' or 'impl' steps for getting the internal switch
>>>>>>>>>>>> up and running. The version of the switch that came with the enclosure does
>>>>>>>>>>>> not have any management module at all, so it depends on external management.
>>>>>>>>>>>> During the weekend I found a way to upgrade the firmware of the HP switch with
>>>>>>>>>>>> the following command (mlxburn -d lid-0x001D -fw fw-IS4.mlx), and then I ran
>>>>>>>>>>>> (flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D dc
>>>>>>>>>>>> /home/ceph/HPIBSW.INI) and found the following inside that file:
>>>>>>>>>>>>
>>>>>>>>>>>> [PS_INFO]
>>>>>>>>>>>> Name = 489184-B21
>>>>>>>>>>>> Description = HP BLc 4X QDR IB Switch
>>>>>>>>>>>>
>>>>>>>>>>>> [ADAPTER]
>>>>>>>>>>>> PSID = HP_0100000009
>>>>>>>>>>>>
>>>>>>>>>>>> (...)
>>>>>>>>>>>>
>>>>>>>>>>>> [IB_TO_HW_MAP]
>>>>>>>>>>>> PORT1=14
>>>>>>>>>>>> PORT2=15
>>>>>>>>>>>> PORT3=16
>>>>>>>>>>>> PORT4=17
>>>>>>>>>>>> PORT5=18
>>>>>>>>>>>> PORT6=12
>>>>>>>>>>>> PORT7=11
>>>>>>>>>>>> PORT8=10
>>>>>>>>>>>> PORT9=9
>>>>>>>>>>>> PORT10=8
>>>>>>>>>>>> PORT11=7
>>>>>>>>>>>> PORT12=6
>>>>>>>>>>>> PORT13=5
>>>>>>>>>>>> PORT14=4
>>>>>>>>>>>> PORT15=3
>>>>>>>>>>>> PORT16=2
>>>>>>>>>>>> PORT17=20
>>>>>>>>>>>> PORT18=22
>>>>>>>>>>>> PORT19=24
>>>>>>>>>>>>
>>>>>>>>>>>> PORT20=26
>>>>>>>>>>>> PORT21=28
>>>>>>>>>>>> PORT22=30
>>>>>>>>>>>> PORT23=35
>>>>>>>>>>>> PORT24=33
>>>>>>>>>>>> PORT25=21
>>>>>>>>>>>> PORT26=23
>>>>>>>>>>>> PORT27=25
>>>>>>>>>>>> PORT28=27
>>>>>>>>>>>> PORT29=29
>>>>>>>>>>>> PORT30=36
>>>>>>>>>>>> PORT31=34
>>>>>>>>>>>> PORT32=32
>>>>>>>>>>>> PORT33=1
>>>>>>>>>>>> PORT34=13
>>>>>>>>>>>> PORT35=19
>>>>>>>>>>>> PORT36=31
>>>>>>>>>>>>
>>>>>>>>>>>> [unused_ports]
>>>>>>>>>>>> hw_port1_not_in_use=1
>>>>>>>>>>>> hw_port13_not_in_use=1
>>>>>>>>>>>> hw_port19_not_in_use=1
>>>>>>>>>>>> hw_port31_not_in_use=1
>>>>>>>>>>>>
>>>>>>>>>>>> (...)
>>>>>>>>>>>>
>>>>>>>>>>>> I don't know whether there's some issue with the port mapping.
>>>>>>>>>>>> Has anyone used this kind of switch?
>>>>>>>>>>>>
>>>>>>>>>>>> The summary of the problem is correct: the connectivity between
>>>>>>>>>>>> the IB network (MLNX switches/gw) and the HP IB switch is working, since I
>>>>>>>>>>>> was able to upgrade the firmware of the switch and get information about it.
>>>>>>>>>>>> But the connection between the mezzanine cards of the blades and the
>>>>>>>>>>>> internal IB switch of the enclosure is not working at all. Note that in the
>>>>>>>>>>>> OA administration of the enclosure I can see the 'green' port mapping
>>>>>>>>>>>> between each of the blades and the interconnect switch, so I'm guessing
>>>>>>>>>>>> that it should be working.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding the questions:
>>>>>>>>>>>>
>>>>>>>>>>>> 1)      What type of switch is in the HP chassis?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *QLogic HP BLc 4X QDR IB Switch*
>>>>>>>>>>>>
>>>>>>>>>>>> *PSID = HP_0100000009*
>>>>>>>>>>>>
>>>>>>>>>>>> *Image type:   FS2*
>>>>>>>>>>>>
>>>>>>>>>>>> *FW ver:         7.4.3000*
>>>>>>>>>>>>
>>>>>>>>>>>> *Device ID:     48438*
>>>>>>>>>>>> *GUID:             0002c902004b0918*
>>>>>>>>>>>>
>>>>>>>>>>>> 2)      Do you have console access or http access to that
>>>>>>>>>>>> switch?
>>>>>>>>>>>>
>>>>>>>>>>>> *No, since it doesn't have any management module mezzanine card
>>>>>>>>>>>> inside the switch; it only comes with an i2c port. But I can access it
>>>>>>>>>>>> through the mlxburn and flint tools from a host that's connected to the
>>>>>>>>>>>> IB network (outside the enclosure).*
>>>>>>>>>>>>
>>>>>>>>>>>> 3)      Does that switch have an SM in it?
>>>>>>>>>>>>
>>>>>>>>>>>> *No*
>>>>>>>>>>>>
>>>>>>>>>>>> 4)      What version of the kernel are you running with the
>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>
>>>>>>>>>>>> a.       I assume you are using the qib driver in that kernel.
>>>>>>>>>>>>
>>>>>>>>>>>> *Ubuntu 14.04.3 LTS - kernel 3.18.20-031820-generic*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing”  Was
>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>> switch?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *No, since LLR is only supported between mlnx devices. The ISLs
>>>>>>>>>>>> are up and working, since I'm able to query the switch.*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *When trying to run the command:*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *# /usr/sbin/truescale-serdes.cmds*
>>>>>>>>>>>> */usr/sbin/truescale-serdes.cmds: 100: /usr/sbin/truescale-serdes.cmds:
>>>>>>>>>>>> Syntax error: "(" unexpected (expecting "}")*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *# rpm -qa | grep libipathverbs*
>>>>>>>>>>>> *libipathverbs-1.3-1.x86_64*
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *German*
>>>>>>>>>>>> 2015-10-13 2:14 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> German,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have any documentation on the HP blade system?  And the
>>>>>>>>>>>>> switch which is in that system?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have to admit I have not followed everything in this thread
>>>>>>>>>>>>> regarding your configuration but it seems like you have some mellanox
>>>>>>>>>>>>> switches connected into an HP chassis which has both a switch and blades
>>>>>>>>>>>>> with qib (Truescale) cards.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The connection from the mellanox switch to the “HP chassis
>>>>>>>>>>>>> switch” is linkup (active) but the connections to the individual qib HCAs
>>>>>>>>>>>>> are not even linkup.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is that a correct summary of the problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> If so here are some questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1)      What type of switch is in the HP chassis?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2)      Do you have console access or http access to that
>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3)      Does that switch have an SM in it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4)      What version of the kernel are you running with the
>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>
>>>>>>>>>>>>> a.       I assume you are using the qib driver in that kernel.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing”  Was
>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *Weiny, Ira
>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:31 PM
>>>>>>>>>>>>> *To:* Hal Rosenstock; German Anders
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Agree with Hal here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I’m not familiar with those blades/switches.  I’ll ask around.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *From:* Hal Rosenstock [mailto:hal.rosenstock at gmail.com
>>>>>>>>>>>>> <hal.rosenstock at gmail.com>]
>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:26 PM
>>>>>>>>>>>>> *To:* German Anders
>>>>>>>>>>>>> *Cc:* Weiny, Ira; users at lists.openfabrics.org
>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's the gateway to the switch in the enclosure. It's the
>>>>>>>>>>>>> internal connectivity in the blade enclosure that's (physically) broken.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 4:24 PM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> cabled
>>>>>>>>>>>>>
>>>>>>>>>>>>> the blade it's:
>>>>>>>>>>>>>
>>>>>>>>>>>>> vendid=0x2c9
>>>>>>>>>>>>> devid=0xbd36
>>>>>>>>>>>>> sysimgguid=0x2c902004b0918
>>>>>>>>>>>>> switchguid=0x2c902004b0918(2c902004b0918)
>>>>>>>>>>>>> Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>> [1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24 4xQDR
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 17:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> What are those HCAs cabled to or is it internal to the blade
>>>>>>>>>>>>> enclosure ?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:24 PM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, is there any command that I can run in order to change
>>>>>>>>>>>>> the port state on the remote switch? I mean, everything looks good, but on
>>>>>>>>>>>>> the HP blades I'm still getting:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>         State: *Down*
>>>>>>>>>>>>>         Physical state: *Polling*
>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>         State: *Down*
>>>>>>>>>>>>>         Physical state: *Polling*
>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, on working hosts I only see devices from the local
>>>>>>>>>>>>> network, but I don't see any of the blades' HCA connections.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The screen shot looks good :-) SM brought the link up to
>>>>>>>>>>>>> active.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that the ibportstate command you gave was for switch port
>>>>>>>>>>>>> 0 of the Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, find attached a screenshot of the port information (# 9),
>>>>>>>>>>>>> the one that makes the ISL to the QLogic HP BLc 4X QDR IB Switch. Also, from
>>>>>>>>>>>>> one of the hosts connected to one of the SX6018F switches I can see the
>>>>>>>>>>>>> 'remote' HP IB SW:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # *ibnodes*
>>>>>>>>>>>>>
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> # *ibportstate -L 29 query*
>>>>>>>>>>>>> Switch PortInfo:
>>>>>>>>>>>>> # Port info: Lid 29 port 0
>>>>>>>>>>>>> LinkState:.......................Active
>>>>>>>>>>>>> PhysLinkState:...................LinkUp
>>>>>>>>>>>>> Lid:.............................29
>>>>>>>>>>>>> SMLid:...........................2
>>>>>>>>>>>>> LMC:.............................0
>>>>>>>>>>>>> LinkWidthSupported:..............1X or 4X
>>>>>>>>>>>>> LinkWidthEnabled:................1X or 4X
>>>>>>>>>>>>> LinkWidthActive:.................4X
>>>>>>>>>>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>>>>>>>>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>>>>>>>>>>> LinkSpeedActive:.................10.0 Gbps
>>>>>>>>>>>>> Mkey:............................<not displayed>
>>>>>>>>>>>>> MkeyLeasePeriod:.................0
>>>>>>>>>>>>> ProtectBits:.....................0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> One more thing hopefully before playing with the low level phy
>>>>>>>>>>>>> settings:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you using known good cables ? Do you have FDR cables on
>>>>>>>>>>>>> the FDR <-> FDR links ? Cable lengths can matter as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Were the ports mapped to the phy profile shutdown when you
>>>>>>>>>>>>> changed this ?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You might want 2 different profiles: one for the interfaces
>>>>>>>>>>>>> connected to other gateway interfaces (which are FDR (and FDR-10) capable)
>>>>>>>>>>>>> and the other for the interfaces connecting to QDR (the older equipment in
>>>>>>>>>>>>> your network). By configuring the Switch-X interfaces to the appropriate
>>>>>>>>>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>>>>>>>>>> should not only come up but also this will occur faster than if FDR/FDR10
>>>>>>>>>>>>> are enabled.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I suspect that due to the Switch-X configuration that the
>>>>>>>>>>>>> links to the switch(es) in the HP enclosures do not negotiate properly (as
>>>>>>>>>>>>> shown by down rather than LinkUp).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Once you get all your links to INIT, negotiation has occurred
>>>>>>>>>>>>> and then it's time for SM to bring links to active.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since you have down links, the SM can't do anything about
>>>>>>>>>>>>> those.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I
>>>>>>>>>>>>> know that this kind of switch does not come with an embedded SM, but I don't
>>>>>>>>>>>>> know how to access any management at all on this particular switch, for
>>>>>>>>>>>>> example to set the speed or anything like that. Is it possible to access it
>>>>>>>>>>>>> through the chassis?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <ganders at despegar.com
>>>>>>>>>>>>> >:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think so, but when trying to configure the phy-profile on
>>>>>>>>>>>>> the interface in order to negotiate at QDR, it failed to map the profile:
>>>>>>>>>>>>>
>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile high-speed-ber
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Profile: high-speed-ber
>>>>>>>>>>>>>   --------
>>>>>>>>>>>>>   llr support ib-speed
>>>>>>>>>>>>>   SDR: disable
>>>>>>>>>>>>>   DDR: disable
>>>>>>>>>>>>>   QDR: disable
>>>>>>>>>>>>>   FDR10: enable-request
>>>>>>>>>>>>>   FDR: enable-request
>>>>>>>>>>>>>
>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Profile: hp-encl-isl
>>>>>>>>>>>>>   --------
>>>>>>>>>>>>>   llr support ib-speed
>>>>>>>>>>>>>   SDR: disable
>>>>>>>>>>>>>   DDR: disable
>>>>>>>>>>>>>   QDR: enable
>>>>>>>>>>>>>   FDR10: enable-request
>>>>>>>>>>>>>   FDR: enable-request
>>>>>>>>>>>>>
>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
>>>>>>>>>>>>> *% Cannot map profile hp-encl-isl to port:  1/9*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The driver ‘qib’ is loading fine, as can be seen from the
>>>>>>>>>>>>> ibstat output. ib_ipath is for an older card.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is the link is not coming up to init. Like Hal
>>>>>>>>>>>>> said, the link should transition to “link up” without the SM’s involvement.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you are on to something with the fact that it seems
>>>>>>>>>>>>> like your switch ports are not configured to do QDR.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>>>>>>> *To:* Weiny, Ira
>>>>>>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I have that file:
>>>>>>>>>>>>>
>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've also installed libipathverbs:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>>>>>>
>>>>>>>>>>>>> But when I try to load the ib_ipath module I get the
>>>>>>>>>>>>> following error message:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # modprobe ib_ipath
>>>>>>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are a few issues for routing in that diagram but the
>>>>>>>>>>>>> links should come up.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I assume there is some backplane between the blade servers and
>>>>>>>>>>>>> the switch in that chassis?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In ipathverbs there is a serdes tuning script.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does your libipathverbs include that file?  If not try the
>>>>>>>>>>>>> latest from github.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German
>>>>>>>>>>>>> Anders
>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>>>>>>> *To:* Hal Rosenstock
>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the reply. I've attached a PDF with the topology
>>>>>>>>>>>>> diagram. I don't know if this is the best way to go or if there's another
>>>>>>>>>>>>> way to connect and set up the IB network; tips and suggestions would be
>>>>>>>>>>>>> much appreciated. Also, the mezzanine cards are already installed on the
>>>>>>>>>>>>> blade hosts:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # lspci
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi again German,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like you made some progress from yesterday as the qib
>>>>>>>>>>>>> ports are now Polling rather than Disabled.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> But since they are Down, do you have them cabled to a switch?
>>>>>>>>>>>>> That should bring the links up, and the port state will then be Init. That
>>>>>>>>>>>>> is the "starting" point.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You will then also need to be running an SM to bring the ports
>>>>>>>>>>>>> up to Active.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't know if this is the right mailing list for this kind of
>>>>>>>>>>>>> topic, but I'm really new to IB. I've just installed two SX6036G gateways
>>>>>>>>>>>>> connected to each other through two ISL ports, and then configured a
>>>>>>>>>>>>> proxy-arp between both nodes (the SM is disabled on both GWs):
>>>>>>>>>>>>>
>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>>>>>>
>>>>>>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Proxy-ARP VIP
>>>>>>>>>>>>> =============
>>>>>>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>>>>>>
>>>>>>>>>>>>> Active nodes:
>>>>>>>>>>>>> ID                   State                IP
>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>> GWIB01               master               10.xx.xx.xx1
>>>>>>>>>>>>> GWIB02               standby              10.xx.xx.xx2
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*),
>>>>>>>>>>>>> one connected to GWIB01 and the other connected to GWIB02. The SM is
>>>>>>>>>>>>> configured locally on both the SWIB01 and SWIB02 switches. So far so good.
>>>>>>>>>>>>> After this, I connected a commodity server with a MLNX IB ADPT FDR to the
>>>>>>>>>>>>> SWIB01 and SWIB02 switches, configured the drivers, etc., and got it up
>>>>>>>>>>>>> and running fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Finally, I've set up an HP enclosure with an internal IB switch
>>>>>>>>>>>>> (and connected port 1 of the internal switch to GWIB01; the link is up but
>>>>>>>>>>>>> the LLR status is inactive) and installed one of the blades, where I see
>>>>>>>>>>>>> the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>
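The ibstat output above can be checked mechanically. The following is a minimal Python sketch, not an official tool, that parses pasted ibstat text and applies the rule of thumb given elsewhere in this thread: a port in physical state Polling has no physical link yet, and only once the link is up (logical state Init) does a subnet manager come into play. The `parse_ports` and `diagnose` helpers and the wording of the messages are hypothetical, and the sample text is abridged from the output above.

```python
# Abridged from the ibstat output quoted above.
IBSTAT_OUTPUT = """\
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
        Port GUID: 0x0011750000791fec
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 40
        Port GUID: 0x0011750000791fed
"""


def parse_ports(text):
    """Return {(ca_name, port_number): {field: value}} from ibstat text."""
    ports = {}
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CA '"):
            ca = line.split("'")[1]
            port = None
        elif line.startswith("Port ") and line.endswith(":"):
            # "Port 1:" opens a port section; "Port GUID: ..." does not end in ":".
            port = ports.setdefault((ca, int(line[5:-1])), {})
        elif port is not None and ": " in line:
            key, value = line.split(": ", 1)
            port[key] = value
    return ports


def diagnose(fields):
    """Rough classification of one port (hypothetical messages)."""
    state = fields.get("State")
    phys = fields.get("Physical state")
    if phys == "Polling":
        return "no physical link: check cabling, backplane and peer port settings"
    if phys == "LinkUp" and state == "Initializing":
        return "link is up: waiting for a subnet manager"
    if state == "Active":
        return "port is active"
    return "unclassified"


ports = parse_ports(IBSTAT_OUTPUT)
for (ca, number), fields in sorted(ports.items()):
    print(ca, number, diagnose(fields))
```

Both ports here fall into the first case, which points at the physical link rather than at the subnet manager.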
>>>>>>>>>>>>> So I was wondering whether the SM is not being recognized on
>>>>>>>>>>>>> the blade system and that's why it is not getting past the Polling state.
>>>>>>>>>>>>> Is that possible? Or maybe it is not possible to connect an ISL between the
>>>>>>>>>>>>> GW and the HP internal switch so that the SM is available, or maybe the
>>>>>>>>>>>>> inactive LLR is causing this. Any ideas? I thought about connecting the
>>>>>>>>>>>>> ISL of the HP IB switch to SWIB01 or SWIB02 instead of the GWs, but I don't
>>>>>>>>>>>>> have any available ports.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users