[Users] IB topology config and polling state

Hal Rosenstock hal.rosenstock at gmail.com
Tue Oct 13 11:01:42 PDT 2015


SymbolErrors, PortRcvErrors, and LinkErrorRecovery are all indicative of a bad
physical connection. Usually the culprit is a cable, but here it is a backplane
connection. Could you reseat the boards and make sure they're solidly seated in
their backplane connectors?
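
To see which error counters are actually moving between two perfquery runs, the outputs can be diffed with a small script. This is a rough sketch only: the sample text is a trimmed copy of two of the runs quoted below, the helper names are made up for illustration, and the saturation check assumes SymbolErrorCounter and PortRcvErrors are 16-bit fields, so a value of 65535 means the counter is pegged rather than stable.

```python
import re

# Matches perfquery lines such as "PortRcvErrors:...................20904".
COUNTER_LINE = re.compile(r"^(\w+):\.+(\S+)$")

# Error counters mentioned in this thread (not the full PortCounters set).
ERROR_COUNTERS = ("SymbolErrorCounter", "LinkErrorRecoveryCounter",
                  "LinkDownedCounter", "PortRcvErrors")

# Assumption: these two are 16-bit fields, so 65535 means saturated.
SATURATES_AT_16_BITS = ("SymbolErrorCounter", "PortRcvErrors")


def parse_perfquery(text):
    """Return {counter name: integer value} from perfquery text output."""
    counters = {}
    for raw in text.splitlines():
        # Tolerate mail-quoted lines ("> PortRcvErrors:....").
        match = COUNTER_LINE.match(raw.lstrip("> ").strip())
        if match:
            # Base 0 accepts both decimal and 0x-prefixed values.
            counters[match.group(1)] = int(match.group(2), 0)
    return counters


def report(before, after):
    """Return (deltas, saturated) for the error counters of interest."""
    deltas = {name: after[name] - before[name]
              for name in ERROR_COUNTERS
              if name in before and name in after
              and after[name] > before[name]}
    saturated = [name for name in SATURATES_AT_16_BITS
                 if after.get(name) == 0xFFFF]
    return deltas, saturated


FIRST = """\
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................20904
"""
SECOND = """\
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........4
LinkDownedCounter:...............0
PortRcvErrors:...................37474
"""

if __name__ == "__main__":
    deltas, saturated = report(parse_perfquery(FIRST), parse_perfquery(SECOND))
    print("increasing:", deltas)
    print("saturated:", saturated)
```

A pegged counter cannot show further growth, so it is worth resetting the counters before sampling again (perfquery has reset options; check its man page for the exact flags on your version).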

On Tue, Oct 13, 2015 at 1:58 PM, German Anders <ganders at despegar.com> wrote:

> I see many errors increasing:
>
> # perfquery 29 28
> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
> PortSelect:......................28
> CounterSelect:...................0x0000
> SymbolErrorCounter:..............65535
> LinkErrorRecoveryCounter:........0
> LinkDownedCounter:...............0
> PortRcvErrors:...................20904
> PortRcvRemotePhysicalErrors:.....0
> PortRcvSwitchRelayErrors:........0
> PortXmitDiscards:................0
> PortXmitConstraintErrors:........0
> PortRcvConstraintErrors:.........0
> CounterSelect2:..................0x00
> LocalLinkIntegrityErrors:........0
> ExcessiveBufferOverrunErrors:....0
> VL15Dropped:.....................0
> PortXmitData:....................166
> PortRcvData:.....................167
> PortXmitPkts:....................3
> PortRcvPkts:.....................3
> PortXmitWait:....................0
>
> # perfquery 29 28
> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
> PortSelect:......................28
> CounterSelect:...................0x0000
> SymbolErrorCounter:..............65535
> LinkErrorRecoveryCounter:........4
> LinkDownedCounter:...............0
> PortRcvErrors:...................37474
> PortRcvRemotePhysicalErrors:.....0
> PortRcvSwitchRelayErrors:........0
> PortXmitDiscards:................0
> PortXmitConstraintErrors:........0
> PortRcvConstraintErrors:.........0
> CounterSelect2:..................0x00
> LocalLinkIntegrityErrors:........0
> ExcessiveBufferOverrunErrors:....0
> VL15Dropped:.....................0
> PortXmitData:....................260
> PortRcvData:.....................262
> PortXmitPkts:....................5
> PortRcvPkts:.....................5
> PortXmitWait:....................0
>
> # perfquery 29 28
> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
> PortSelect:......................28
> CounterSelect:...................0x0000
> SymbolErrorCounter:..............65535
> LinkErrorRecoveryCounter:........4
> LinkDownedCounter:...............0
> PortRcvErrors:...................60240
> PortRcvRemotePhysicalErrors:.....0
> PortRcvSwitchRelayErrors:........0
> PortXmitDiscards:................0
> PortXmitConstraintErrors:........0
> PortRcvConstraintErrors:.........0
> CounterSelect2:..................0x00
> LocalLinkIntegrityErrors:........0
> ExcessiveBufferOverrunErrors:....0
> VL15Dropped:.....................0
> PortXmitData:....................449
> PortRcvData:.....................452
> PortXmitPkts:....................9
> PortRcvPkts:.....................9
> PortXmitWait:....................0
>
> *after changing the port speed to 7:*
>
> # perfquery 29 28
> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
> PortSelect:......................28
> CounterSelect:...................0x0000
> SymbolErrorCounter:..............65535
> LinkErrorRecoveryCounter:........2
> LinkDownedCounter:...............0
> PortRcvErrors:...................1568
> PortRcvRemotePhysicalErrors:.....0
> PortRcvSwitchRelayErrors:........0
> PortXmitDiscards:................0
> PortXmitConstraintErrors:........0
> PortRcvConstraintErrors:.........0
> CounterSelect2:..................0x00
> LocalLinkIntegrityErrors:........0
> ExcessiveBufferOverrunErrors:....0
> VL15Dropped:.....................0
> PortXmitData:....................0
> PortRcvData:.....................0
> PortXmitPkts:....................0
> PortRcvPkts:.....................0
> PortXmitWait:....................0
>
> # perfquery 29 28
> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
> PortSelect:......................28
> CounterSelect:...................0x0000
> SymbolErrorCounter:..............65535
> LinkErrorRecoveryCounter:........2
> LinkDownedCounter:...............0
> PortRcvErrors:...................11356
> PortRcvRemotePhysicalErrors:.....0
> PortRcvSwitchRelayErrors:........0
> PortXmitDiscards:................0
> PortXmitConstraintErrors:........0
> PortRcvConstraintErrors:.........0
> CounterSelect2:..................0x00
> LocalLinkIntegrityErrors:........0
> ExcessiveBufferOverrunErrors:....0
> VL15Dropped:.....................0
> PortXmitData:....................166
> PortRcvData:.....................167
> PortXmitPkts:....................3
> PortRcvPkts:.....................3
> PortXmitWait:....................0
>
> # perfquery 29 28
> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
> PortSelect:......................28
> CounterSelect:...................0x0000
> SymbolErrorCounter:..............65535
> LinkErrorRecoveryCounter:........3
> LinkDownedCounter:...............0
> PortRcvErrors:...................44675
> PortRcvRemotePhysicalErrors:.....0
> PortRcvSwitchRelayErrors:........0
> PortXmitDiscards:................0
> PortXmitConstraintErrors:........0
> PortRcvConstraintErrors:.........0
> CounterSelect2:..................0x00
> LocalLinkIntegrityErrors:........0
> ExcessiveBufferOverrunErrors:....0
> VL15Dropped:.....................0
> PortXmitData:....................261
> PortRcvData:.....................284
> PortXmitPkts:....................5
> PortRcvPkts:.....................6
> PortXmitWait:....................0
>
>
>
> *German* <ganders at despegar.com>
>
> 2015-10-13 14:26 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>
>> Speed 4 is not valid as an enabled speed. It should probably be speed 7 (SDR,
>> DDR, or QDR). Since qib is QDR (only), espeed shouldn't be used. QDR only
>> is nonstandard, which is why it says "IBA extension".
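
The speed values discussed here can be read as a bitmask. A minimal sketch, assuming bit 0 is 2.5 Gbps (SDR), bit 1 is 5.0 Gbps (DDR) and bit 2 is 10.0 Gbps (QDR), which matches the ibportstate output quoted in this thread; the function name is made up for illustration, and values above 7 are outside the scope of the sketch:

```python
# Assumed bit layout of LinkSpeedEnabled for SDR/DDR/QDR links.
SPEED_BITS = (
    (0x1, "2.5 Gbps (SDR)"),
    (0x2, "5.0 Gbps (DDR)"),
    (0x4, "10.0 Gbps (QDR)"),
)


def decode_link_speed(mask):
    """List the per-lane speeds enabled by a LinkSpeedEnabled bitmask."""
    if mask < 0 or mask > 0x7:
        raise ValueError(f"speed mask outside the range handled here: {mask}")
    return [name for bit, name in SPEED_BITS if mask & bit]


if __name__ == "__main__":
    for mask in (4, 7):
        print(mask, "->", " or ".join(decode_link_speed(mask)))
```

Read this way, 7 enables all three speeds, while 4 enables only the QDR bit.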
>>
>> As to why the link comes up and then goes down, there are a number of
>> possible reasons. What does perfquery 29 28 say ? Are there some error
>> counters increasing there ? If so, which ones ?
>>
>> On Tue, Oct 13, 2015 at 1:09 PM, German Anders <ganders at despegar.com>
>> wrote:
>>
>>> Ok, some good news! I've found a way to get LinkUp on the blade port...
>>> but not for long :S. It seems that after making the following changes
>>> the ports finally come up, but then after a couple of minutes they go down
>>> again:
>>>
>>> - Move the Encl IB SW to Bay7
>>> - Move the Blade Mezz cards to slot 3
>>>
>>> Then for ex:
>>>
>>> Blade IB SW - LID: 29
>>> Blade    4    IB-SW-PORT    28
>>>
>>> 1)
>>>
>>> # ibportstate -L 29 28
>>> Switch PortInfo:
>>> # Port info: Lid 29 port 28
>>> LinkState:.......................Down
>>> PhysLinkState:...................Polling
>>> Lid:.............................75
>>> SMLid:...........................2328
>>> LMC:.............................0
>>> LinkWidthSupported:..............1X or 4X
>>> LinkWidthEnabled:................1X or 4X
>>> LinkWidthActive:.................4X
>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedActive:.................2.5 Gbps
>>>
>>>
>>> 2)
>>>
>>> # ibportstate -L 29 28 disable
>>> # ibportstate -L 29 28 speed 4
>>> # ibportstate -L 29 28 espeed 4
>>> # ibportstate -L 29 28 smlid 2
>>> # ibportstate -L 29 28 enable
>>>
>>> 3)
>>>
>>> # ibportstate -L 29 28
>>> Switch PortInfo:
>>> # Port info: Lid 29 port 28
>>> LinkState:.......................Active
>>> PhysLinkState:...................LinkUp
>>> Lid:.............................75
>>> SMLid:...........................2328
>>> LMC:.............................0
>>> LinkWidthSupported:..............1X or 4X
>>> LinkWidthEnabled:................1X or 4X
>>> LinkWidthActive:.................4X
>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedActive:.................10.0 Gbps
>>> Peer PortInfo:
>>> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,28 port 1
>>> LinkState:.......................Active
>>> PhysLinkState:...................LinkUp
>>> Lid:.............................30
>>> SMLid:...........................2
>>> LMC:.............................0
>>> LinkWidthSupported:..............1X or 4X
>>> LinkWidthEnabled:................1X or 4X
>>> LinkWidthActive:.................4X
>>> LinkSpeedSupported:..............10.0 Gbps (IBA extension)
>>> LinkSpeedEnabled:................10.0 Gbps (IBA extension)
>>> LinkSpeedActive:.................10.0 Gbps
>>> Mkey:............................<not displayed>
>>> MkeyLeasePeriod:.................0
>>> ProtectBits:.....................0
>>>
>>>
>>> # ibstat
>>> CA 'qib0'
>>>     CA type: InfiniPath_QMH7342
>>>     Number of ports: 2
>>>     Firmware version:
>>>     Hardware version: 2
>>>     Node GUID: 0x0011750000791fec
>>>     System image GUID: 0x0011750000791fec
>>>     Port 1:
>>>         State: *Active*
>>>         Physical state: *LinkUp*
>>>         Rate: 40
>>>         Base lid: 30
>>>         LMC: 0
>>>         SM lid: 2
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fec
>>>         Link layer: InfiniBand
>>>     Port 2:
>>>         State: Down
>>>         Physical state: Polling
>>>         Rate: 40
>>>         Base lid: 65535
>>>         LMC: 0
>>>         SM lid: 65535
>>>         Capability mask: 0x0761086a
>>>         Port GUID: 0x0011750000791fed
>>>         Link layer: InfiniBand
>>>
>>>
>>> a couple of minutes later...
>>>
>>> # ibportstate -L 29 28
>>> Switch PortInfo:
>>> # Port info: Lid 29 port 28
>>> LinkState:.......................Down
>>> PhysLinkState:...................Polling
>>> Lid:.............................75
>>> SMLid:...........................2328
>>> LMC:.............................0
>>> LinkWidthSupported:..............1X or 4X
>>> LinkWidthEnabled:................1X or 4X
>>> LinkWidthActive:.................4X
>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>> LinkSpeedActive:.................2.5 Gbps
>>>
>>>
>>> *German* <ganders at despegar.com>
>>>
>>> 2015-10-13 12:02 GMT-03:00 German Anders <ganders at despegar.com>:
>>>
>>>> Can anyone who's using the HP BLc 4X QDR IB Switch tell me which
>>>> interconnect bay the switch is installed in? I have it in bay 3 with mezz 1.
>>>> I'll try moving it to bay 7 and mezz 3 to see whether anything changes;
>>>> any useful info will be really appreciated.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> *German* <ganders at despegar.com>
>>>>
>>>> 2015-10-13 10:51 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>
>>>>> yes, but the switch does not come with an external interface... just
>>>>> the i2c port :S
>>>>>
>>>>>
>>>>>
>>>>> *German* <ganders at despegar.com>
>>>>>
>>>>> 2015-10-13 10:49 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>>
>>>>>> I don't really know but I was wondering about whether your hardware
>>>>>> supports this.
>>>>>>
>>>>>> HPOA_2 shows internal interface to OA is absent as is external
>>>>>> Ethernet interface.
>>>>>>
>>>>>> On Tue, Oct 13, 2015 at 9:31 AM, German Anders <ganders at despegar.com>
>>>>>> wrote:
>>>>>>
>>>>>>> when trying to assign the ip addr to the interconnect bay it seems
>>>>>>> that it does not 'apply' the change, and the ip addr is not displayed as
>>>>>>> 'configured' so there's no way to enter from outside. ideas?
>>>>>>>
>>>>>>> *German*
>>>>>>>
>>>>>>> 2015-10-13 10:15 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>
>>>>>>> :
>>>>>>>
>>>>>>>> What about the 3 critical errors ? What are they ?
>>>>>>>>
>>>>>>>> On Tue, Oct 13, 2015 at 9:13 AM, German Anders <
>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>
>>>>>>>>> I've tried that, and in fact I tried to set the IP address like this:
>>>>>>>>>
>>>>>>>>> SET EBIPA INTERCONNECT XX.XX.XX.XX XXX.XXX.XXX.XXX 3
>>>>>>>>> SET EBIPA INTERCONNECT GATEWAY XX.XX.XX.XX 3
>>>>>>>>> SET EBIPA INTERCONNECT DOMAIN "xxxxxx.net" 3
>>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>>> SET EBIPA INTERCONNECT NTP PRIMARY NONE 3
>>>>>>>>> SET EBIPA INTERCONNECT NTP SECONDARY NONE 3
>>>>>>>>> ENABLE EBIPA INTERCONNECT 3
>>>>>>>>>
>>>>>>>>> SAVE EBIPA
>>>>>>>>>
>>>>>>>>> But I'm not getting any response from that IP. I've also tried many
>>>>>>>>> different IP addresses with no luck. If I assign that IP to one of the
>>>>>>>>> blades it works fine, but not on the interconnect bay :( Any other ideas?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>>
>>>>>>>>> 2015-10-13 10:01 GMT-03:00 Hal Rosenstock <
>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>
>>>>>>>>>> Looks like there are 3 critical errors in system status. Did you
>>>>>>>>>> look at these ?
>>>>>>>>>>
>>>>>>>>>> I don't know if you've seen this but there is some info on
>>>>>>>>>> configuring the management IPs in
>>>>>>>>>> http://h10032.www1.hp.com/ctg/Manual/c00814176.pdf
>>>>>>>>>>
>>>>>>>>>> Have you looked at/tried the command line interface ?
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 13, 2015 at 8:28 AM, German Anders <
>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>
>>>>>>>>>>> It does not allow me to set up an IP address on the internal switch, so I
>>>>>>>>>>> can't access it from outside except with the tools that I mentioned before.
>>>>>>>>>>> It also doesn't allow me to access it through a serial connection from inside
>>>>>>>>>>> the enclosure. I've attached some screenshots about the connectivity.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *German*
>>>>>>>>>>>
>>>>>>>>>>> 2015-10-13 9:13 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> Hi German,
>>>>>>>>>>>>
>>>>>>>>>>>> Are the cards in the correct bays and slots ?
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have the HP Onboard Administrator tool ? What does it
>>>>>>>>>>>> say about internal connectivity ?
>>>>>>>>>>>>
>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 13, 2015 at 7:44 AM, German Anders <
>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ira,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have some HP documentation, but it is quite short and doesn't
>>>>>>>>>>>>> describe any configuration or implementation steps for getting the
>>>>>>>>>>>>> internal switch up and running. The version of the switch that came with
>>>>>>>>>>>>> the enclosure does not have any management module at all, so it depends
>>>>>>>>>>>>> on external management. During the weekend I found a way to upgrade the
>>>>>>>>>>>>> firmware of the HP switch with the following command
>>>>>>>>>>>>> (mlxburn -d lid-0x001D -fw fw-IS4.mlx), and then I ran
>>>>>>>>>>>>> (flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D dc
>>>>>>>>>>>>> /home/ceph/HPIBSW.INI) and found the following inside that file:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [PS_INFO]
>>>>>>>>>>>>> Name = 489184-B21
>>>>>>>>>>>>> Description = HP BLc 4X QDR IB Switch
>>>>>>>>>>>>>
>>>>>>>>>>>>> [ADAPTER]
>>>>>>>>>>>>> PSID = HP_0100000009
>>>>>>>>>>>>>
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> [IB_TO_HW_MAP]
>>>>>>>>>>>>> PORT1=14
>>>>>>>>>>>>> PORT2=15
>>>>>>>>>>>>> PORT3=16
>>>>>>>>>>>>> PORT4=17
>>>>>>>>>>>>> PORT5=18
>>>>>>>>>>>>> PORT6=12
>>>>>>>>>>>>> PORT7=11
>>>>>>>>>>>>> PORT8=10
>>>>>>>>>>>>> PORT9=9
>>>>>>>>>>>>> PORT10=8
>>>>>>>>>>>>> PORT11=7
>>>>>>>>>>>>> PORT12=6
>>>>>>>>>>>>> PORT13=5
>>>>>>>>>>>>> PORT14=4
>>>>>>>>>>>>> PORT15=3
>>>>>>>>>>>>> PORT16=2
>>>>>>>>>>>>> PORT17=20
>>>>>>>>>>>>> PORT18=22
>>>>>>>>>>>>> PORT19=24
>>>>>>>>>>>>> PORT20=26
>>>>>>>>>>>>> PORT21=28
>>>>>>>>>>>>> PORT22=30
>>>>>>>>>>>>> PORT23=35
>>>>>>>>>>>>> PORT24=33
>>>>>>>>>>>>> PORT25=21
>>>>>>>>>>>>> PORT26=23
>>>>>>>>>>>>> PORT27=25
>>>>>>>>>>>>> PORT28=27
>>>>>>>>>>>>> PORT29=29
>>>>>>>>>>>>> PORT30=36
>>>>>>>>>>>>> PORT31=34
>>>>>>>>>>>>> PORT32=32
>>>>>>>>>>>>> PORT33=1
>>>>>>>>>>>>> PORT34=13
>>>>>>>>>>>>> PORT35=19
>>>>>>>>>>>>> PORT36=31
>>>>>>>>>>>>>
>>>>>>>>>>>>> [unused_ports]
>>>>>>>>>>>>> hw_port1_not_in_use=1
>>>>>>>>>>>>> hw_port13_not_in_use=1
>>>>>>>>>>>>> hw_port19_not_in_use=1
>>>>>>>>>>>>> hw_port31_not_in_use=1
>>>>>>>>>>>>>
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't know whether there might be some issue with the port
>>>>>>>>>>>>> mapping. Has anyone used this kind of switch?
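
One way to sanity-check the mapping quoted above is to verify that it is internally consistent. The following is a sketch under assumptions: it reads PORTn=m as "IB port n maps to HW port m" (suggested by the section name only), the helper names are hypothetical, and passing the check says nothing about whether the mapping suits this particular enclosure.

```python
import configparser
import re

# Sections copied from the flint dump quoted above.
INI_EXCERPT = """
[IB_TO_HW_MAP]
PORT1=14
PORT2=15
PORT3=16
PORT4=17
PORT5=18
PORT6=12
PORT7=11
PORT8=10
PORT9=9
PORT10=8
PORT11=7
PORT12=6
PORT13=5
PORT14=4
PORT15=3
PORT16=2
PORT17=20
PORT18=22
PORT19=24
PORT20=26
PORT21=28
PORT22=30
PORT23=35
PORT24=33
PORT25=21
PORT26=23
PORT27=25
PORT28=27
PORT29=29
PORT30=36
PORT31=34
PORT32=32
PORT33=1
PORT34=13
PORT35=19
PORT36=31

[unused_ports]
hw_port1_not_in_use=1
hw_port13_not_in_use=1
hw_port19_not_in_use=1
hw_port31_not_in_use=1
"""


def load_port_map(ini_text):
    """Return ({ib_port: hw_port}, {unused hw ports}) from the INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # configparser lowercases option names, so "PORT1" arrives as "port1".
    ib_to_hw = {int(key[len("port"):]): int(value)
                for key, value in parser["IB_TO_HW_MAP"].items()}
    unused_hw = set()
    for key, value in parser["unused_ports"].items():
        match = re.fullmatch(r"hw_port(\d+)_not_in_use", key)
        if match and value == "1":
            unused_hw.add(int(match.group(1)))
    return ib_to_hw, unused_hw


def check_port_map(ib_to_hw, unused_hw):
    """Return (problems, IB ports that land on unused HW ports)."""
    problems = []
    if sorted(ib_to_hw.values()) != list(range(1, len(ib_to_hw) + 1)):
        problems.append("HW ports are not a permutation of 1..N")
    ib_for_unused = sorted(ib for ib, hw in ib_to_hw.items() if hw in unused_hw)
    return problems, ib_for_unused


if __name__ == "__main__":
    problems, ib_for_unused = check_port_map(*load_port_map(INI_EXCERPT))
    print("problems:", problems)
    print("IB ports on unused HW ports:", ib_for_unused)
```

For the excerpt above, every HW port is used exactly once and the four unused HW ports sit behind IB ports 33 to 36, which would be consistent with the switch showing 32 ports elsewhere in this thread.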
>>>>>>>>>>>>>
>>>>>>>>>>>>> The summary of the problem is correct. The connectivity
>>>>>>>>>>>>> between the IB network (MLNX switches/gw) and the HP IB switch is working,
>>>>>>>>>>>>> since I was able to upgrade the firmware of the switch and get information
>>>>>>>>>>>>> about it. But the connection between the mezzanine cards of the blades and
>>>>>>>>>>>>> the internal IB switch in the enclosure is not working at all. Note that if
>>>>>>>>>>>>> I go to the OA administration of the enclosure I can see the 'green' port
>>>>>>>>>>>>> mappings between each of the blades and the interconnect switch, so I'm
>>>>>>>>>>>>> guessing that it should be working.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding the questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1)      What type of switch is in the HP chassis?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *QLogic HP BLc 4X QDR IB Switch*
>>>>>>>>>>>>>
>>>>>>>>>>>>> *PSID = HP_0100000009*
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Image type:   FS2*
>>>>>>>>>>>>>
>>>>>>>>>>>>> *FW ver:         7.4.3000*
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Device ID:     48438*
>>>>>>>>>>>>> *GUID:             0002c902004b0918*
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2)      Do you have console access or http access to that
>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> *No, since it didn't have any management module mezzanine card
>>>>>>>>>>>>> inside the switch; it only comes with an i2c port. But I can access it
>>>>>>>>>>>>> through the mlxburn and flint tools from one host that's connected to the
>>>>>>>>>>>>> IB network (outside the enclosure).*
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3)      Does that switch have an SM in it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> *No*
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4)      What version of the kernel are you running with the
>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>
>>>>>>>>>>>>> a.       I assume you are using the qib driver in that kernel.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Ubuntu 14.04.3 LTS - kernel 3.18.20-031820-generic*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing”  Was
>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *No. LLR is only supported between Mellanox devices. The ISLs
>>>>>>>>>>>>> are up and working, since I'm able to query the switch.*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *When trying to run the command:*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *# /usr/sbin/truescale-serdes.cmds*
>>>>>>>>>>>>> */usr/sbin/truescale-serdes.cmds: 100:
>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds: Syntax error: "(" unexpected (expecting
>>>>>>>>>>>>> "}")*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *# rpm -qa | grep libipathverbs*
>>>>>>>>>>>>> *libipathverbs-1.3-1.x86_64*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *German*
>>>>>>>>>>>>> 2015-10-13 2:14 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> German,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have any documentation on the HP blade system?  And
>>>>>>>>>>>>>> the switch which is in that system?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have to admit I have not followed everything in this thread
>>>>>>>>>>>>>> regarding your configuration but it seems like you have some mellanox
>>>>>>>>>>>>>> switches connected into an HP chassis which has both a switch and blades
>>>>>>>>>>>>>> with qib (Truescale) cards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The connection from the mellanox switch to the “HP chassis
>>>>>>>>>>>>>> switch” is linkup (active) but the connections to the individual qib HCAs
>>>>>>>>>>>>>> are not even linkup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is that a correct summary of the problem?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If so here are some questions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1)      What type of switch is in the HP chassis?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2)      Do you have console access or http access to that
>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3)      Does that switch have an SM in it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4)      What version of the kernel are you running with the
>>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a.       I assume you are using the qib driver in that
>>>>>>>>>>>>>> kernel.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing”  Was
>>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *Weiny,
>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:31 PM
>>>>>>>>>>>>>> *To:* Hal Rosenstock; German Anders
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Agree with Hal here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’m not familiar with those blades/switches.  I’ll ask around.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *From:* Hal Rosenstock [mailto:hal.rosenstock at gmail.com
>>>>>>>>>>>>>> <hal.rosenstock at gmail.com>]
>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:26 PM
>>>>>>>>>>>>>> *To:* German Anders
>>>>>>>>>>>>>> *Cc:* Weiny, Ira; users at lists.openfabrics.org
>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's the gateway to the switch in the enclosure. It's the
>>>>>>>>>>>>>> internal connectivity in the blade enclosure that's (physically) broken.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 4:24 PM, German Anders <
>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cabled
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the blade it's:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> vendid=0x2c9
>>>>>>>>>>>>>> devid=0xbd36
>>>>>>>>>>>>>> sysimgguid=0x2c902004b0918
>>>>>>>>>>>>>> switchguid=0x2c902004b0918(2c902004b0918)
>>>>>>>>>>>>>> Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV
>>>>>>>>>>>>>> Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>>> [1]    "S-e41d2d030031e9c1"[9]        #
>>>>>>>>>>>>>> "MF0;GWIB01:SX6036G/U1" lid 24 4xQDR
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 17:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What are those HCAs cabled to or is it internal to the blade
>>>>>>>>>>>>>> enclosure ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:24 PM, German Anders <
>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yeah, is there any command that I can run in order to change
>>>>>>>>>>>>>> the port state on the remote switch? I mean everything looks good, but on
>>>>>>>>>>>>>> the HP blades I'm still getting:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>>         State: *Down*
>>>>>>>>>>>>>>         Physical state: *Polling*
>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>>         State: *Down*
>>>>>>>>>>>>>>         Physical state: *Polling*
>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, on working hosts I only see devices from the local
>>>>>>>>>>>>>> network, and don't see any of the blades' HCA connections.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The screen shot looks good :-) SM brought the link up to
>>>>>>>>>>>>>> active.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that the ibportstate command you gave was for switch
>>>>>>>>>>>>>> port 0 of the Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <
>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, find attached a screenshot of the port information (#
>>>>>>>>>>>>>> 9), the one that makes the ISL to the QLogic HP BLc 4X QDR IB Switch. Also,
>>>>>>>>>>>>>> from one of the hosts that are connected to one of the SX6018F I can see
>>>>>>>>>>>>>> the 'remote' HP IB SW:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # *ibnodes*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV
>>>>>>>>>>>>>> Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>>> Switch    : 0xe41d2d030031e9c1 ports 37
>>>>>>>>>>>>>> "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # *ibportstate -L 29 query*
>>>>>>>>>>>>>> Switch PortInfo:
>>>>>>>>>>>>>> # Port info: Lid 29 port 0
>>>>>>>>>>>>>> LinkState:.......................Active
>>>>>>>>>>>>>> PhysLinkState:...................LinkUp
>>>>>>>>>>>>>> Lid:.............................29
>>>>>>>>>>>>>> SMLid:...........................2
>>>>>>>>>>>>>> LMC:.............................0
>>>>>>>>>>>>>> LinkWidthSupported:..............1X or 4X
>>>>>>>>>>>>>> LinkWidthEnabled:................1X or 4X
>>>>>>>>>>>>>> LinkWidthActive:.................4X
>>>>>>>>>>>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0
>>>>>>>>>>>>>> Gbps
>>>>>>>>>>>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0
>>>>>>>>>>>>>> Gbps
>>>>>>>>>>>>>> LinkSpeedActive:.................10.0 Gbps
>>>>>>>>>>>>>> Mkey:............................<not displayed>
>>>>>>>>>>>>>> MkeyLeasePeriod:.................0
>>>>>>>>>>>>>> ProtectBits:.....................0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One more thing hopefully before playing with the low level
>>>>>>>>>>>>>> phy settings:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you using known good cables ? Do you have FDR cables on
>>>>>>>>>>>>>> the FDR <-> FDR links ? Cable lengths can matter as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Were the ports mapped to the phy profile shutdown when you
>>>>>>>>>>>>>> changed this ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You might want 2 different profiles: one for the interfaces
>>>>>>>>>>>>>> connected to other gateway interfaces (which are FDR (and FDR-10) capable)
>>>>>>>>>>>>>> and the other for the interfaces connecting to QDR (the older equipment in
>>>>>>>>>>>>>> your network). By configuring the Switch-X interfaces to the appropriate
>>>>>>>>>>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>>>>>>>>>>> should not only come up but also do so faster than if FDR/FDR10
>>>>>>>>>>>>>> are enabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I suspect that, due to the Switch-X configuration, the
>>>>>>>>>>>>>> links to the switch(es) in the HP enclosures do not negotiate properly (as
>>>>>>>>>>>>>> shown by Down rather than LinkUp).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Once you get all your links to INIT, negotiation has occurred
>>>>>>>>>>>>>> and then it's time for SM to bring links to active.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since you have down links, the SM can't do anything about
>>>>>>>>>>>>>> those.
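
The split described above (physical negotiation first, then the SM) can be written as a small triage. A sketch only: the state strings are the ones ibportstate and ibstat print in this thread plus "Initialize" and "Armed", which are assumed spellings, and the function name is hypothetical.

```python
def triage(link_state, phys_state):
    """Say who has to act next for a port, given its logical and physical state."""
    if phys_state != "LinkUp":
        # Polling, Disabled, etc.: the link has not trained, so the SM cannot help.
        return "physical: check cabling/backplane, enabled speeds and widths"
    if link_state == "Active":
        return "ok: link is up and the SM has activated it"
    # Physically up but not yet Active (for example Initialize or Armed).
    return "sm: link trained, waiting for the subnet manager to bring it to Active"


if __name__ == "__main__":
    for states in (("Down", "Polling"), ("Initialize", "LinkUp"), ("Active", "LinkUp")):
        print(states, "->", triage(*states))
```

Ports reported as Down/Polling fall in the first branch, which is why restarting or reconfiguring the SM does not change anything for them.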
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <
>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I
>>>>>>>>>>>>>> know that this kind of switch does not come with an embedded SM, but I don't
>>>>>>>>>>>>>> know how to access any management at all on this particular switch, for
>>>>>>>>>>>>>> example to set the speed or anything like that. Is it possible to access it
>>>>>>>>>>>>>> through the chassis?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <
>>>>>>>>>>>>>> ganders at despegar.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think so, but when trying to configure the phy-profile on
>>>>>>>>>>>>>> the interface in order to negotiate QDR, it failed to map the profile:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>>>>>>>>> high-speed-ber
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Profile: high-speed-ber
>>>>>>>>>>>>>>   --------
>>>>>>>>>>>>>>   llr support ib-speed
>>>>>>>>>>>>>>   SDR: disable
>>>>>>>>>>>>>>   DDR: disable
>>>>>>>>>>>>>>   QDR: disable
>>>>>>>>>>>>>>   FDR10: enable-request
>>>>>>>>>>>>>>   FDR: enable-request
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>>>>>>>>> hp-encl-isl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Profile: hp-encl-isl
>>>>>>>>>>>>>>   --------
>>>>>>>>>>>>>>   llr support ib-speed
>>>>>>>>>>>>>>   SDR: disable
>>>>>>>>>>>>>>   DDR: disable
>>>>>>>>>>>>>>   QDR: enable
>>>>>>>>>>>>>>   FDR10: enable-request
>>>>>>>>>>>>>>   FDR: enable-request
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
>>>>>>>>>>>>>> *% Cannot map profile hp-encl-isl to port:  1/9*
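
The two profiles shown above differ in a single line. A minimal sketch, assuming the `show phy-profile` output format pasted in this message, that extracts the per-speed settings and reports the difference. The function names are hypothetical, and the code deliberately does not interpret what `enable`, `disable` or `enable-request` mean on the switch.

```python
SPEEDS = {"SDR", "DDR", "QDR", "FDR10", "FDR"}

HIGH_SPEED_BER = """
  Profile: high-speed-ber
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: disable
  FDR10: enable-request
  FDR: enable-request
"""

HP_ENCL_ISL = """
  Profile: hp-encl-isl
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: enable
  FDR10: enable-request
  FDR: enable-request
"""


def parse_phy_profile(text: str) -> dict:
    """Return {speed: setting} for lines such as 'QDR: enable'."""
    settings = {}
    for line in text.splitlines():
        # Drop email quote markers and surrounding whitespace, if any.
        line = line.lstrip("> ").strip()
        name, sep, value = line.partition(":")
        if sep and name in SPEEDS:
            settings[name] = value.strip()
    return settings


def diff_profiles(a: dict, b: dict) -> dict:
    """Return {speed: (setting_in_a, setting_in_b)} for speeds that differ."""
    return {
        speed: (a.get(speed), b.get(speed))
        for speed in sorted(set(a) | set(b))
        if a.get(speed) != b.get(speed)
    }


if __name__ == "__main__":
    print(diff_profiles(parse_phy_profile(HIGH_SPEED_BER),
                        parse_phy_profile(HP_ENCL_ISL)))
```

Comparing the parsed settings rather than the raw text keeps the check independent of quoting and indentation in the pasted output.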
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The ‘qib’ driver is loading fine, as can be seen in the ibstat
>>>>>>>>>>>>>> output.  ib_ipath is the driver for an older card.
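
One way to confirm which of the two drivers is actually loaded is to look at the kernel's module list. A minimal sketch, assuming a Linux host where the qib driver is packaged as the `ib_qib` module (adjust the names if your distribution differs). The function name is hypothetical, and it only parses text in the `/proc/modules` format.

```python
def loaded_ib_drivers(proc_modules_text: str,
                      candidates=("ib_qib", "ib_ipath")) -> list:
    """Return the candidate module names that appear in /proc/modules content.

    Each line of /proc/modules starts with the module name, followed by
    size, use count and other fields separated by whitespace.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in candidates if name in loaded]
```

On the blade itself, pass it the contents of `/proc/modules`, for example `loaded_ib_drivers(open("/proc/modules").read())`. Seeing only `ib_qib` in the result would be consistent with the ibstat output showing the `qib0` device.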
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is that the link is not coming up to Init.  As Hal
>>>>>>>>>>>>>> said, the link should transition to “link up” without the SM's involvement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you are on to something with the fact that it seems
>>>>>>>>>>>>>> like your switch ports are not configured to do QDR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>>>>>>>> *To:* Weiny, Ira
>>>>>>>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I have that file:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have also installed libipathverbs:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But when I try to load the ib_ipath module, I get the
>>>>>>>>>>>>>> following error message:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # modprobe ib_ipath
>>>>>>>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are a few issues for routing in that diagram but the
>>>>>>>>>>>>>> links should come up.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I assume there is some backplane between the blade servers
>>>>>>>>>>>>>> and the switch in that chassis?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In ipathverbs there is a serdes tuning script.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does your libipathverbs include that file?  If not, try the
>>>>>>>>>>>>>> latest from GitHub.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German
>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>>>>>>>> *To:* Hal Rosenstock
>>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the reply. I've attached a PDF with the topology
>>>>>>>>>>>>>> diagram. I don't know if this is the best way to go or if there's another
>>>>>>>>>>>>>> way to connect and set up the IB network; tips and suggestions would be
>>>>>>>>>>>>>> much appreciated. The mezzanine cards are already installed on the blade
>>>>>>>>>>>>>> hosts:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # lspci
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi again German,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looks like you made some progress from yesterday as the qib
>>>>>>>>>>>>>> ports are now Polling rather than Disabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But since they are Down, do you have them cabled to a switch?
>>>>>>>>>>>>>> That should bring the links up, and the port state will then be Init.
>>>>>>>>>>>>>> That is the "starting" point.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You will also then need to be running SM to bring the ports
>>>>>>>>>>>>>> up to Active.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know if this is the right mailing list for this kind of
>>>>>>>>>>>>>> topic, but I'm really new to IB. I've just installed two SX6036G gateways
>>>>>>>>>>>>>> connected to each other through two ISL ports, and then configured
>>>>>>>>>>>>>> proxy-arp between both nodes (the SM is disabled on both gateways):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Proxy-ARP VIP
>>>>>>>>>>>>>> =============
>>>>>>>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Active nodes:
>>>>>>>>>>>>>> ID                   State                IP
>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>> GWIB01               master               10.xx.xx.xx1
>>>>>>>>>>>>>> GWIB02               standby              10.xx.xx.xx2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*),
>>>>>>>>>>>>>> one connected to GWIB01 and the other connected to GWIB02. The SM is
>>>>>>>>>>>>>> configured locally on both the SWIB01 and SWIB02 switches. So far so good:
>>>>>>>>>>>>>> after this, I connected a commodity server with a MLNX IB ADPT FDR to the
>>>>>>>>>>>>>> SWIB01 and SWIB02 switches, configured the drivers, etc., and got it up
>>>>>>>>>>>>>> and running fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Finally, I set up an HP enclosure with an internal IB switch
>>>>>>>>>>>>>> (port 1 of the internal switch is connected to GWIB01; the link is up but
>>>>>>>>>>>>>> the LLR status is inactive) and installed one of the blades, where I see
>>>>>>>>>>>>>> the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>>         Link layer: InfiniBand
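
With more than a couple of blades it can help to pull the port states out of the `ibstat` output automatically. A minimal sketch, assuming the text layout shown above (a `CA '<name>'` line followed by `Port N:` sections of `key: value` lines). The function names are hypothetical and the sample is an abbreviated copy of the output in this message.

```python
import re

SAMPLE = """\
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
        Port GUID: 0x0011750000791fec
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 40
        Port GUID: 0x0011750000791fed
"""


def parse_ibstat(text: str) -> list:
    """Return one dict per port: 'ca', 'port', plus the fields ibstat printed."""
    ports = []
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # New adapter section; port fields start again after 'Port N:'.
            ca, current = m.group(1), None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            current = {"ca": ca, "port": int(m.group(1))}
            ports.append(current)
            continue
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def ports_not_linked(ports: list) -> list:
    """Return (ca, port) pairs whose physical state is anything but LinkUp."""
    return [(p["ca"], p["port"]) for p in ports
            if p.get("Physical state") != "LinkUp"]


if __name__ == "__main__":
    print(ports_not_linked(parse_ibstat(SAMPLE)))
```

Adapter-level fields such as `CA type` are skipped on purpose, since only the per-port state matters for spotting links that are still Polling.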
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I was wondering whether the SM is not being recognized on
>>>>>>>>>>>>>> the blade system and that is why the ports do not get past the Polling
>>>>>>>>>>>>>> state. Is that possible? Or maybe it is not possible to connect an ISL
>>>>>>>>>>>>>> between the GW and the HP internal switch so that the SM is available, or
>>>>>>>>>>>>>> maybe the inactive LLR is causing this. Any ideas? I thought about
>>>>>>>>>>>>>> connecting the ISL of the HP IB switch to SWIB01 or SWIB02 instead of the
>>>>>>>>>>>>>> GWs, but I don't have any available ports.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users