[Users] IB topology config and polling state

Hal Rosenstock hal.rosenstock at gmail.com
Tue Oct 13 11:03:24 PDT 2015


You should reset the counters with perfquery -R, since SymbolErrorCounter is
maxed out at 65535. IB counters are sticky: they saturate at their maximum
rather than wrapping around like IETF counters.
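
Because a saturated counter no longer moves, the difference between two
perfquery runs only means something for counters that are still below their
maximum. A minimal sketch of that comparison follows, assuming the dotted
"Name:....value" layout shown in the quoted output below. The function names
and the counter widths in SATURATION are illustrative assumptions, not part
of perfquery itself.

```python
import re

# Assumed saturation values (16-bit and 8-bit fields); adjust if your
# hardware reports different widths.
SATURATION = {
    "SymbolErrorCounter": 0xFFFF,
    "LinkErrorRecoveryCounter": 0xFF,
    "LinkDownedCounter": 0xFF,
    "PortRcvErrors": 0xFFFF,
}

# Matches lines such as "PortRcvErrors:...................20904".
# Hex fields (e.g. CounterSelect) and "#" header lines do not match.
LINE_RE = re.compile(r"^(\w+):\.*(\d+)$")

SAMPLE_BEFORE = """\
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................20904
"""

SAMPLE_AFTER = """\
# Port counters: Lid 29 port 28 (CapMask: 0x1B01)
CounterSelect:...................0x0000
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........4
LinkDownedCounter:...............0
PortRcvErrors:...................37474
"""


def parse_perfquery(text):
    """Return {counter_name: int} for the decimal counter lines in text."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def compare(before, after):
    """Report the increase per counter, or "saturated" if it is pegged."""
    report = {}
    for name, limit in SATURATION.items():
        if name not in before or name not in after:
            continue
        if after[name] >= limit:
            # A pegged counter hides any further errors until it is reset.
            report[name] = "saturated"
        else:
            report[name] = after[name] - before[name]
    return report


if __name__ == "__main__":
    result = compare(parse_perfquery(SAMPLE_BEFORE), parse_perfquery(SAMPLE_AFTER))
    for name, value in result.items():
        print(f"{name}: {value}")
```

Any counter reported as saturated needs a reset before a later sample can
show whether the errors are still accumulating.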

On Tue, Oct 13, 2015 at 2:01 PM, Hal Rosenstock <hal.rosenstock at gmail.com>
wrote:

> SymbolErrors, PortRcvErrors, LinkErrorRecovery are all indicative of a bad
> physical connection. Usually the cause is cabling, but here it is a backplane
> connection. Could you reseat the boards and make sure they're solidly in
> their backplane connectors ?
>
> On Tue, Oct 13, 2015 at 1:58 PM, German Anders <ganders at despegar.com>
> wrote:
>
>> I see many errors increasing:
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........0
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................20904
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................166
>> PortRcvData:.....................167
>> PortXmitPkts:....................3
>> PortRcvPkts:.....................3
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........4
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................37474
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................260
>> PortRcvData:.....................262
>> PortXmitPkts:....................5
>> PortRcvPkts:.....................5
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........4
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................60240
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................449
>> PortRcvData:.....................452
>> PortXmitPkts:....................9
>> PortRcvPkts:.....................9
>> PortXmitWait:....................0
>>
>> *after changing the port speed to 7:*
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........2
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................1568
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................0
>> PortRcvData:.....................0
>> PortXmitPkts:....................0
>> PortRcvPkts:.....................0
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........2
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................11356
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................166
>> PortRcvData:.....................167
>> PortXmitPkts:....................3
>> PortRcvPkts:.....................3
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........3
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................44675
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................261
>> PortRcvData:.....................284
>> PortXmitPkts:....................5
>> PortRcvPkts:.....................6
>> PortXmitWait:....................0
>>
>>
>>
>> *German* <ganders at despegar.com>
>>
>> 2015-10-13 14:26 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>>> Speed 4 is not valid as an enabled speed. It should probably be speed 7
>>> (SDR, DDR, or QDR). Since qib is QDR (only), espeed shouldn't be used. QDR
>>> only is non-standard, which is why it says "IBA extension".
>>>
>>> As to why the link comes up and then goes down, there are a number of
>>> possible reasons. What does perfquery 29 28 say ? Are there some error
>>> counters increasing there ? If so, which ones ?
>>>
>>> On Tue, Oct 13, 2015 at 1:09 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>>> Ok, some good news! I've found a way to get LinkUp on the blade
>>>> port... but not for long :S. After making the following changes the
>>>> ports eventually come up, but a couple of minutes later they go down
>>>> again:
>>>>
>>>> - Move the Encl IB SW to Bay7
>>>> - Move the Blade Mezz cards to slot 3
>>>>
>>>> Then for ex:
>>>>
>>>> Blade IB SW - LID: 29
>>>> Blade    4    IB-SW-PORT    28
>>>>
>>>> 1)
>>>>
>>>> # ibportstate -L 29 28
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 28
>>>> LinkState:.......................Down
>>>> PhysLinkState:...................Polling
>>>> Lid:.............................75
>>>> SMLid:...........................2328
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................2.5 Gbps
>>>>
>>>>
>>>> 2)
>>>>
>>>> # ibportstate -L 29 28 disable
>>>> # ibportstate -L 29 28 speed 4
>>>> # ibportstate -L 29 28 espeed 4
>>>> # ibportstate -L 29 28 smlid 2
>>>> # ibportstate -L 29 28 enable
>>>>
>>>> 3)
>>>>
>>>> # ibportstate -L 29 28
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 28
>>>> LinkState:.......................Active
>>>> PhysLinkState:...................LinkUp
>>>> Lid:.............................75
>>>> SMLid:...........................2328
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................10.0 Gbps
>>>> Peer PortInfo:
>>>> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,28 port 1
>>>> LinkState:.......................Active
>>>> PhysLinkState:...................LinkUp
>>>> Lid:.............................30
>>>> SMLid:...........................2
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............10.0 Gbps (IBA extension)
>>>> LinkSpeedEnabled:................10.0 Gbps (IBA extension)
>>>> LinkSpeedActive:.................10.0 Gbps
>>>> Mkey:............................<not displayed>
>>>> MkeyLeasePeriod:.................0
>>>> ProtectBits:.....................0
>>>>
>>>>
>>>> # ibstat
>>>> CA 'qib0'
>>>>     CA type: InfiniPath_QMH7342
>>>>     Number of ports: 2
>>>>     Firmware version:
>>>>     Hardware version: 2
>>>>     Node GUID: 0x0011750000791fec
>>>>     System image GUID: 0x0011750000791fec
>>>>     Port 1:
>>>>         State: *Active*
>>>>         Physical state: *LinkUp*
>>>>         Rate: 40
>>>>         Base lid: 30
>>>>         LMC: 0
>>>>         SM lid: 2
>>>>         Capability mask: 0x0761086a
>>>>         Port GUID: 0x0011750000791fec
>>>>         Link layer: InfiniBand
>>>>     Port 2:
>>>>         State: Down
>>>>         Physical state: Polling
>>>>         Rate: 40
>>>>         Base lid: 65535
>>>>         LMC: 0
>>>>         SM lid: 65535
>>>>         Capability mask: 0x0761086a
>>>>         Port GUID: 0x0011750000791fed
>>>>         Link layer: InfiniBand
>>>>
>>>>
>>>> then, a couple of minutes later...
>>>>
>>>> # ibportstate -L 29 28
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 28
>>>> LinkState:.......................Down
>>>> PhysLinkState:...................Polling
>>>> Lid:.............................75
>>>> SMLid:...........................2328
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................2.5 Gbps
>>>>
>>>>
>>>> *German* <ganders at despegar.com>
>>>>
>>>> 2015-10-13 12:02 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>
>>>>> Can anyone who is using the HP BLc 4X QDR IB Switch tell me in which
>>>>> interconnect bay the switch is installed? I have it in bay 3 with mezz 1;
>>>>> I'll try moving it to bay 7 with mezz 3 to see whether anything changes.
>>>>> Any useful info will be really appreciated.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> *German* <ganders at despegar.com>
>>>>>
>>>>> 2015-10-13 10:51 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>>
>>>>>> yes, but the switch does not come with an external interface... just
>>>>>> the i2c port :S
>>>>>>
>>>>>>
>>>>>>
>>>>>> *German* <ganders at despegar.com>
>>>>>>
>>>>>> 2015-10-13 10:49 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>>>
>>>>>>> I don't really know but I was wondering about whether your hardware
>>>>>>> supports this.
>>>>>>>
>>>>>>> HPOA_2 shows internal interface to OA is absent as is external
>>>>>>> Ethernet interface.
>>>>>>>
>>>>>>> On Tue, Oct 13, 2015 at 9:31 AM, German Anders <ganders at despegar.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> when trying to assign the ip addr to the interconnect bay it seems
>>>>>>>> that it does not 'apply' the change, and the ip addr is not displayed as
>>>>>>>> 'configured' so there's no way to enter from outside. ideas?
>>>>>>>>
>>>>>>>> *German*
>>>>>>>>
>>>>>>>> 2015-10-13 10:15 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com
>>>>>>>> >:
>>>>>>>>
>>>>>>>>> What about the 3 critical errors ? What are they ?
>>>>>>>>>
>>>>>>>>> On Tue, Oct 13, 2015 at 9:13 AM, German Anders <
>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've tried that; in fact, I tried to set the IP address like this:
>>>>>>>>>>
>>>>>>>>>> SET EBIPA INTERCONNECT XX.XX.XX.XX XXX.XXX.XXX.XXX 3
>>>>>>>>>> SET EBIPA INTERCONNECT GATEWAY XX.XX.XX.XX 3
>>>>>>>>>> SET EBIPA INTERCONNECT DOMAIN "xxxxxx.net" 3
>>>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>>>> SET EBIPA INTERCONNECT NTP PRIMARY NONE 3
>>>>>>>>>> SET EBIPA INTERCONNECT NTP SECONDARY NONE 3
>>>>>>>>>> ENABLE EBIPA INTERCONNECT 3
>>>>>>>>>>
>>>>>>>>>> SAVE EBIPA
>>>>>>>>>>
>>>>>>>>>> But I'm not getting any IP response. I've also tried many different
>>>>>>>>>> IP addresses with no luck... if I assign that IP to one of the blades
>>>>>>>>>> it works fine, but not on the interconnect bay :( Any other ideas?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>>>
>>>>>>>>>> 2015-10-13 10:01 GMT-03:00 Hal Rosenstock <
>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Looks like there are 3 critical errors in system status. Did you
>>>>>>>>>>> look at these ?
>>>>>>>>>>>
>>>>>>>>>>> I don't know if you've seen this but there is some info on
>>>>>>>>>>> configuring the management IPs in
>>>>>>>>>>> http://h10032.www1.hp.com/ctg/Manual/c00814176.pdf
>>>>>>>>>>>
>>>>>>>>>>> Have you looked at/tried the command line interface ?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 13, 2015 at 8:28 AM, German Anders <
>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>>
>>>>>>>>>>>> It does not allow me to set an IP address on the internal switch, so
>>>>>>>>>>>> I can't access it from outside, except with the tools that I mentioned
>>>>>>>>>>>> before. It also doesn't allow access through a serial connection from
>>>>>>>>>>>> inside the enclosure. I've attached some screenshots of the connectivity.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *German*
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-10-13 9:13 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi German,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are the cards in the correct bays and slots ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have the HP Onboard Administrator tool ? What does it
>>>>>>>>>>>>> say about internal connectivity ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 13, 2015 at 7:44 AM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ira,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have some HP documentation, but it is quite short and it
>>>>>>>>>>>>>> doesn't describe any configuration or implementation steps for getting
>>>>>>>>>>>>>> the internal switch up and running. The switch version that came with
>>>>>>>>>>>>>> the enclosure does not have any management module at all, so it depends
>>>>>>>>>>>>>> on external management. During the weekend I found a way to upgrade the
>>>>>>>>>>>>>> firmware of the HP switch with the following command
>>>>>>>>>>>>>> (mlxburn -d lid-0x001D -fw fw-IS4.mlx), and then I ran
>>>>>>>>>>>>>> (flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D dc
>>>>>>>>>>>>>> /home/ceph/HPIBSW.INI) and found the following inside that file:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [PS_INFO]
>>>>>>>>>>>>>> Name = 489184-B21
>>>>>>>>>>>>>> Description = HP BLc 4X QDR IB Switch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [ADAPTER]
>>>>>>>>>>>>>> PSID = HP_0100000009
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [IB_TO_HW_MAP]
>>>>>>>>>>>>>> PORT1=14
>>>>>>>>>>>>>> PORT2=15
>>>>>>>>>>>>>> PORT3=16
>>>>>>>>>>>>>> PORT4=17
>>>>>>>>>>>>>> PORT5=18
>>>>>>>>>>>>>> PORT6=12
>>>>>>>>>>>>>> PORT7=11
>>>>>>>>>>>>>> PORT8=10
>>>>>>>>>>>>>> PORT9=9
>>>>>>>>>>>>>> PORT10=8
>>>>>>>>>>>>>> PORT11=7
>>>>>>>>>>>>>> PORT12=6
>>>>>>>>>>>>>> PORT13=5
>>>>>>>>>>>>>> PORT14=4
>>>>>>>>>>>>>> PORT15=3
>>>>>>>>>>>>>> PORT16=2
>>>>>>>>>>>>>> PORT17=20
>>>>>>>>>>>>>> PORT18=22
>>>>>>>>>>>>>> PORT19=24
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PORT20=26
>>>>>>>>>>>>>> PORT21=28
>>>>>>>>>>>>>> PORT22=30
>>>>>>>>>>>>>> PORT23=35
>>>>>>>>>>>>>> PORT24=33
>>>>>>>>>>>>>> PORT25=21
>>>>>>>>>>>>>> PORT26=23
>>>>>>>>>>>>>> PORT27=25
>>>>>>>>>>>>>> PORT28=27
>>>>>>>>>>>>>> PORT29=29
>>>>>>>>>>>>>> PORT30=36
>>>>>>>>>>>>>> PORT31=34
>>>>>>>>>>>>>> PORT32=32
>>>>>>>>>>>>>> PORT33=1
>>>>>>>>>>>>>> PORT34=13
>>>>>>>>>>>>>> PORT35=19
>>>>>>>>>>>>>> PORT36=31
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [unused_ports]
>>>>>>>>>>>>>> hw_port1_not_in_use=1
>>>>>>>>>>>>>> hw_port13_not_in_use=1
>>>>>>>>>>>>>> hw_port19_not_in_use=1
>>>>>>>>>>>>>> hw_port31_not_in_use=1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know whether there's an issue with the port
>>>>>>>>>>>>>> mapping. Has anyone used this kind of switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The summary of the problem is correct. The connectivity
>>>>>>>>>>>>>> between the IB network (MLNX switches/gw) and the HP IB switch is working,
>>>>>>>>>>>>>> since I was able to upgrade the firmware of the switch and get information
>>>>>>>>>>>>>> about it. But the connection between the mezzanine cards of the blades and
>>>>>>>>>>>>>> the internal IB switch in the enclosure is not working at all. Note that in
>>>>>>>>>>>>>> the OA administration of the enclosure I can see the 'green' port mapping
>>>>>>>>>>>>>> between each of the blades and the interconnect switch, so I'm guessing
>>>>>>>>>>>>>> that it should be working.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding the questions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1)      What type of switch is in the HP chassis?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *QLogic HP BLc 4X QDR IB Switch*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *PSID = HP_0100000009*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Image type:   FS2*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *FW ver:         7.4.3000*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Device ID:     48438*
>>>>>>>>>>>>>> *GUID:             0002c902004b0918*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2)      Do you have console access or http access to that
>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *No. The switch has no management module mezzanine card
>>>>>>>>>>>>>> inside; it only comes with an i2c port. But I can access it
>>>>>>>>>>>>>> with the mlxburn and flint tools from a host that's connected to the
>>>>>>>>>>>>>> IB network (outside the enclosure).*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3)      Does that switch have an SM in it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *No*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4)      What version of the kernel are you running with the
>>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a.       I assume you are using the qib driver in that
>>>>>>>>>>>>>> kernel.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Ubuntu 14.04.3 LTS - kernel 3.18.20-031820-generic*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing”  Was
>>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *No. LLR is only supported between MLNX devices. The
>>>>>>>>>>>>>> ISLs are up and working, since I'm able to query the switch.*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *When trying to run the command:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *# /usr/sbin/truescale-serdes.cmds*
>>>>>>>>>>>>>> */usr/sbin/truescale-serdes.cmds: 100:
>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds: Syntax error: "(" unexpected (expecting
>>>>>>>>>>>>>> "}")*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *# rpm -qa | grep libipathverbs*
>>>>>>>>>>>>>> *libipathverbs-1.3-1.x86_64*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>> 2015-10-13 2:14 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> German,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you have any documentation on the HP blade system?  And
>>>>>>>>>>>>>>> the switch which is in that system?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have to admit I have not followed everything in this
>>>>>>>>>>>>>>> thread regarding your configuration but it seems like you have some
>>>>>>>>>>>>>>> mellanox switches connected into an HP chassis which has both a switch and
>>>>>>>>>>>>>>> blades with qib (Truescale) cards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The connection from the mellanox switch to the “HP chassis
>>>>>>>>>>>>>>> switch” is linkup (active) but the connections to the individual qib HCAs
>>>>>>>>>>>>>>> are not even linkup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is that a correct summary of the problem?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If so here are some questions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1)      What type of switch is in the HP chassis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2)      Do you have console access or http access to that
>>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3)      Does that switch have an SM in it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4)      What version of the kernel are you running with the
>>>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a.       I assume you are using the qib driver in that
>>>>>>>>>>>>>>> kernel.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing”  Was
>>>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *Weiny,
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:31 PM
>>>>>>>>>>>>>>> *To:* Hal Rosenstock; German Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Agree with Hal here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’m not familiar with those blades/switches.  I’ll ask
>>>>>>>>>>>>>>> around.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* Hal Rosenstock [mailto:hal.rosenstock at gmail.com
>>>>>>>>>>>>>>> <hal.rosenstock at gmail.com>]
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:26 PM
>>>>>>>>>>>>>>> *To:* German Anders
>>>>>>>>>>>>>>> *Cc:* Weiny, Ira; users at lists.openfabrics.org
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's the gateway to the switch in the enclosure. It's the
>>>>>>>>>>>>>>> internal connectivity in the blade enclosure that's (physically) broken.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 4:24 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cabled
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> the blade it's:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> vendid=0x2c9
>>>>>>>>>>>>>>> devid=0xbd36
>>>>>>>>>>>>>>> sysimgguid=0x2c902004b0918
>>>>>>>>>>>>>>> switchguid=0x2c902004b0918(2c902004b0918)
>>>>>>>>>>>>>>> Switch    32 "S-0002c902004b0918"        # "Infiniscale-IV Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>>>> [1]    "S-e41d2d030031e9c1"[9]        # "MF0;GWIB01:SX6036G/U1" lid 24 4xQDR
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 17:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What are those HCAs cabled to or is it internal to the blade
>>>>>>>>>>>>>>> enclosure ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:24 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah, is there any command that I can run in order to change
>>>>>>>>>>>>>>> the port state on the remote switch? Everything looks good, but on
>>>>>>>>>>>>>>> the HP blades I'm still getting:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>>>         State: *Down*
>>>>>>>>>>>>>>>         Physical state: *Polling*
>>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>>>         State: *Down*
>>>>>>>>>>>>>>>         Physical state: *Polling*
>>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, on working hosts I only see devices from the local
>>>>>>>>>>>>>>> network; I don't see any of the blades' HCA connections.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The screen shot looks good :-) SM brought the link up to
>>>>>>>>>>>>>>> active.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that the ibportstate command you gave was for switch
>>>>>>>>>>>>>>> port 0 of the Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, find attached a screenshot of the port information (#9),
>>>>>>>>>>>>>>> the port that forms the ISL to the QLogic HP BLc 4X QDR IB Switch. Also,
>>>>>>>>>>>>>>> from one of the hosts connected to one of the SX6018F switches I can see
>>>>>>>>>>>>>>> the 'remote' HP IB SW:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # *ibnodes*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>> Switch    : 0x0002c902004b0918 ports 32 "Infiniscale-IV Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>>>> Switch    : 0xe41d2d030031e9c1 ports 37 "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # *ibportstate -L 29 query*
>>>>>>>>>>>>>>> Switch PortInfo:
>>>>>>>>>>>>>>> # Port info: Lid 29 port 0
>>>>>>>>>>>>>>> LinkState:.......................Active
>>>>>>>>>>>>>>> PhysLinkState:...................LinkUp
>>>>>>>>>>>>>>> Lid:.............................29
>>>>>>>>>>>>>>> SMLid:...........................2
>>>>>>>>>>>>>>> LMC:.............................0
>>>>>>>>>>>>>>> LinkWidthSupported:..............1X or 4X
>>>>>>>>>>>>>>> LinkWidthEnabled:................1X or 4X
>>>>>>>>>>>>>>> LinkWidthActive:.................4X
>>>>>>>>>>>>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>>>>>>>>>>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>>>>>>>>>>>>> LinkSpeedActive:.................10.0 Gbps
>>>>>>>>>>>>>>> Mkey:............................<not displayed>
>>>>>>>>>>>>>>> MkeyLeasePeriod:.................0
>>>>>>>>>>>>>>> ProtectBits:.....................0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more thing hopefully before playing with the low level
>>>>>>>>>>>>>>> phy settings:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you using known good cables ? Do you have FDR cables on
>>>>>>>>>>>>>>> the FDR <-> FDR links ? Cable lengths can matter as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Were the ports mapped to the phy profile shutdown when you
>>>>>>>>>>>>>>> changed this ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You might want 2 different profiles: one for the interfaces
>>>>>>>>>>>>>>> connected to other gateway interfaces (which are FDR (and FDR-10) capable)
>>>>>>>>>>>>>>> and the other for the interfaces connecting to QDR (the older equipment in
>>>>>>>>>>>>>>> your network). By configuring the Switch-X interfaces to the appropriate
>>>>>>>>>>>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>>>>>>>>>>>> should not only come up but also do so faster than if FDR/FDR10
>>>>>>>>>>>>>>> are enabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I suspect that due to the Switch-X configuration that the
>>>>>>>>>>>>>>> links to the switch(es) in the HP enclosures do not negotiate properly (as
>>>>>>>>>>>>>>> shown by down rather than LinkUp).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Once you get all your links to INIT, negotiation has
>>>>>>>>>>>>>>> occurred and then it's time for SM to bring links to active.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since you have down links, the SM can't do anything about
>>>>>>>>>>>>>>> those.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I
>>>>>>>>>>>>>>> know that this kind of switch does not come with an embedded SM, but I
>>>>>>>>>>>>>>> don't know how to access any management on this particular switch, for
>>>>>>>>>>>>>>> example to set the speed or anything like that. Is it possible to access
>>>>>>>>>>>>>>> it through the chassis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so, but when trying to configure the phy-profile on
>>>>>>>>>>>>>>> the interface in order to negotiate at QDR, it failed to map the profile:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>>>>>>>>>> high-speed-ber
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   Profile: high-speed-ber
>>>>>>>>>>>>>>>   --------
>>>>>>>>>>>>>>>   llr support ib-speed
>>>>>>>>>>>>>>>   SDR: disable
>>>>>>>>>>>>>>>   DDR: disable
>>>>>>>>>>>>>>>   QDR: disable
>>>>>>>>>>>>>>>   FDR10: enable-request
>>>>>>>>>>>>>>>   FDR: enable-request
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile
>>>>>>>>>>>>>>> hp-encl-isl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   Profile: hp-encl-isl
>>>>>>>>>>>>>>>   --------
>>>>>>>>>>>>>>>   llr support ib-speed
>>>>>>>>>>>>>>>   SDR: disable
>>>>>>>>>>>>>>>   DDR: disable
>>>>>>>>>>>>>>>   QDR: enable
>>>>>>>>>>>>>>>   FDR10: enable-request
>>>>>>>>>>>>>>>   FDR: enable-request
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9
>>>>>>>>>>>>>>> phy-profile map hp-encl-isl
>>>>>>>>>>>>>>> *% Cannot map profile hp-encl-isl to port:  1/9*
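
A minimal sketch for checking which speeds a profile still allows, working only on text in the format shown by `show phy-profile` above. The function names are hypothetical, and it assumes (this is not confirmed anywhere in the thread) that any setting beginning with "enable" means the speed may be negotiated.

```python
import re

def parse_phy_profile(text):
    """Parse 'show phy-profile' style output into {speed: setting}."""
    speeds = {}
    for line in text.splitlines():
        # Match lines such as "  QDR: enable" or "  FDR10: enable-request".
        m = re.match(r"\s*(SDR|DDR|QDR|FDR10|FDR):\s*(\S+)\s*$", line)
        if m:
            speeds[m.group(1)] = m.group(2)
    return speeds

def enabled_speeds(profile):
    # Assumption: both "enable" and "enable-request" allow the speed.
    return [s for s, v in profile.items() if v.startswith("enable")]

# Output of "show phy-profile hp-encl-isl" as quoted in this message.
HP_ENCL_ISL = """
  Profile: hp-encl-isl
  --------
  llr support ib-speed
  SDR: disable
  DDR: disable
  QDR: enable
  FDR10: enable-request
  FDR: enable-request
"""

profile = parse_phy_profile(HP_ENCL_ISL)
extra = [s for s in enabled_speeds(profile) if s != "QDR"]
print("enabled:", enabled_speeds(profile))
print("still enabled besides QDR:", extra)
```

For the hp-encl-isl profile this reports FDR10 and FDR as still enabled alongside QDR, which is the kind of leftover setting a QDR-only profile would turn off.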
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The driver ‘qib’ is loading fine, as can be seen from the
>>>>>>>>>>>>>>> ibstat output.  The ib_ipath driver is for an older card.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is the link is not coming up to init.  Like Hal
>>>>>>>>>>>>>>> said, the link should transition to “link up” without the SM's involvement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you are on to something with the fact that it seems
>>>>>>>>>>>>>>> like your switch ports are not configured to do QDR.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>>>>>>>>> *To:* Weiny, Ira
>>>>>>>>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes I've that file:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also I've done the install of libipathverbs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But when I try to load the ib_ipath module I get the
>>>>>>>>>>>>>>> following error msg:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # modprobe ib_ipath
>>>>>>>>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are a few issues for routing in that diagram but the
>>>>>>>>>>>>>>> links should come up.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I assume there is some backplane between the blade servers
>>>>>>>>>>>>>>> and the switch in that chassis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In ipathverbs there is a serdes tuning script.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does your libipathverbs include that file?  If not try the
>>>>>>>>>>>>>>> latest from github.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>>>>>>>>> *To:* Hal Rosenstock
>>>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the reply. I've attached a PDF with the topology
>>>>>>>>>>>>>>> diagram. I don't know if this is the best way to go or if there's another
>>>>>>>>>>>>>>> way to connect and set up the IB network; tips and suggestions would be very
>>>>>>>>>>>>>>> appreciated. Also, the mezzanine cards are already installed on the blade
>>>>>>>>>>>>>>> hosts:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # lspci
>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi again German,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like you made some progress from yesterday as the qib
>>>>>>>>>>>>>>> ports are now Polling rather than Disabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But since they are Down, do you have them cabled to a switch
>>>>>>>>>>>>>>> ? That should bring the links up and the port state will be Init. That is
>>>>>>>>>>>>>>> the "starting" point.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You will also then need to be running SM to bring the ports
>>>>>>>>>>>>>>> up to Active.
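
The sequence described here (Disabled, then Down/Polling, then Init once the link trains, then Active once an SM runs) can be sketched as a small lookup. This is only an illustration: `next_step` is a hypothetical helper, the advice strings paraphrase this thread, and matching any state that starts with "Init" is an assumption about how the tool spells that state.

```python
def next_step(state, physical_state):
    """Suggest what to look at next for a port in the given state."""
    if physical_state == "Disabled":
        return "enable the port"
    if state == "Down":
        # Down/Polling: no link has been established with a peer yet.
        return "check cabling and the speeds enabled on the peer port"
    if state.startswith("Init"):
        # Link is up; only the SM can move the port on from here.
        return "run an SM to bring the port to Active"
    if state == "Active":
        return "nothing to do"
    return "unrecognized state"

print(next_step("Down", "Polling"))
print(next_step("Init", "LinkUp"))
```

A port reported as Down/Polling falls into the second branch, so the SM is not yet the thing to look at.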
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't know if this is the right mailing list for this kind of
>>>>>>>>>>>>>>> topic, but I'm really new to IB and I've just installed two SX6036G gateways
>>>>>>>>>>>>>>> connected to each other through two ISL ports. Then I configured
>>>>>>>>>>>>>>> proxy-arp between both nodes (the SM is disabled on both GWs):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Proxy-ARP VIP
>>>>>>>>>>>>>>> =============
>>>>>>>>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Active nodes:
>>>>>>>>>>>>>>> ID                   State                IP
>>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>>> GWIB01               master               10.xx.xx.xx1
>>>>>>>>>>>>>>> GWIB02               standby              10.xx.xx.xx2
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*),
>>>>>>>>>>>>>>> one connected to GWIB01 and the other connected to GWIB02. The SM is
>>>>>>>>>>>>>>> configured locally on both SWIB01 & SWIB02 switches. So far so good. After
>>>>>>>>>>>>>>> this config I connected a commodity server with a MLNX IB ADPT FDR to the
>>>>>>>>>>>>>>> SWIB01 & SWIB02 switches, configured the drivers, etc., and got it up &
>>>>>>>>>>>>>>> running fine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Finally I've set up an HP Enclosure with an internal IB SW
>>>>>>>>>>>>>>> (and connected port 1 of the internal SW to GWIB01; the link is up but LLR
>>>>>>>>>>>>>>> status is inactive) and installed one of the blades, where I see the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>>>         Link layer: InfiniBand
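
A minimal sketch for picking out the ports that are stuck in Polling from ibstat text like the output above. It relies only on the layout visible in this message (a `CA '...'` line, `Port N:` headers, then `Key: value` lines); `parse_ibstat` is a hypothetical helper, not part of any package, and the sample is trimmed from the output quoted here.

```python
import re

def parse_ibstat(text):
    """Return a list of (ca, port, state, physical_state) tuples."""
    ports = []
    ca = port = None
    state = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            ca, port = m.group(1), None   # new adapter: no port selected yet
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            port, state = int(m.group(1)), None
            continue
        if port is None or ":" not in line:
            continue                      # adapter-level line, not a port field
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "State":
            state = value
        elif key == "Physical state":
            ports.append((ca, port, state, value))
    return ports

# Trimmed from the ibstat output quoted in this message.
SAMPLE = """\
CA 'qib0'
    CA type: InfiniPath_QMH7342
    Number of ports: 2
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 40
"""

stuck = [p for p in parse_ibstat(SAMPLE) if p[3] == "Polling"]
for ca, port, state, phys in stuck:
    print(f"{ca} port {port}: {state}/{phys}")
```

On a live host the same function could be fed the captured output of `ibstat` instead of the sample string.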
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I was wondering if maybe the SM is not being recognized
>>>>>>>>>>>>>>> on the blade system and that's why it is not getting past the Polling state. Is
>>>>>>>>>>>>>>> that possible? Or maybe it is not possible to connect an ISL between the GW
>>>>>>>>>>>>>>> and the HP internal SW so that the SM is available, or maybe the inactive
>>>>>>>>>>>>>>> LLR is causing this. Any ideas? I thought about
>>>>>>>>>>>>>>> connecting the ISL of the HP IB SW to SWIB01 or SWIB02 instead of the
>>>>>>>>>>>>>>> GWs but I don't have any available ports.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users
>>>>>>>>>>>>>>>