[Users] IB topology config and polling state
German Anders
ganders at despegar.com
Wed Oct 14 08:41:17 PDT 2015
I already tried that but I'm still getting the same behavior. I also tried
changing the speed to 7, but that didn't seem to work either. The only way
the port comes up is if I set the port speed to 4, then set the smlid to
2, and finally enable the port; it then transitions to the LinkUp state for
some time and goes down again... a really strange condition... I don't know
what else to look at... any ideas?
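
For reference, here is a minimal sketch of how I understand the speed values passed to ibportstate. It assumes they are bit masks (1 = 2.5 Gbps, 2 = 5.0 Gbps, 4 = 10.0 Gbps) and that only the combinations that include 2.5 Gbps (1, 3, 5, 7) are standard, which would explain the "IBA extension" label Hal mentioned for speed 4. The function name is made up for illustration and is not part of any tool.

```python
# Hypothetical helper: decode a LinkSpeedEnabled-style bit mask.
# Assumed bit meanings: 1 = 2.5 Gbps (SDR), 2 = 5.0 Gbps (DDR), 4 = 10.0 Gbps (QDR).
SPEED_BITS = [(1, "2.5 Gbps"), (2, "5.0 Gbps"), (4, "10.0 Gbps")]

# Assumption: the standard combinations always include 2.5 Gbps.
STANDARD_MASKS = {1, 3, 5, 7}


def describe_speed_mask(mask: int) -> str:
    """Return a human-readable description of a 3-bit speed mask."""
    if not 1 <= mask <= 7:
        raise ValueError(f"mask {mask} is outside the 3-bit range handled here")
    names = [name for bit, name in SPEED_BITS if mask & bit]
    text = " or ".join(names)
    if mask not in STANDARD_MASKS:
        # Non-standard combination, e.g. QDR only.
        text += " (IBA extension)"
    return text


if __name__ == "__main__":
    for mask in (4, 7):
        print(mask, "->", describe_speed_mask(mask))
```

Under these assumptions, mask 7 corresponds to what the switch port reports as enabled and mask 4 to what the qib peer port reports.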
*German* <ganders at despegar.com>
2015-10-13 15:01 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
> SymbolErrors, PortRcvErrors, LinkErrorRecovery are all indicative of a bad
> physical connection. Usually this is cables but this is a backplane
> connection. Would you reseat the boards and make sure they're solidly in
> their backplane connectors ?
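
To see which counters are still moving between two perfquery runs, something like the following sketch could be used. It assumes the `Name:....value` layout of the perfquery output quoted in this thread, and that SymbolErrorCounter is a 16-bit field that sticks at 65535 (so it cannot show further growth once pegged). The function names are illustrative only.

```python
import re

# Assumption: SymbolErrorCounter is a 16-bit field that sticks at its maximum.
SYMBOL_ERROR_MAX = 65535

# Matches lines such as "PortRcvErrors:...................20904",
# tolerating leading ">" quote markers added by mail clients.
LINE_RE = re.compile(r"^[>\s]*(\w+):\.+(0x[0-9a-fA-F]+|\d+)\s*$")


def parse_perfquery(text):
    """Return {counter_name: value} for every 'Name:....value' line."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2), 0)
    return counters


def growing_counters(before, after):
    """Return {counter_name: delta} for counters that increased."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Values copied from the first two perfquery samples in this thread.
SAMPLE_1 = """\
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........0
PortRcvErrors:...................20904
PortXmitPkts:....................3
"""
SAMPLE_2 = """\
SymbolErrorCounter:..............65535
LinkErrorRecoveryCounter:........4
PortRcvErrors:...................37474
PortXmitPkts:....................5
"""

if __name__ == "__main__":
    first = parse_perfquery(SAMPLE_1)
    second = parse_perfquery(SAMPLE_2)
    for name, delta in growing_counters(first, second).items():
        print(f"{name}: +{delta}")
    if second.get("SymbolErrorCounter") == SYMBOL_ERROR_MAX:
        print("SymbolErrorCounter is pegged at its maximum")
```

Clearing the counters before each test (if the tool in use supports it) would make a pegged SymbolErrorCounter meaningful again.
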
>
> On Tue, Oct 13, 2015 at 1:58 PM, German Anders <ganders at despegar.com>
> wrote:
>
>> I see many errors increasing:
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........0
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................20904
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................166
>> PortRcvData:.....................167
>> PortXmitPkts:....................3
>> PortRcvPkts:.....................3
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........4
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................37474
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................260
>> PortRcvData:.....................262
>> PortXmitPkts:....................5
>> PortRcvPkts:.....................5
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........4
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................60240
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................449
>> PortRcvData:.....................452
>> PortXmitPkts:....................9
>> PortRcvPkts:.....................9
>> PortXmitWait:....................0
>>
>> *after changing the speed port to 7:*
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........2
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................1568
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................0
>> PortRcvData:.....................0
>> PortXmitPkts:....................0
>> PortRcvPkts:.....................0
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........2
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................11356
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................166
>> PortRcvData:.....................167
>> PortXmitPkts:....................3
>> PortRcvPkts:.....................3
>> PortXmitWait:....................0
>>
>> # perfquery 29 28
>> # Port counters: Lid 29 port 28 (CapMask: 0x1B01)
>> PortSelect:......................28
>> CounterSelect:...................0x0000
>> SymbolErrorCounter:..............65535
>> LinkErrorRecoveryCounter:........3
>> LinkDownedCounter:...............0
>> PortRcvErrors:...................44675
>> PortRcvRemotePhysicalErrors:.....0
>> PortRcvSwitchRelayErrors:........0
>> PortXmitDiscards:................0
>> PortXmitConstraintErrors:........0
>> PortRcvConstraintErrors:.........0
>> CounterSelect2:..................0x00
>> LocalLinkIntegrityErrors:........0
>> ExcessiveBufferOverrunErrors:....0
>> VL15Dropped:.....................0
>> PortXmitData:....................261
>> PortRcvData:.....................284
>> PortXmitPkts:....................5
>> PortRcvPkts:.....................6
>> PortXmitWait:....................0
>>
>>
>>
>> *German* <ganders at despegar.com>
>>
>> 2015-10-13 14:26 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>
>>> Speed 4 is not valid as an enabled speed. It probably is speed 7 (SDR,
>>> DDR, or QDR) . Since qib is QDR (only), espeed shouldn't be used. QDR only
>>> is non standard which is why it says "IBA extension".
>>>
>>> As to why the link comes up and then goes down, there are a number of
>>> possible reasons. What does perfquery 29 28 say ? Are there some error
>>> counters increasing there ? If so, which ones ?
>>>
>>> On Tue, Oct 13, 2015 at 1:09 PM, German Anders <ganders at despegar.com>
>>> wrote:
>>>
>>>> Ok, some good news! I've found a way to get LinkUp on the blade
>>>> port... but not for long :S. It seems that after making the following
>>>> changes the ports eventually come up... but then after a couple of
>>>> minutes they go down again:
>>>>
>>>> - Move the Encl IB SW to Bay7
>>>> - Move the Blade Mezz cards to slot 3
>>>>
>>>> Then for ex:
>>>>
>>>> Blade IB SW - LID: 29
>>>> Blade 4 IB-SW-PORT 28
>>>>
>>>> 1)
>>>>
>>>> # ibportstate -L 29 28
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 28
>>>> LinkState:.......................Down
>>>> PhysLinkState:...................Polling
>>>> Lid:.............................75
>>>> SMLid:...........................2328
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................2.5 Gbps
>>>>
>>>>
>>>> 2)
>>>>
>>>> # ibportstate -L 29 28 disable
>>>> # ibportstate -L 29 28 speed 4
>>>> # ibportstate -L 29 28 espeed 4
>>>> # ibportstate -L 29 28 smlid 2
>>>> # ibportstate -L 29 28 enable
>>>>
>>>> 3)
>>>>
>>>> # ibportstate -L 29 28
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 28
>>>> LinkState:.......................Active
>>>> PhysLinkState:...................LinkUp
>>>> Lid:.............................75
>>>> SMLid:...........................2328
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................10.0 Gbps
>>>> Peer PortInfo:
>>>> # Port info: Lid 29 DR path slid 4; dlid 65535; 0,28 port 1
>>>> LinkState:.......................Active
>>>> PhysLinkState:...................LinkUp
>>>> Lid:.............................30
>>>> SMLid:...........................2
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............10.0 Gbps (IBA extension)
>>>> LinkSpeedEnabled:................10.0 Gbps (IBA extension)
>>>> LinkSpeedActive:.................10.0 Gbps
>>>> Mkey:............................<not displayed>
>>>> MkeyLeasePeriod:.................0
>>>> ProtectBits:.....................0
>>>>
>>>>
>>>> # ibstat
>>>> CA 'qib0'
>>>> CA type: InfiniPath_QMH7342
>>>> Number of ports: 2
>>>> Firmware version:
>>>> Hardware version: 2
>>>> Node GUID: 0x0011750000791fec
>>>> System image GUID: 0x0011750000791fec
>>>> Port 1:
>>>> State: *Active*
>>>> Physical state: *LinkUp*
>>>> Rate: 40
>>>> Base lid: 30
>>>> LMC: 0
>>>> SM lid: 2
>>>> Capability mask: 0x0761086a
>>>> Port GUID: 0x0011750000791fec
>>>> Link layer: InfiniBand
>>>> Port 2:
>>>> State: Down
>>>> Physical state: Polling
>>>> Rate: 40
>>>> Base lid: 65535
>>>> LMC: 0
>>>> SM lid: 65535
>>>> Capability mask: 0x0761086a
>>>> Port GUID: 0x0011750000791fed
>>>> Link layer: InfiniBand
>>>>
>>>>
>>>> a couple of minutes then...
>>>>
>>>> # ibportstate -L 29 28
>>>> Switch PortInfo:
>>>> # Port info: Lid 29 port 28
>>>> LinkState:.......................Down
>>>> PhysLinkState:...................Polling
>>>> Lid:.............................75
>>>> SMLid:...........................2328
>>>> LMC:.............................0
>>>> LinkWidthSupported:..............1X or 4X
>>>> LinkWidthEnabled:................1X or 4X
>>>> LinkWidthActive:.................4X
>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
>>>> LinkSpeedActive:.................2.5 Gbps
>>>>
>>>>
>>>> *German* <ganders at despegar.com>
>>>>
>>>> 2015-10-13 12:02 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>
>>>>> Can anyone who's using the HP BLc 4X QDR IB Switch tell me in which
>>>>> interconnect bay the switch is installed? I have it in bay3 & mezz 1. I'll
>>>>> try moving it to bay7 & mezz 3 and see if at least something changes or not.
>>>>> Any useful info will be really appreciated.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> *German* <ganders at despegar.com>
>>>>>
>>>>> 2015-10-13 10:51 GMT-03:00 German Anders <ganders at despegar.com>:
>>>>>
>>>>>> yes, but the switch does not come with an external interface... just
>>>>>> the i2c port :S
>>>>>>
>>>>>>
>>>>>>
>>>>>> *German* <ganders at despegar.com>
>>>>>>
>>>>>> 2015-10-13 10:49 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com>:
>>>>>>
>>>>>>> I don't really know but I was wondering about whether your hardware
>>>>>>> supports this.
>>>>>>>
>>>>>>> HPOA_2 shows internal interface to OA is absent as is external
>>>>>>> Ethernet interface.
>>>>>>>
>>>>>>> On Tue, Oct 13, 2015 at 9:31 AM, German Anders <ganders at despegar.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> when trying to assign the ip addr to the interconnect bay it seems
>>>>>>>> that it does not 'apply' the change, and the ip addr is not displayed as
>>>>>>>> 'configured' so there's no way to enter from outside. ideas?
>>>>>>>>
>>>>>>>> *German*
>>>>>>>>
>>>>>>>> 2015-10-13 10:15 GMT-03:00 Hal Rosenstock <hal.rosenstock at gmail.com
>>>>>>>> >:
>>>>>>>>
>>>>>>>>> What about the 3 critical errors ? What are they ?
>>>>>>>>>
>>>>>>>>> On Tue, Oct 13, 2015 at 9:13 AM, German Anders <
>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've tried that, and in fact I tried to set the IP ADDR like this:
>>>>>>>>>>
>>>>>>>>>> SET EBIPA INTERCONNECT XX.XX.XX.XX XXX.XXX.XXX.XXX 3
>>>>>>>>>> SET EBIPA INTERCONNECT GATEWAY XX.XX.XX.XX 3
>>>>>>>>>> SET EBIPA INTERCONNECT DOMAIN "xxxxxx.net" 3
>>>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>>>> ADD EBIPA INTERCONNECT DNS 10.xx.xx.xx 3
>>>>>>>>>> SET EBIPA INTERCONNECT NTP PRIMARY NONE 3
>>>>>>>>>> SET EBIPA INTERCONNECT NTP SECONDARY NONE 3
>>>>>>>>>> ENABLE EBIPA INTERCONNECT 3
>>>>>>>>>>
>>>>>>>>>> SAVE EBIPA
>>>>>>>>>>
>>>>>>>>>> But I'm not getting any IP response. I've also tried many different IP addresses with no luck... if I assign that IP to one of the blades it works fine, but not on the interconnect bay :( Any other ideas?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *German* <ganders at despegar.com>
>>>>>>>>>>
>>>>>>>>>> 2015-10-13 10:01 GMT-03:00 Hal Rosenstock <
>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Looks like there are 3 critical errors in system status. Did you
>>>>>>>>>>> look at these ?
>>>>>>>>>>>
>>>>>>>>>>> I don't know if you've seen this but there is some info on
>>>>>>>>>>> configuring the management IPs in
>>>>>>>>>>> http://h10032.www1.hp.com/ctg/Manual/c00814176.pdf
>>>>>>>>>>>
>>>>>>>>>>> Have you looked at/tried the command line interface ?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 13, 2015 at 8:28 AM, German Anders <
>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>>
>>>>>>>>>>>> It does not allow me to set up an IP ADDR on the internal SW, so
>>>>>>>>>>>> I can't access it from outside, except with the tools that I mentioned before.
>>>>>>>>>>>> It also doesn't allow me to access it through a serial connection from inside
>>>>>>>>>>>> the enclosure. I've attached some screenshots about the connectivity.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *German*
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-10-13 9:13 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi German,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are the cards in the correct bays and slots ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have the HP Onboard Administrator tool ? What does it
>>>>>>>>>>>>> say about internal connectivity ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 13, 2015 at 7:44 AM, German Anders <
>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ira,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have some HP documentation but it is quite short, and it
>>>>>>>>>>>>>> doesn't describe any 'config' or 'impl' steps for getting the internal
>>>>>>>>>>>>>> switch up and running. The version of the SW that came with the enclosure does not
>>>>>>>>>>>>>> have any management module at all, so it depends on external management.
>>>>>>>>>>>>>> During the weekend I found a way to upgrade the firmware of the HP switch
>>>>>>>>>>>>>> with the following command (mlxburn -d lid-0x001D -fw fw-IS4.mlx), and then
>>>>>>>>>>>>>> I ran (flint -d /dev/mst/SW_MT48438_0x2c902004b0918_lid-0x001D dc
>>>>>>>>>>>>>> /home/ceph/HPIBSW.INI) and found the following inside that file:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [PS_INFO]
>>>>>>>>>>>>>> Name = 489184-B21
>>>>>>>>>>>>>> Description = HP BLc 4X QDR IB Switch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [ADAPTER]
>>>>>>>>>>>>>> PSID = HP_0100000009
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [IB_TO_HW_MAP]
>>>>>>>>>>>>>> PORT1=14
>>>>>>>>>>>>>> PORT2=15
>>>>>>>>>>>>>> PORT3=16
>>>>>>>>>>>>>> PORT4=17
>>>>>>>>>>>>>> PORT5=18
>>>>>>>>>>>>>> PORT6=12
>>>>>>>>>>>>>> PORT7=11
>>>>>>>>>>>>>> PORT8=10
>>>>>>>>>>>>>> PORT9=9
>>>>>>>>>>>>>> PORT10=8
>>>>>>>>>>>>>> PORT11=7
>>>>>>>>>>>>>> PORT12=6
>>>>>>>>>>>>>> PORT13=5
>>>>>>>>>>>>>> PORT14=4
>>>>>>>>>>>>>> PORT15=3
>>>>>>>>>>>>>> PORT16=2
>>>>>>>>>>>>>> PORT17=20
>>>>>>>>>>>>>> PORT18=22
>>>>>>>>>>>>>> PORT19=24
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PORT20=26
>>>>>>>>>>>>>> PORT21=28
>>>>>>>>>>>>>> PORT22=30
>>>>>>>>>>>>>> PORT23=35
>>>>>>>>>>>>>> PORT24=33
>>>>>>>>>>>>>> PORT25=21
>>>>>>>>>>>>>> PORT26=23
>>>>>>>>>>>>>> PORT27=25
>>>>>>>>>>>>>> PORT28=27
>>>>>>>>>>>>>> PORT29=29
>>>>>>>>>>>>>> PORT30=36
>>>>>>>>>>>>>> PORT31=34
>>>>>>>>>>>>>> PORT32=32
>>>>>>>>>>>>>> PORT33=1
>>>>>>>>>>>>>> PORT34=13
>>>>>>>>>>>>>> PORT35=19
>>>>>>>>>>>>>> PORT36=31
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [unused_ports]
>>>>>>>>>>>>>> hw_port1_not_in_use=1
>>>>>>>>>>>>>> hw_port13_not_in_use=1
>>>>>>>>>>>>>> hw_port19_not_in_use=1
>>>>>>>>>>>>>> hw_port31_not_in_use=1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know if maybe there's some issue with the port
>>>>>>>>>>>>>> mapping. Has anyone used this kind of switch?
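
One way to sanity-check the mapping is to confirm that no hardware port is used twice and to list which IB ports land on hardware ports flagged as unused. The sketch below does that on an excerpt of the dumped INI file. It assumes that `PORTn` is the port number reported by ibportstate and that `hw_portN_not_in_use=1` marks hardware port N as unused; both readings are guesses from the file layout, not documented behavior.

```python
import configparser
import re

# Excerpt of the dumped INI file (only a few of the 36 PORT entries).
INI_EXCERPT = """
[IB_TO_HW_MAP]
PORT21=28
PORT28=27
PORT33=1
PORT34=13
PORT35=19
PORT36=31

[unused_ports]
hw_port1_not_in_use=1
hw_port13_not_in_use=1
hw_port19_not_in_use=1
hw_port31_not_in_use=1
"""


def load_port_map(ini_text):
    """Return (ib_to_hw, unused_hw) parsed from the INI text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # configparser lowercases option names, so "PORT28" becomes "port28".
    ib_to_hw = {
        int(key[len("port"):]): int(value)
        for key, value in cfg["IB_TO_HW_MAP"].items()
    }
    unused_hw = set()
    for key, value in cfg["unused_ports"].items():
        match = re.fullmatch(r"hw_port(\d+)_not_in_use", key)
        if match and value == "1":
            unused_hw.add(int(match.group(1)))
    return ib_to_hw, unused_hw


def check_port_map(ib_to_hw, unused_hw):
    """Return (has_duplicates, ib_ports_mapped_to_unused_hw)."""
    hw_ports = list(ib_to_hw.values())
    has_duplicates = len(set(hw_ports)) != len(hw_ports)
    on_unused = sorted(ib for ib, hw in ib_to_hw.items() if hw in unused_hw)
    return has_duplicates, on_unused


if __name__ == "__main__":
    ib_to_hw, unused_hw = load_port_map(INI_EXCERPT)
    duplicates, on_unused = check_port_map(ib_to_hw, unused_hw)
    print("duplicate hw ports:", duplicates)
    print("IB ports on unused hw ports:", on_unused)
    print("IB port 28 -> hw port", ib_to_hw[28])
```

With the excerpt above, only IB ports 33 to 36 land on hardware ports flagged as unused, so under these assumptions the blade-facing ports are not affected by the `[unused_ports]` section.
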
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The summary of the problem is correct: the connectivity
>>>>>>>>>>>>>> between the IB network (MLNX switches/gw) and the HP IB switch is working,
>>>>>>>>>>>>>> since I was able to upgrade the firmware of the switch and get information
>>>>>>>>>>>>>> about it. But the connection between the mezzanine cards of the blades and
>>>>>>>>>>>>>> the internal IB sw of the enclosure is not working at all. Note that if I go to
>>>>>>>>>>>>>> the OA administration of the enclosure I can see the 'green' port mapping
>>>>>>>>>>>>>> between each of the blades and the interconnect switch, so I'm guessing that
>>>>>>>>>>>>>> it should be working.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding the questions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) What type of switch is in the HP chassis?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *QLogic HP BLc 4X QDR IB Switch*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *PSID = HP_0100000009*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Image type: FS2*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *FW ver: 7.4.3000*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Device ID: 48438*
>>>>>>>>>>>>>> *GUID: 0002c902004b0918*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Do you have console access or http access to that
>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *No, since it didn't have any management module mezzanine card
>>>>>>>>>>>>>> inside the switch; it only comes with an i2c port. But I can get access
>>>>>>>>>>>>>> through the mlxburn and flint tools from one host that's connected to the
>>>>>>>>>>>>>> IB network (outside the enclosure).*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) Does that switch have an SM in it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *No*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4) What version of the kernel are you running with the
>>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a. I assume you are using the qib driver in that
>>>>>>>>>>>>>> kernel.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Ubuntu 14.04.3 LTS - kernel 3.18.20-031820-generic*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing” Was
>>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *No, since LLR is only supported between mlnx devices. The
>>>>>>>>>>>>>> ISL is up and working, since I'm able to query the switch.*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *When trying to run the command:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *# /usr/sbin/truescale-serdes.cmds*
>>>>>>>>>>>>>> */usr/sbin/truescale-serdes.cmds: 100: /usr/sbin/truescale-serdes.cmds: Syntax error: "(" unexpected (expecting "}")*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *# rpm -qa | grep libipathverbs*
>>>>>>>>>>>>>> *libipathverbs-1.3-1.x86_64*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>> 2015-10-13 2:14 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> German,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you have any documentation on the HP blade system? And
>>>>>>>>>>>>>>> the switch which is in that system?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have to admit I have not followed everything in this
>>>>>>>>>>>>>>> thread regarding your configuration but it seems like you have some
>>>>>>>>>>>>>>> mellanox switches connected into an HP chassis which has both a switch and
>>>>>>>>>>>>>>> blades with qib (Truescale) cards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The connection from the mellanox switch to the “HP chassis
>>>>>>>>>>>>>>> switch” is linkup (active) but the connections to the individual qib HCAs
>>>>>>>>>>>>>>> are not even linkup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is that a correct summary of the problem?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If so here are some questions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) What type of switch is in the HP chassis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Do you have console access or http access to that
>>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) Does that switch have an SM in it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4) What version of the kernel are you running with the
>>>>>>>>>>>>>>> qib cards?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a. I assume you are using the qib driver in that
>>>>>>>>>>>>>>> kernel.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At some point Hal spoke of “LLR being a Mellanox thing” Was
>>>>>>>>>>>>>>> that to solve the problem of connecting the “HP switch” to the Mellanox
>>>>>>>>>>>>>>> switch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like it if you could verify that the
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is being run?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also what version of libipathverbs do you have?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *Weiny,
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:31 PM
>>>>>>>>>>>>>>> *To:* Hal Rosenstock; German Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Agree with Hal here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’m not familiar with those blades/switches. I’ll ask
>>>>>>>>>>>>>>> around.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* Hal Rosenstock [mailto:hal.rosenstock at gmail.com
>>>>>>>>>>>>>>> <hal.rosenstock at gmail.com>]
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 1:26 PM
>>>>>>>>>>>>>>> *To:* German Anders
>>>>>>>>>>>>>>> *Cc:* Weiny, Ira; users at lists.openfabrics.org
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's the gateway to the switch in the enclosure. It's the
>>>>>>>>>>>>>>> internal connectivity in the blade enclosure that's (physically) broken.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 4:24 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cabled
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> the blade it's:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> vendid=0x2c9
>>>>>>>>>>>>>>> devid=0xbd36
>>>>>>>>>>>>>>> sysimgguid=0x2c902004b0918
>>>>>>>>>>>>>>> switchguid=0x2c902004b0918(2c902004b0918)
>>>>>>>>>>>>>>> Switch 32 "S-0002c902004b0918" # "Infiniscale-IV
>>>>>>>>>>>>>>> Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>>>> [1] "S-e41d2d030031e9c1"[9] #
>>>>>>>>>>>>>>> "MF0;GWIB01:SX6036G/U1" lid 24 4xQDR
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 17:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What are those HCAs cabled to or is it internal to the blade
>>>>>>>>>>>>>>> enclosure ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:24 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah, is there any command that I can run in order to change
>>>>>>>>>>>>>>> the port state on the remote switch? I mean, everything looks good but on
>>>>>>>>>>>>>>> the HP blades I'm still getting:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>>> CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>>> Number of ports: 2
>>>>>>>>>>>>>>> Firmware version:
>>>>>>>>>>>>>>> Hardware version: 2
>>>>>>>>>>>>>>> Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>>> System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>>> Port 1:
>>>>>>>>>>>>>>> State: *Down*
>>>>>>>>>>>>>>> Physical state: *Polling*
>>>>>>>>>>>>>>> Rate: 40
>>>>>>>>>>>>>>> Base lid: 4660
>>>>>>>>>>>>>>> LMC: 0
>>>>>>>>>>>>>>> SM lid: 4660
>>>>>>>>>>>>>>> Capability mask: 0x0761086a
>>>>>>>>>>>>>>> Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>>> Link layer: InfiniBand
>>>>>>>>>>>>>>> Port 2:
>>>>>>>>>>>>>>> State: *Down*
>>>>>>>>>>>>>>> Physical state: *Polling*
>>>>>>>>>>>>>>> Rate: 40
>>>>>>>>>>>>>>> Base lid: 4660
>>>>>>>>>>>>>>> LMC: 0
>>>>>>>>>>>>>>> SM lid: 4660
>>>>>>>>>>>>>>> Capability mask: 0x0761086a
>>>>>>>>>>>>>>> Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>>> Link layer: InfiniBand
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, on working hosts I only see devices from the local
>>>>>>>>>>>>>>> network; I don't see any of the blades' HCA connections.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 16:21 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The screen shot looks good :-) SM brought the link up to
>>>>>>>>>>>>>>> active.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that the ibportstate command you gave was for switch
>>>>>>>>>>>>>>> port 0 of the Mellanox IS-4 switch in the QLogic HP BLc 4X QDR IB Switch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 3:06 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, find attached a screenshot of the port information (#
>>>>>>>>>>>>>>> 9), the one that forms the ISL to the QLogic HP BLc 4X QDR IB Switch. Also,
>>>>>>>>>>>>>>> from one of the hosts that is connected to one of the SX6018F switches I can see
>>>>>>>>>>>>>>> the 'remote' HP IB SW:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # *ibnodes*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>> Switch : 0x0002c902004b0918 ports 32 "Infiniscale-IV
>>>>>>>>>>>>>>> Mellanox Technologies" base port 0 *lid 29* lmc 0
>>>>>>>>>>>>>>> Switch : 0xe41d2d030031e9c1 ports 37
>>>>>>>>>>>>>>> "MF0;GWIB01:SX6036G/U1" enhanced port 0 lid 24 lmc 0
>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # *ibportstate -L 29 query*
>>>>>>>>>>>>>>> Switch PortInfo:
>>>>>>>>>>>>>>> # Port info: Lid 29 port 0
>>>>>>>>>>>>>>> LinkState:.......................Active
>>>>>>>>>>>>>>> PhysLinkState:...................LinkUp
>>>>>>>>>>>>>>> Lid:.............................29
>>>>>>>>>>>>>>> SMLid:...........................2
>>>>>>>>>>>>>>> LMC:.............................0
>>>>>>>>>>>>>>> LinkWidthSupported:..............1X or 4X
>>>>>>>>>>>>>>> LinkWidthEnabled:................1X or 4X
>>>>>>>>>>>>>>> LinkWidthActive:.................4X
>>>>>>>>>>>>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or
>>>>>>>>>>>>>>> 10.0 Gbps
>>>>>>>>>>>>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or
>>>>>>>>>>>>>>> 10.0 Gbps
>>>>>>>>>>>>>>> LinkSpeedActive:.................10.0 Gbps
>>>>>>>>>>>>>>> Mkey:............................<not displayed>
>>>>>>>>>>>>>>> MkeyLeasePeriod:.................0
>>>>>>>>>>>>>>> ProtectBits:.....................0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 16:00 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more thing hopefully before playing with the low level
>>>>>>>>>>>>>>> phy settings:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you using known good cables ? Do you have FDR cables on
>>>>>>>>>>>>>>> the FDR <-> FDR links ? Cable lengths can matter as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:57 PM, Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Were the ports mapped to the phy profile shutdown when you
>>>>>>>>>>>>>>> changed this ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LLR is a proprietary Mellanox mechanism.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You might want 2 different profiles: one for the interfaces
>>>>>>>>>>>>>>> connected to other gateway interfaces (which are FDR (and FDR-10) capable)
>>>>>>>>>>>>>>> and the other for the interfaces connecting to QDR (the older equipment in
>>>>>>>>>>>>>>> your network). By configuring the Switch-X interfaces to the appropriate
>>>>>>>>>>>>>>> possible speeds and disabling the proprietary mechanisms there, the link
>>>>>>>>>>>>>>> should not only come up but also this will occur faster than if FDR/FDR10
>>>>>>>>>>>>>>> are enabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I suspect that due to the Switch-X configuration that the
>>>>>>>>>>>>>>> links to the switch(es) in the HP enclosures do not negotiate properly (as
>>>>>>>>>>>>>>> shown by down rather than LinkUp).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Once you get all your links to INIT, negotiation has
>>>>>>>>>>>>>>> occurred and then it's time for SM to bring links to active.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since you have down links, the SM can't do anything about
>>>>>>>>>>>>>>> those.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:44 PM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Has anyone had any experience with the HP BLc 4X QDR IB Switch? I
>>>>>>>>>>>>>>> know that this kind of SW does not come with an embedded SM, but I don't
>>>>>>>>>>>>>>> know how to access any mgmt at all on this particular switch, for
>>>>>>>>>>>>>>> example to set the speed or anything like that. Is it possible to access it through
>>>>>>>>>>>>>>> the chassis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 13:19 GMT-03:00 German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think so, but when I tried to configure the phy-profile on
>>>>>>>>>>>>>>> the interface in order to negotiate at QDR, it failed to map the profile:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile high-speed-ber
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Profile: high-speed-ber
>>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>>> llr support ib-speed
>>>>>>>>>>>>>>> SDR: disable
>>>>>>>>>>>>>>> DDR: disable
>>>>>>>>>>>>>>> QDR: disable
>>>>>>>>>>>>>>> FDR10: enable-request
>>>>>>>>>>>>>>> FDR: enable-request
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show phy-profile hp-encl-isl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Profile: hp-encl-isl
>>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>>> llr support ib-speed
>>>>>>>>>>>>>>> SDR: disable
>>>>>>>>>>>>>>> DDR: disable
>>>>>>>>>>>>>>> QDR: enable
>>>>>>>>>>>>>>> FDR10: enable-request
>>>>>>>>>>>>>>> FDR: enable-request
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) #
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # interface ib 1/9 phy-profile map hp-encl-isl
>>>>>>>>>>>>>>> *% Cannot map profile hp-encl-isl to port: 1/9*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 13:17 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The 'qib' driver is loading fine, as can be seen in the
>>>>>>>>>>>>>>> ibstat output. ib_ipath is the driver for an older card.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is that the link is not coming up to Init. As Hal
>>>>>>>>>>>>>>> said, the link should transition to "LinkUp" without the SM's involvement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you are on to something with the fact that it seems
>>>>>>>>>>>>>>> like your switch ports are not configured to do QDR.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* German Anders [mailto:ganders at despegar.com]
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 9:05 AM
>>>>>>>>>>>>>>> *To:* Weiny, Ira
>>>>>>>>>>>>>>> *Cc:* Hal Rosenstock; users at lists.openfabrics.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I have that file:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /usr/sbin/truescale-serdes.cmds
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've also installed libipathverbs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # apt-get install libipathverbs-dev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But when I try to load the ib_ipath module, I get the
>>>>>>>>>>>>>>> following error message:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # modprobe ib_ipath
>>>>>>>>>>>>>>> modprobe: ERROR: could not insert 'ib_ipath': Device or resource busy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 12:54 GMT-03:00 Weiny, Ira <ira.weiny at intel.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are a few issues for routing in that diagram but the
>>>>>>>>>>>>>>> links should come up.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I assume there is some backplane between the blade servers
>>>>>>>>>>>>>>> and the switch in that chassis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you gotten libipathverbs installed?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In ipathverbs there is a serdes tuning script.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://github.com/01org/libipathverbs/blob/master/truescale-serdes.cmds
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does your libipathverbs include that file? If not try the
>>>>>>>>>>>>>>> latest from github.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ira
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *From:* users-bounces at lists.openfabrics.org [mailto:
>>>>>>>>>>>>>>> users-bounces at lists.openfabrics.org] *On Behalf Of *German
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>> *Sent:* Wednesday, October 07, 2015 8:41 AM
>>>>>>>>>>>>>>> *To:* Hal Rosenstock
>>>>>>>>>>>>>>> *Cc:* users at lists.openfabrics.org
>>>>>>>>>>>>>>> *Subject:* Re: [Users] IB topology config and polling state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Hal,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the reply. I've attached a PDF with the topology
>>>>>>>>>>>>>>> diagram. I don't know if this is the best way to go or if there's another
>>>>>>>>>>>>>>> way to connect and set up the IB network; tips and suggestions would be very
>>>>>>>>>>>>>>> much appreciated. Also, the mezzanine cards are already installed on the blade
>>>>>>>>>>>>>>> hosts:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # lspci
>>>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>>> 41:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-10-07 11:47 GMT-03:00 Hal Rosenstock <
>>>>>>>>>>>>>>> hal.rosenstock at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi again German,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like you made some progress from yesterday as the qib
>>>>>>>>>>>>>>> ports are now Polling rather than Disabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But since they are Down, do you have them cabled to a switch?
>>>>>>>>>>>>>>> That should bring the links up, and the port state will then be Init. That is
>>>>>>>>>>>>>>> the "starting" point.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You will also then need to be running SM to bring the ports
>>>>>>>>>>>>>>> up to Active.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:37 AM, German Anders <
>>>>>>>>>>>>>>> ganders at despegar.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't know if this is the right mailing list for this kind of
>>>>>>>>>>>>>>> topic, but I'm really new to IB. I've just installed two SX6036G gateways
>>>>>>>>>>>>>>> connected to each other through two ISL ports, and then configured
>>>>>>>>>>>>>>> proxy-arp between both nodes (the SM is disabled on both gateways):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GWIB01 [proxy-ha-group: master] (config) # show proxy-arp ha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Load balancing algorithm: ib-base-ip
>>>>>>>>>>>>>>> Number of Proxy-Arp interfaces: 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Proxy-ARP VIP
>>>>>>>>>>>>>>> =============
>>>>>>>>>>>>>>> Pra-group name: proxy-ha-group
>>>>>>>>>>>>>>> HA VIP address: 10.xx.xx.xx/xx
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Active nodes:
>>>>>>>>>>>>>>> ID        State      IP
>>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>>> GWIB01    master     10.xx.xx.xx1
>>>>>>>>>>>>>>> GWIB02    standby    10.xx.xx.xx2
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then I set up two SX6018F switches (*SWIB01* and *SWIB02*),
>>>>>>>>>>>>>>> one connected to GWIB01 and the other connected to GWIB02. The SM is
>>>>>>>>>>>>>>> configured locally on both the SWIB01 and SWIB02 switches. So far so good. After
>>>>>>>>>>>>>>> this config I connected a commodity server with a MLNX IB ADPT FDR to the
>>>>>>>>>>>>>>> SWIB01 and SWIB02 switches, configured the drivers, etc., and got it up and
>>>>>>>>>>>>>>> running fine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Finally, I set up an HP enclosure with an internal IB switch
>>>>>>>>>>>>>>> (and connected port 1 of the internal switch to GWIB01; the link is up but the LLR
>>>>>>>>>>>>>>> status is inactive), installed one of the blades, and I see the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ibstat
>>>>>>>>>>>>>>> CA 'qib0'
>>>>>>>>>>>>>>>     CA type: InfiniPath_QMH7342
>>>>>>>>>>>>>>>     Number of ports: 2
>>>>>>>>>>>>>>>     Firmware version:
>>>>>>>>>>>>>>>     Hardware version: 2
>>>>>>>>>>>>>>>     Node GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>     System image GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>     Port 1:
>>>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fec
>>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>>     Port 2:
>>>>>>>>>>>>>>>         State: Down
>>>>>>>>>>>>>>>         Physical state: Polling
>>>>>>>>>>>>>>>         Rate: 40
>>>>>>>>>>>>>>>         Base lid: 4660
>>>>>>>>>>>>>>>         LMC: 0
>>>>>>>>>>>>>>>         SM lid: 4660
>>>>>>>>>>>>>>>         Capability mask: 0x0761086a
>>>>>>>>>>>>>>>         Port GUID: 0x0011750000791fed
>>>>>>>>>>>>>>>         Link layer: InfiniBand
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I was wondering whether the SM is not being recognized
>>>>>>>>>>>>>>> on the blade system and that's why the port is not getting past the Polling
>>>>>>>>>>>>>>> state. Is that possible? Or maybe it's not possible to connect an ISL between
>>>>>>>>>>>>>>> the gateway and the HP internal switch so that the SM is available, or maybe
>>>>>>>>>>>>>>> the inactive LLR is causing this. Any ideas? I thought about
>>>>>>>>>>>>>>> connecting the ISL of the HP IB switch to SWIB01 or SWIB02 instead of the
>>>>>>>>>>>>>>> gateways, but I don't have any available ports there.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *German*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>>>> http://lists.openfabrics.org/mailman/listinfo/users
>
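
The port-state ladder described in the quoted thread (Polling, then LinkUp/Init with no SM involved, then Active once an SM is running) can be illustrated with a rough Python sketch that reads ibstat-style text and says what each port is waiting for. This is only a sketch under assumptions: the function names parse_ibstat and next_step and the hint strings are made up for illustration and are not part of ibstat or any other InfiniBand tool, and the sample text is the Port 1 block quoted above plus one hypothetical LinkUp port.

```python
import re

# Sample input: the first port is taken from the ibstat output quoted in this
# thread; the second port is a HYPOTHETICAL example of a link that is
# physically up but has not yet been activated by a subnet manager.
SAMPLE = """\
CA 'qib0'
    Number of ports: 2
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 40
    Port 2:
        State: Initializing
        Physical state: LinkUp
        Rate: 40
"""


def parse_ibstat(text):
    """Return one dict per port with the 'Key: value' fields found under it."""
    ports = []
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            ca, current = m.group(1), None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            current = {"ca": ca, "port": int(m.group(1))}
            ports.append(current)
            continue
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def next_step(port):
    """Illustrative hint only: map the port's states to what it is waiting for."""
    if port.get("Physical state") != "LinkUp":
        # No SM can help here; the link has not negotiated yet.
        return "physical link not up: check cabling/backplane and port speed settings"
    if port.get("State") == "Active":
        return "ok"
    return "link is up: waiting for a subnet manager to activate the port"


if __name__ == "__main__":
    for p in parse_ibstat(SAMPLE):
        print(p["ca"], "port", p["port"], "->", next_step(p))
```

To try it against a live host, the output of ibstat could be passed to parse_ibstat instead of SAMPLE.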