[Users] HP BLc QLogic 4X QDR IB Switch oddness

Weiny, Ira ira.weiny at intel.com
Fri May 24 09:16:53 PDT 2013


Andrei,

Could you tell us what HCA/Switch models you have?  I am _not_ the expert here but I’ll see if I can get someone to help you.

Ira

From: users-bounces at lists.openfabrics.org [mailto:users-bounces at lists.openfabrics.org] On Behalf Of Andrei Mikhailovsky
Sent: Friday, May 24, 2013 8:18 AM
To: Hal Rosenstock
Cc: users at lists.openfabrics.org
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness

It's a weird problem indeed. I've tried swapping switches and blade enclosures, as well as cards. I thought that I was going mad. Nothing really helped and HP was no help either. That tells me which hardware vendor not to go for in an InfiniBand solution. HP is usually good with support, but not this time.

Andrei
________________________________
From: "Hal Rosenstock" <hal.rosenstock at gmail.com>
To: "Andrei Mikhailovsky" <andrei at arhont.com>
Cc: users at lists.openfabrics.org
Sent: Friday, 24 May, 2013 4:06:17 PM
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness

Andrei,
On Fri, May 24, 2013 at 10:13 AM, Andrei Mikhailovsky <andrei at arhont.com> wrote:
Hal,

The trouble is I've tried swapping the cards and even the IB switch, and it makes no difference.

The link state changes that I mentioned before keep cycling until the link is activated. As I mentioned, it takes a varying amount of time before the links become active. However, once active they stay active for months. There are no errors or any other problems once the link becomes active.

The trouble is that it takes a while to become active, which is not right )))

The (physical) link state cycling is indicative of some physical connectivity issue. If swapping cards doesn't work, I'm not sure what to tell you. Maybe it's a backplane issue but likely you can't swap the chassis.

-- Hal


Andrei
________________________________
From: "Hal Rosenstock" <hal.rosenstock at gmail.com>
To: "Andrei Mikhailovsky" <andrei at arhont.com>
Cc: users at lists.openfabrics.org
Sent: Friday, 24 May, 2013 2:47:48 PM
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness

Andrei,
On Fri, May 24, 2013 at 9:45 AM, Andrei Mikhailovsky <andrei at arhont.com> wrote:
Hal,

The physical state changes every few seconds and goes from DOWN/Polling to DOWN/Disabled to DOWN/(something else), I don't remember exactly. It just keeps cycling through them. I've never seen the ports in INIT state, which I think they should start with when there is connectivity.

So there's some physical connectivity issue. The link is constantly trying to negotiate and come up to LinkUp but even when it gets there it doesn't stay there for long.

I'd start by reseating the cards which are involved with this. Hopefully that will help to stabilize things.

-- Hal


Thanks
________________________________
From: "Hal Rosenstock" <hal.rosenstock at gmail.com>
To: "Andrei Mikhailovsky" <andrei at arhont.com>
Cc: users at lists.openfabrics.org
Sent: Friday, 24 May, 2013 2:11:24 PM
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness


Hi Andrei,
On Fri, May 24, 2013 at 8:51 AM, Andrei Mikhailovsky <andrei at arhont.com> wrote:
Hello guys,

I was wondering if anyone on the mailing list has experienced odd behaviour with the HP BLc QLogic 4X QDR IB Switch (505958-B21)? I've tried to get HP to solve this issue, but it resulted in HP basically saying that they do not have InfiniBand specialists who could help me.

My problem is that every time I reboot the blade servers which are connected to the above blade switch, the switch ports take ages to become Active. The time it takes until the servers show:

State: Active
Physical state: LinkUp

is random and varies from several minutes to over 10 hours. I couldn't find a correlation or any consistency between the servers or switch ports. The same server could connect within minutes after a reboot, but could take hours following the next reboot.
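
To put numbers on how long a port takes to reach this state, the port status could be sampled periodically and only the changes recorded. The following is a minimal sketch, not a tested diagnostic tool: it assumes each sample is status text containing `State:` and `Physical state:` lines in the form quoted above, and the helper names `port_status` and `log_transitions` are hypothetical. How the samples are collected (for example by capturing the output of a status command every few seconds) is left out.

```python
import re

# Matches lines such as "State: Active" or "Physical state: LinkUp".
# Assumption: the status text uses exactly these two field names.
FIELD = re.compile(r"^\s*(State|Physical state):\s*(\S+)", re.MULTILINE)


def port_status(text):
    """Return (state, physical_state) for the first port found in the text."""
    fields = {}
    for name, value in FIELD.findall(text):
        # Keep only the first occurrence of each field.
        fields.setdefault(name, value)
    return fields.get("State"), fields.get("Physical state")


def log_transitions(samples):
    """Collapse (timestamp, status_text) samples into a list of changes."""
    changes = []
    last = None
    for timestamp, text in samples:
        status = port_status(text)
        if status != last:
            changes.append((timestamp, status))
            last = status
    return changes


if __name__ == "__main__":
    # Hypothetical samples: seconds since reboot plus captured status text.
    samples = [
        (0, "State: Down\nPhysical state: Polling"),
        (5, "State: Down\nPhysical state: Polling"),
        (10, "State: Down\nPhysical state: Disabled"),
        (15, "State: Active\nPhysical state: LinkUp"),
    ]
    for timestamp, (state, phys) in log_transitions(samples):
        print(timestamp, state, phys)
```

A log like this, collected on both the server and the switch side, would show whether the physical state ever leaves the Polling/Disabled cycle before the port finally becomes Active.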

However, the servers which are connected to the same switch with a cable (not interconnected blade servers) do not have this issue and reach port state Active in seconds every time.

What is the physical state of these ports when they're not active (on both the switch and server side) ? Are they LinkUp or something else ?


I have installed the latest available firmware and I have tried changing the blade mezz cards, the IB switch and the blade enclosure, as well as different versions of OFED / opensm. It stays the same.

Has anyone come across this kind of behaviour?

Have you tried reseating your (server) cards ?

Also, would you comment on the topology ? How large a subnet is this ?

-- Hal


Thanks

Andrei

_______________________________________________
Users mailing list
Users at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users





