[Users] HP BLc QLogic 4X QDR IB Switch oddness

Andrei Mikhailovsky andrei at arhont.com
Fri May 24 08:18:26 PDT 2013


It's a weird problem indeed. I've tried swapping switches and blade enclosures, as well as cards. I thought I was going mad. Nothing really helped, and HP was no help either. That tells me which hardware vendor to avoid for an InfiniBand solution. HP is usually good with support, but not this time.

Andrei 
----- Original Message -----

From: "Hal Rosenstock" <hal.rosenstock at gmail.com> 
To: "Andrei Mikhailovsky" <andrei at arhont.com> 
Cc: users at lists.openfabrics.org 
Sent: Friday, 24 May, 2013 4:06:17 PM 
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness 

Andrei, 


On Fri, May 24, 2013 at 10:13 AM, Andrei Mikhailovsky < andrei at arhont.com > wrote: 




Hal, 

The trouble is that I've tried swapping the cards and even the IB switch, and it makes no difference.

The link state changes that I mentioned before keep cycling before the link is activated. As I mentioned, it takes a varying amount of time before the ports become active. However, once active, they stay active for months. There are no errors or any other problems once the link becomes active.

The trouble is that it takes a while to become active, which is not right )))




The (physical) link state cycling is indicative of some physical connectivity issue. If swapping cards doesn't work, I'm not sure what to tell you. Maybe it's a backplane issue but likely you can't swap the chassis. 

-- Hal 

<blockquote>



Andrei 



From: "Hal Rosenstock" < hal.rosenstock at gmail.com > 
To: "Andrei Mikhailovsky" < andrei at arhont.com > 
Cc: users at lists.openfabrics.org 
Sent: Friday, 24 May, 2013 2:47:48 PM 


Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness 

Andrei, 


On Fri, May 24, 2013 at 9:45 AM, Andrei Mikhailovsky < andrei at arhont.com > wrote: 

<blockquote>


Hal, 

The physical state changes every few seconds and goes from DOWN/Polling to DOWN/Disabled to DOWN/(something else); I don't remember exactly. It just keeps cycling through them. I've never seen the ports in the INIT state, which I think they should start with when there is connectivity.

</blockquote>


So there's some physical connectivity issue. The link is constantly trying to negotiate and come up to LinkUp but even when it gets there it doesn't stay there for long. 
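
To put numbers on this cycling, the physical state could be sampled at intervals and summarized. The following is only a minimal sketch, assuming the samples were already collected as (timestamp in seconds, physical state) pairs in time order. The function name `summarize_samples` and the sample data are hypothetical and are not taken from this thread.

```python
def summarize_samples(samples):
    """Summarize (timestamp_seconds, physical_state) samples given in time order.

    Returns a tuple (transitions, first_linkup_time):
    - transitions: how many times the state differed from the previous sample
    - first_linkup_time: timestamp of the first "LinkUp" sample, or None
    """
    transitions = 0
    first_linkup = None
    previous = None
    for timestamp, state in samples:
        if previous is not None and state != previous:
            transitions += 1
        if state == "LinkUp" and first_linkup is None:
            first_linkup = timestamp
        previous = state
    return transitions, first_linkup


# Hypothetical samples for illustration only (not measured data).
samples = [(0, "Polling"), (5, "Disabled"), (10, "Polling"), (15, "LinkUp")]
print(summarize_samples(samples))  # prints (3, 15)
```

A high transition count before the first LinkUp sample would match the cycling described above, while a port that comes up cleanly would show a single transition.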

I'd start by reseating the cards which are involved with this. Hopefully that will help to stabilize things. 

-- Hal 

<blockquote>



Thanks 


From: "Hal Rosenstock" < hal.rosenstock at gmail.com > 
To: "Andrei Mikhailovsky" < andrei at arhont.com > 
Cc: users at lists.openfabrics.org 
Sent: Friday, 24 May, 2013 2:11:24 PM 
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness 



Hi Andrei, 


On Fri, May 24, 2013 at 8:51 AM, Andrei Mikhailovsky < andrei at arhont.com > wrote: 

<blockquote>


Hello guys, 

I was wondering if anyone on the mailing list has experienced odd behaviour with the HP BLc QLogic 4X QDR IB Switch (505958-B21)? I've tried to get HP to solve this issue, but it ended with HP basically saying that they do not have InfiniBand specialists who could help me.

My problem is that every time I reboot the blade servers connected to the above blade switch, the switch ports take ages to become active. The time it takes until the servers have:

State: Active 
Physical state: LinkUp 

is random and varies from several minutes to over 10 hours. I couldn't find a correlation or any consistency between the servers or switch ports. The same server could connect within minutes after a reboot, but could take hours following the next reboot. 

However, the servers that are connected to the same switch with a cable (not the interconnected blade servers) do not have this issue and reach port state Active in seconds every time.

</blockquote>

What is the physical state of these ports when they're not active (on both the switch and server side)? Are they LinkUp or something else?
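
One way to answer this on the server side is to read both fields from the port status text. The following is a minimal sketch, assuming the text contains `State:` and `Physical state:` lines in the form quoted above (as printed by a tool such as `ibstat`) and describes a single port. The names `parse_port_state` and `is_link_ready` and the sample text are hypothetical.

```python
import re


def parse_port_state(text):
    """Extract the logical and physical port state from status text.

    Assumes lines of the form 'State: <value>' and 'Physical state: <value>'
    for a single port. Missing fields are returned as None.
    """
    result = {"state": None, "physical_state": None}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"^Physical state:\s*(.+)$", line)
        if match:
            result["physical_state"] = match.group(1).strip()
            continue
        match = re.match(r"^State:\s*(.+)$", line)
        if match:
            result["state"] = match.group(1).strip()
    return result


def is_link_ready(port):
    """True only when the port is both Active and LinkUp."""
    return port["state"] == "Active" and port["physical_state"] == "LinkUp"


# Hypothetical sample text for illustration only.
sample = "State: Down\nPhysical state: Polling\n"
port = parse_port_state(sample)
print(port, is_link_ready(port))
```

Running this periodically on a rebooted blade and logging the result with a timestamp would show which physical states the port passes through before it becomes active.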

<blockquote>



I have installed the latest available firmware, and I have tried changing the blade mezz cards, the IB switch and the blade enclosure, as well as different versions of OFED / opensm. The behaviour stays the same.

Has anyone come across this kind of behaviour? 

</blockquote>


Have you tried reseating your (server) cards?

Also, would you comment on the topology? How large a subnet is this?

-- Hal 

<blockquote>



Thanks 

Andrei 




</blockquote>



</blockquote>



</blockquote>

