[Users] HP BLc QLogic 4X QDR IB Switch oddness

Andrei Mikhailovsky andrei at arhont.com
Fri May 24 16:12:59 PDT 2013


----- Original Message -----

From: "John Valdes" <valdes at anl.gov> 
To: "Andrei Mikhailovsky" <andrei at arhont.com> 
Cc: users at lists.openfabrics.org 
Sent: Friday, 24 May, 2013 8:46:18 PM 
Subject: Re: [Users] HP BLc QLogic 4X QDR IB Switch oddness 

Andrei, 

On Fri, May 24, 2013 at 01:51:00PM +0100, Andrei Mikhailovsky wrote: 
> 
> I was wondering if anyone on the mailing list has experienced an odd behaviour with HP BLc QLogic 4X QDR IB Switch (505958-B21)? [...] 
> 
> My problem is that every time I reboot the blade servers which are connected to the above blade switch the switch ports take ages to Activate. [...] 
> 
> However, the servers which are connected to the same switch with a cable (not the internally connected blade servers) do not have this issue and get port state Active in seconds every time. 

This last fact, that servers directly attached to the switch w/ a 
cable negotiate link immediately, I think clearly points to the blade 
servers and/or blade chassis as the source of the problem and not the 
switch. 


AM: That was my initial thought as well, but after trying two different switches and two different blade enclosures with the same results, I am no longer sure. 



What's the physical topology of the IB network between the blade 
servers and the switch? I'm not familiar w/ the HP BLc, but I'm 
guessing each blade server has an internal IB Mezzanine HCA which 
connects to a chassis backplane, correct? How does the IB from the 
backplane connect to the switch? Is there a specialized connector 
between them, or does it use a standard IB cable (possibly more than 
one)? Does the chassis backplane have an IB switch chip? 

AM: I am not really sure. The servers do have the IB mezzanine card, and from what I've read it is a PCIe card. I am unsure how the blade servers are connected to the switch. I guess it's an internal HP/QLogic interconnect. 



Some random thoughts/questions: 

1) Where is the subnet manager running? What routing protocol is it 
using (eg, minhop, updown, etc)? 



AM: The subnet manager is currently running on one of the servers connected externally to the switch via an IB cable. I also tried running the subnet manager on the blade servers themselves, and the behaviour did not change. 




2) When rebooting the blade servers, are you rebooting all the ones in 
the same chassis at the same time? If you stagger the reboot of 
each server, or if you just reboot one server, does it still take a 
long time for it to negotiate link? 



AM: I tried rebooting them one by one and all at the same time. There was no difference in the results. 




3) The problem could be a firmware issue; since the directly attached 
servers and the blade servers must have different HCAs (the former 
standard PCI HCAs and the latter mezzanine HCAs I'm assuming), that 
suggests they'll have different firmware. Unfortunately, you say 
you're running the latest firmware already. 


AM: From what I can tell, the IB mezzanine HCA doesn't have a firmware upgrade option. The firmware version shown is 0.0.0, and when I run the OFED driver install it tells me that my HCA doesn't need a firmware upgrade. The switch is running the latest firmware from the QLogic / Intel website; I think it's 7.0.0.35. 
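
To put a number on how long the ports take to go Active after a reboot, the port state could be polled on each host and the elapsed time logged. The following is only a minimal sketch, assuming the Linux sysfs file /sys/class/infiniband/<device>/ports/<port>/state is available and reports strings such as "4: ACTIVE". The device name qib0, port 1, and the helper names are assumptions for illustration and do not come from this thread.

```python
import time
from pathlib import Path


def parse_state(text):
    """Split a sysfs state string such as '4: ACTIVE' into (4, 'ACTIVE')."""
    number, _, name = text.strip().partition(":")
    return int(number), name.strip()


def wait_for_active(read_state, timeout=600.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until the port reports ACTIVE.

    Returns the elapsed seconds, or None if the timeout expires first.
    """
    start = clock()
    while True:
        _, name = parse_state(read_state())
        elapsed = clock() - start
        if name == "ACTIVE":
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)


def sysfs_reader(device="qib0", port=1):
    """Return a callable that reads the port state from sysfs.

    The default device name and port number are assumptions.
    """
    path = Path("/sys/class/infiniband") / device / "ports" / str(port) / "state"
    return path.read_text


# Example (run on the host itself, e.g. from a boot script):
#   print(wait_for_active(sysfs_reader("qib0", 1)))
```

Running this on a blade server and on one of the externally cabled servers would give comparable timings for the two cases.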

Thanks for your help and ideas! 


John 
