[ofa-general] GPFS node loses IB-connection

Koen Segers koen.segers at vrt.be
Tue May 22 15:34:32 PDT 2007


If I understand it right, the switch is actually polling (=pinging) the
interfaces every 10s. This means that when the interface is handling
other traffic, the poll can fail and the port could be considered out of
service. My question is then: "How can the timeout be reached while
packets are being sent/received to/from the interface?"

Anyway, what timeout-value would you recommend for us? And why?
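
One possible way to reason about a timeout value, sketched under assumptions: if the times at which a node last answered the SM's polls can be collected, the longest silent stretch can be compared against candidate node-timeout values. The sample times and the helper names (longest_silence, survives) are hypothetical and do not come from the switch or from any Cisco tool. The only rule taken from this thread is that an HCA unresponsive for longer than node-timeout is considered out of service.

```python
from typing import Iterable


def longest_silence(response_times: Iterable[float], window_end: float) -> float:
    """Return the longest stretch in seconds without a poll response.

    response_times: seconds at which the node answered a poll (hypothetical data).
    window_end: end of the observation window, in the same time base.
    """
    times = sorted(response_times)
    if not times:
        # The node never answered during the window.
        return float("inf")
    # Gaps between consecutive responses, plus the gap up to the window end.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(window_end - times[-1])
    return max(gaps)


def survives(response_times: Iterable[float], window_end: float,
             node_timeout: float) -> bool:
    """True if the node was never silent for longer than node_timeout."""
    return longest_silence(response_times, window_end) <= node_timeout


# Hypothetical observation: the node stops answering between t=20s and t=55s.
sample = [0, 10, 20, 55, 60]
for candidate in (10, 30, 60):
    print(candidate, survives(sample, window_end=60, node_timeout=candidate))
```

With real data, the smallest candidate for which survives() holds during the heaviest GPFS load would be a lower bound for the setting, not a recommendation.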

To recapitulate: these are the actions I'll take tomorrow
1) change the MAD niceness of the servers
2) change the timeout on the switches

Are these changes sufficient for the HCAs to keep their ports in
PORT_ACTIVE state?
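
Step 1 could be sketched roughly as follows. This assumes a Linux server where the MAD kernel threads have names starting with "ib_mad" (the name used in this thread) and where /proc/<pid>/stat is readable. The function names and the niceness value in the usage note are placeholders, not values recommended anywhere in this thread, and lowering niceness requires root.

```python
import os
from typing import List


def find_threads(prefix: str = "ib_mad", proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command name in <proc_root>/<pid>/stat starts with prefix."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat_line = handle.read()
        except OSError:
            continue  # process exited or is not readable
        # The command name is the text between the first "(" and the last ")".
        start = stat_line.find("(")
        end = stat_line.rfind(")")
        if start == -1 or end <= start:
            continue
        if stat_line[start + 1:end].startswith(prefix):
            pids.append(int(entry))
    return sorted(pids)


def renice(pids: List[int], niceness: int) -> None:
    """Set the niceness of each PID; negative values need root privileges."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

A call such as renice(find_threads(), -5) would then apply the change; the -5 is only an illustrative value.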

Regards,

Koen

On Tue, 2007-05-22 at 12:59 -0700, Scott Weitzenkamp (sweitzen) wrote:
> Yes, you can tune it.  Here's an example via the switch CLI:
>  
> SFS-7000D(config)# ib sm subnet-prefix fe:80:00:00:00:00:00:00 node-timeout <value>
> 
> The default is 10 seconds, it can be configured up to 2000 seconds.
> If an HCA is completely unresponsive for longer than the node-timeout
> value, then we consider that HCA out of service.
>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
>         
>         ______________________________________________________________
>         From: Shirley Ma [mailto:xma at us.ibm.com] 
>         Sent: Tuesday, May 22, 2007 11:30 AM
>         To: koen.segers at VRT.BE
>         Cc: Ami Perlmutter; general at lists.openfabrics.org;
>         general-bounces at lists.openfabrics.org; Scott Weitzenkamp
>         (sweitzen)
>         Subject: RE: [ofa-general] GPFS node loses IB-connection
>         
>         
>         
>         Koen,
>         
>         So it is most likely you hit the same bug as 229 (Scott
>         pointed out earlier). The same workaround might work for you
>         by renicing ib_mad as Scott suggested.
>         
>         I think this should be an SM query timeout tunable value in
>         Cisco SM. Am I right, Scott?
>         
>         Thanks
>         Shirley Ma
>         
>         
>         From: Koen Segers <koen.segers at VRT.BE>
>         Date: 05/22/07 11:14 AM
>         Please respond to: koen.segers at VRT.BE
>         To: Shirley Ma/Beaverton/IBM at IBMUS
>         Cc: Ami Perlmutter <amip at dev.mellanox.co.il>,
>         general at lists.openfabrics.org,
>         general-bounces at lists.openfabrics.org
>         Subject: RE: [ofa-general] GPFS node loses IB-connection
>         
>         Hi,
>         
>         It is the Cisco SM. 
>         
>         SFS-7000P> show version
>         
>         
>         ================================================================================
>                                   System Version Information
>         ================================================================================
>                   system-version : SFS-7000P TopspinOS 2.9.0 releng #147 10/25/2006 02:01:32
>                          contact : tac at cisco.com
>                             name : SFS-7000P
>                         location : 170 West Tasman Drive, San Jose, CA 95134
>                          up-time : 11(d):7(h):49(m):3(s)
>                      last-change : none
>                 last-config-save : none
>                           action : none
>                           result : none
>                        oper-mode : normal
>         
>         There is also a command that gives the SM version, but I can't
>         find it right now.
>         
>         On Tue, 2007-05-22 at 09:45 -0700, Shirley Ma wrote:
>         > Hello Koen,
>         > 
>         > From the switch log, it looks like an SM issue to me. The node was
>         > kicked out of the membership. Which SM are you using in your
>         > fabric?
>         > 
>         > Thanks
>         > Shirley Ma
>         > 
>         *** Disclaimer ***
>         
>         Vlaamse Radio- en Televisieomroep
>         Auguste Reyerslaan 52, 1043 Brussel
>         
>         nv van publiek recht
>         BTW BE 0244.142.664
>         RPR Brussel
>         http://www.vrt.be/disclaimer
>         
>         
>         
>         
>         