[ofa-general] GPFS node loses IB-connection

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Tue May 22 08:34:24 PDT 2007


What server model and CPU model do you have?
 
This could be https://bugs.openfabrics.org//show_bug.cgi?id=229.  Try
setting RENICE_IB_MAD=yes in /etc/infiniband/openibd.conf, then reboot
or run /etc/init.d/openibd restart, and see if that helps.
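
A minimal sketch of that change, assuming openibd.conf consists of plain KEY=value lines and that GNU sed is available for in-place editing. The function name set_renice_ib_mad is hypothetical and not part of OFED; it takes the file path as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED): make sure RENICE_IB_MAD=yes is
# set in an openibd.conf-style file made of KEY=value lines.
set_renice_ib_mad() {
    conf="${1:-/etc/infiniband/openibd.conf}"
    if grep -q '^RENICE_IB_MAD=' "$conf"; then
        # A setting already exists: rewrite it in place (GNU sed -i assumed).
        sed -i 's/^RENICE_IB_MAD=.*/RENICE_IB_MAD=yes/' "$conf"
    else
        # No setting yet: append one at the end of the file.
        echo 'RENICE_IB_MAD=yes' >> "$conf"
    fi
}
```

Run as root, this would be followed by the restart mentioned above, for example: set_renice_ib_mad /etc/infiniband/openibd.conf && /etc/init.d/openibd restart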
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


________________________________

	From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
	Sent: Tuesday, May 22, 2007 6:44 AM
	To: Ami Perlmutter; Shirley Ma
	Cc: general-bounces at lists.openfabrics.org;
general at lists.openfabrics.org
	Subject: RE: [ofa-general] GPFS node loses IB-connection
	
	

	I did the iperf tests on servers with OFED-1.2-RC3.

	 

	It gives the same result. Actually, it is even worse: when
the interface dies, it goes into the PORT_INIT state and does not return
to PORT_ACTIVE, at least not within 10 minutes.
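
One way to watch for the port coming back, as a rough sketch: it assumes the state names above are read from ibv_devinfo, whose per-port output contains a line of the form "state: PORT_ACTIVE (4)". The helper names port_states and wait_port_active are hypothetical, and the 600-second default only mirrors the 10 minutes mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: read ibv_devinfo-style output on stdin and print
# the state name of every port (the word after "state:").
port_states() {
    awk '$1 == "state:" { print $2 }'
}

# Hypothetical helper: poll every 5 seconds until at least one port
# reports PORT_ACTIVE, or give up after the given number of seconds.
wait_port_active() {
    timeout="${1:-600}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if ibv_devinfo | port_states | grep -q '^PORT_ACTIVE$'; then
            return 0
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
    return 1
}
```

For example, "wait_port_active 600 || echo still not active" on the affected node would report whether the port recovered within that window.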

	 

	I'll give you the test script I ran:

	 

	ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5001 &
	ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5002 &
	ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5003 &
	ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6001 &
	ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6002 &
	ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6003 &
	ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7001 &
	ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7002 &
	ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7003 &
	ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8001 &
	ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8002 &
	ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8003 &

	sleep 5

	for i in 14 15 16 17
	do
	        ssh 10.224.158.111 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))001 -t 120 -d -P 5 &
	        ssh 10.224.158.112 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))002 -t 120 -d -P 5 &
	        ssh 10.224.158.113 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))003 -t 120 -d -P 5 &
	done
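
For readers following the port arithmetic in the script above: $((i-9)) maps i = 14..17 to 5..8, so each server host gets ports 5001-5003, 6001-6003, and so on, and the last digit pairs each port with one client host (.111, .112, .113). The sketch below only prints the twelve server-side commands instead of running them; the function name print_server_cmds is made up for illustration.

```shell
#!/bin/sh
# Illustrative dry run: print (do not execute) the twelve server-side
# commands, one per server host (10.224.158.114-117) and port suffix (1-3).
print_server_cmds() {
    for i in 14 15 16 17; do
        for n in 1 2 3; do
            # $((i-9)) yields 5, 6, 7, 8, giving ports such as 5001 or 8003.
            echo "ssh 10.224.158.1$i LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p $((i-9))00$n &"
        done
    done
}

print_server_cmds
```

Printing first makes it easy to compare the generated lines against the hand-written list before launching anything.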

	 

	Any ideas?

	 

	Regards,

	 

	Koen

	
________________________________


	From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
	Sent: Tuesday, May 22, 2007 10:55
	To: Ami Perlmutter; Shirley Ma
	Cc: general-bounces at lists.openfabrics.org;
general at lists.openfabrics.org
	Subject: RE: [ofa-general] GPFS node loses IB-connection

	 

	GPFS keeps its connection constantly open.

	 

	We did some more tests with iperf:

	If we don't run bidirectional tests, all connections keep
running smoothly. If we add bidirectional tests, it becomes unstable,
especially when this is done on multiple nodes. Is this normal?

	 

	The failed iperf tests give the same error in the switch log:

	May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
	May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
	May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering removed ports
	May 22 08:15:00 topspin-120sc ib_sm.x[621]: %IB-6-INFO: Program switch port state to down, node=00:05:ad:00:00:0b:a2:cc, port= 6, due to non-responding CA
	May 22 08:15:00 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port down - port=1/6, type=ib4xTXP
	May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: in portTblFindEntry() - IfIndex=70(1/6)
	May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: cannot find entry - IfIndex=70(1/6)
	May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering new ports
	May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
	May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
	May 22 08:15:05 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port up - port=1/6, type=ib4xTXP
	May 22 08:15:07 topspin-120sc ib_sm.x[632]: %IB-6-INFO: Generate SM CREATE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
	May 22 08:15:08 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
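
To pull just the port flaps out of a longer switch log, a small filter along these lines may help. It is only a sketch: it assumes the log is available as a file with one entry per line in the same field layout as the excerpt, and the function name port_events is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read switch log entries on stdin (one per line) and
# print "<time> <down|up> port=<slot/port>" for each port_mgr port event.
port_events() {
    awk '$6 == "%PORT-6-INFO:" && $7 == "port" && ($8 == "down" || $8 == "up") {
        sub(/,$/, "", $10)      # drop the trailing comma after port=1/6
        print $3, $8, $10
    }'
}
```

Fed the entries above, it would list the port going down at 08:15:00 and coming back up at 08:15:05, which makes repeated flaps during a test run easy to count.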

	 

	RC3 has just been installed. Results will follow soon.

	 

	Regards,

	 

	Koen

	 

	
________________________________


	From: Ami Perlmutter [mailto:amip at dev.mellanox.co.il]
	Sent: Tuesday, May 22, 2007 10:33
	To: Shirley Ma
	Cc: SEGERS Koen; general-bounces at lists.openfabrics.org;
general at lists.openfabrics.org
	Subject: Re: [ofa-general] GPFS node loses IB-connection

	 

	Does the application constantly open and close connections?

	*** Disclaimer ***
	
	Vlaamse Radio- en Televisieomroep
	Auguste Reyerslaan 52, 1043 Brussel
	
	nv van publiek recht
	BTW BE 0244.142.664
	RPR Brussel
	http://www.vrt.be/disclaimer
