[ofa-general] GPFS node loses IB-connection
Koen Segers
koen.segers at vrt.be
Tue May 22 11:17:25 PDT 2007
On Tue, 2007-05-22 at 08:34 -0700, Scott Weitzenkamp (sweitzen) wrote:
> What server model and CPU model do you have?
cat /proc/cpuinfo
processor : 7
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8218
stepping : 2
cpu MHz : 2600.202
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy
bogomips : 5200.54
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
>
> This could be https://bugs.openfabrics.org//show_bug.cgi?id=229. Try
> setting RENICE_IB_MAD=yes in /etc/infiniband/openibd.conf, then reboot
> or run /etc/init.d/openibd restart, and see if that helps.
AHA, this is interesting. I'll do it tomorrow!
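
For reference, a minimal sketch of how I might apply that setting. The function name and the optional path argument are my own; only the RENICE_IB_MAD=yes line and the config file path come from Scott's mail. It rewrites an existing RENICE_IB_MAD line or appends one if none is present.

```shell
# Sketch: ensure RENICE_IB_MAD=yes is present in an openibd-style config file.
# set_renice_ib_mad is a hypothetical helper name, not part of OFED.
set_renice_ib_mad() {
    conf="${1:-/etc/infiniband/openibd.conf}"
    if grep -q '^RENICE_IB_MAD=' "$conf"; then
        # Replace an existing setting in place, keeping a .bak copy.
        sed -i.bak 's/^RENICE_IB_MAD=.*/RENICE_IB_MAD=yes/' "$conf"
    else
        # No setting yet: append one.
        printf 'RENICE_IB_MAD=yes\n' >> "$conf"
    fi
}
```

Run as root, this would be followed by /etc/init.d/openibd restart (or a reboot), as suggested above.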
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>
>
> ______________________________________________________________
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> SEGERS Koen
> Sent: Tuesday, May 22, 2007 6:44 AM
> To: Ami Perlmutter; Shirley Ma
> Cc: general-bounces at lists.openfabrics.org;
> general at lists.openfabrics.org
> Subject: RE: [ofa-general] GPFS node loses IB-connection
>
>
>
> I did the iperf tests on servers with OFED-1.2-RC3.
>
>
>
> It gives the same result. Actually, it is even worse:
> when the interface dies, it enters the PORT_INIT state but
> does not return to PORT_ACTIVE, at least not within 10
> minutes.
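
One way to watch for the port coming back is to poll the port state file that the kernel exposes under sysfs, which normally holds text such as "2: INIT" or "4: ACTIVE". This is only a sketch: the function names are made up, and the device name mthca0 in the comment is an assumption, so substitute your own HCA.

```shell
# Sketch: read an IB port state from a sysfs-style state file and wait for ACTIVE.
ib_port_state() {
    # $1: path to the state file, e.g. /sys/class/infiniband/mthca0/ports/1/state
    # Strips the leading numeric code, so "4: ACTIVE" becomes "ACTIVE".
    sed -n '1s/^[0-9]*: *//p' "$1"
}

wait_for_active() {
    # $1: state file, $2: timeout in seconds (default 600, i.e. 10 minutes)
    file="$1"; timeout="${2:-600}"; waited=0
    while [ "$(ib_port_state "$file")" != "ACTIVE" ]; do
        # Give up once the timeout has elapsed.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A non-zero return from wait_for_active would correspond to the behaviour described above, where the port stays in INIT for the whole waiting period.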
>
>
>
> I’ll give you the test script I ran:
>
>
>
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5001 &
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5002 &
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5003 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6001 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6002 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6003 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7001 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7002 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7003 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8001 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8002 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8003 &
>
>
>
> sleep 5
>
>
>
> for i in 14 15 16 17
> do
>     ssh 10.224.158.111 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))001 -t 120 -d -P 5 &
>     ssh 10.224.158.112 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))002 -t 120 -d -P 5 &
>     ssh 10.224.158.113 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))003 -t 120 -d -P 5 &
> done
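
The host and port mapping in the script above can be checked before anything is launched. The sketch below only prints the 12 server and 12 client command lines instead of running them; the function name and the "server:"/"client:" labels are my own additions. Note that it prints each server next to its client, whereas the real script starts all servers first and sleeps 5 seconds before starting the clients.

```shell
# Sketch: print the iperf commands of the test script for inspection.
# Server i in 14..17 listens on ports (i-9)001..(i-9)003; client 11n uses port (i-9)00n.
gen_iperf_cmds() {
    sdp_env='LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK'
    for i in 14 15 16 17; do
        for n in 1 2 3; do
            port="$((i - 9))00$n"
            echo "server: ssh 10.224.158.1$i $sdp_env iperf -s -p $port"
            echo "client: ssh 10.224.158.11$n $sdp_env iperf -c 192.168.2.$i -p $port -t 120 -d -P 5"
        done
    done
}
```

Piping the output through grep '^server' or grep '^client' separates the two phases again.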
>
>
>
> Any ideas?
>
>
>
> Regards,
>
>
>
> Koen
>
>
> ______________________________________________________________
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> SEGERS Koen
> Sent: Tuesday, May 22, 2007 10:55
> To: Ami Perlmutter; Shirley Ma
> Cc: general-bounces at lists.openfabrics.org;
> general at lists.openfabrics.org
> Subject: RE: [ofa-general] GPFS node loses IB-connection
>
>
>
>
> GPFS keeps its connection constantly open.
>
>
>
> We did some more tests with iperf:
>
> If we don’t run bidirectional tests, all connections keep
> running smoothly. If we add bidirectional tests, the link
> becomes unstable, especially when this is done on multiple
> nodes. Is this normal?
>
>
>
> The failed iperf tests give the same error in the switch log:
>
> May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
> May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
> May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering removed ports
> May 22 08:15:00 topspin-120sc ib_sm.x[621]: %IB-6-INFO: Program switch port state to down, node=00:05:ad:00:00:0b:a2:cc, port= 6, due to non-responding CA
> May 22 08:15:00 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port down - port=1/6, type=ib4xTXP
> May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: in portTblFindEntry() - IfIndex=70(1/6)
> May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: cannot find entry - IfIndex=70(1/6)
> May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering new ports
> May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
> May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
> May 22 08:15:05 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port up - port=1/6, type=ib4xTXP
> May 22 08:15:07 topspin-120sc ib_sm.x[632]: %IB-6-INFO: Generate SM CREATE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
> May 22 08:15:08 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
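
To pull just the port flaps out of a longer switch log, something like the following might help. It is a sketch that assumes each log entry sits on a single line in the switch's own log file (the mail client may have wrapped them here) and that the field layout matches the port_mgr lines quoted above; the function name is my own.

```shell
# Sketch: list "time event port" for every port_mgr up/down entry read from stdin.
port_events() {
    awk '$5 ~ /^port_mgr/ && $7 == "port" && ($8 == "up" || $8 == "down") {
        p = $10
        sub(/^port=/, "", p)   # drop the "port=" prefix
        sub(/,$/, "", p)       # drop the trailing comma
        print $3, $8, p
    }'
}
```

Fed the excerpt above, this keeps only the two port_mgr lines, which show port 1/6 going down at 08:15:00 and coming back up at 08:15:05.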
>
>
>
> RC3 has just been installed. Results will follow soon.
>
>
>
> Regards,
>
>
>
> Koen
>
>
>
>
> ______________________________________________________________
> From: Ami Perlmutter [mailto:amip at dev.mellanox.co.il]
> Sent: Tuesday, May 22, 2007 10:33
> To: Shirley Ma
> Cc: SEGERS Koen; general-bounces at lists.openfabrics.org;
> general at lists.openfabrics.org
> Subject: Re: [ofa-general] GPFS node loses IB-connection
>
>
>
>
> Does the application constantly open and close connections?
>
> *** Disclaimer ***
>
> Vlaamse Radio- en Televisieomroep
> Auguste Reyerslaan 52, 1043 Brussel
>
> nv van publiek recht
> BTW BE 0244.142.664
> RPR Brussel
> http://www.vrt.be/disclaimer
>