[ofa-general] GPFS node loses IB-connection
Scott Weitzenkamp (sweitzen)
sweitzen at cisco.com
Tue May 22 08:34:24 PDT 2007
What server model and CPU model do you have?
This could be https://bugs.openfabrics.org//show_bug.cgi?id=229. Try
setting RENICE_IB_MAD=yes in /etc/infiniband/openibd.conf, then reboot
or run /etc/init.d/openibd restart, and see if that helps.
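For anyone who wants to script the change, here is a minimal sketch. The helper name set_conf is hypothetical (it is not part of OFED), and it assumes GNU sed and a plain KEY=VALUE layout in openibd.conf.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a config file.
# Replaces an existing "KEY=..." line, or appends one if none exists.
# Assumes GNU sed (for -i) and simple KEY=VALUE lines.
set_conf() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s/^${key}=.*/${key}=${value}/" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage on a node, as root (paths taken from the message above):
#   set_conf /etc/infiniband/openibd.conf RENICE_IB_MAD yes
#   /etc/init.d/openibd restart
```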
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
________________________________
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
Sent: Tuesday, May 22, 2007 6:44 AM
To: Ami Perlmutter; Shirley Ma
Cc: general-bounces at lists.openfabrics.org;
general at lists.openfabrics.org
Subject: RE: [ofa-general] GPFS node loses IB-connection
I ran the iperf tests on servers with OFED-1.2-RC3 and got the same
result. Actually, it is even worse: when the interface dies, it goes
into the PORT_INIT state and does not return to PORT_ACTIVE, at least
not within 10 minutes.
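To measure how long a port stays out of ACTIVE, a polling loop along these lines may help. This is only a sketch: the function name wait_port_active is made up, and it assumes the port state is readable from the sysfs file /sys/class/infiniband/<device>/ports/<port>/state, which normally holds a line such as "2: INIT" or "4: ACTIVE". The device name in the usage comment (mthca0) is an assumption and should be replaced with the local HCA.

```shell
#!/bin/sh
# Hypothetical helper: poll a port state file until it reports ACTIVE.
# $1: state file, $2: timeout in seconds (default 600), $3: poll interval (default 5)
# The elapsed time is approximate; it only counts the sleep intervals.
wait_port_active() {
    state_file=$1
    timeout=${2:-600}
    interval=${3:-5}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if grep -q 'ACTIVE' "$state_file" 2>/dev/null; then
            echo "ACTIVE after ${elapsed}s"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "still not ACTIVE after ${timeout}s: $(cat "$state_file" 2>/dev/null)"
    return 1
}

# Usage (device name is an assumption):
#   wait_port_active /sys/class/infiniband/mthca0/ports/1/state 600 5
```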
I'll give you the test script I ran:
ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5001 &
ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5002 &
ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5003 &
ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6001 &
ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6002 &
ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6003 &
ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7001 &
ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7002 &
ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7003 &
ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8001 &
ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8002 &
ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8003 &
sleep 5
for i in 14 15 16 17
do
    ssh 10.224.158.111 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))001 -t 120 -d -P 5 &
    ssh 10.224.158.112 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))002 -t 120 -d -P 5 &
    ssh 10.224.158.113 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))003 -t 120 -d -P 5 &
done
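The port arithmetic in the client loop is easy to misread, so here is a small sketch that only prints the resulting plan and starts nothing. The function name print_plan is made up for illustration. It presumably relies on 192.168.2.14 through 192.168.2.17 being the IB-side addresses of the servers started on 10.224.158.114 through 10.224.158.117, which the script does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helper: print which client connects to which server address
# and port, using the same $((i-9)) arithmetic as the test script.
print_plan() {
    for i in 14 15 16 17; do
        n=1
        for client in 111 112 113; do
            # i=14 gives port prefix 5, i=15 gives 6, and so on
            echo "10.224.158.$client -> 192.168.2.$i:$((i-9))00$n"
            n=$((n + 1))
        done
    done
}

print_plan
```

The first line printed is "10.224.158.111 -> 192.168.2.14:5001". In total there are 12 client invocations, each with -P 5 and -d, so every server handles three clients at once in both directions.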
Any ideas?
Regards,
Koen
________________________________
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
Sent: Tuesday, May 22, 2007 10:55 AM
To: Ami Perlmutter; Shirley Ma
Cc: general-bounces at lists.openfabrics.org;
general at lists.openfabrics.org
Subject: RE: [ofa-general] GPFS node loses IB-connection
GPFS keeps its connection constantly open.
We did some more tests with iperf:
If we don't run bidirectional tests, all connections keep
running smoothly. If we add bidirectional tests, it becomes unstable,
especially when this is done on multiple nodes. Is this normal?
The failed iperf tests give the same error in the switch log:
May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering removed ports
May 22 08:15:00 topspin-120sc ib_sm.x[621]: %IB-6-INFO: Program switch port state to down, node=00:05:ad:00:00:0b:a2:cc, port= 6, due to non-responding CA
May 22 08:15:00 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port down - port=1/6, type=ib4xTXP
May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: in portTblFindEntry() - IfIndex=70(1/6)
May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: cannot find entry - IfIndex=70(1/6)
May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering new ports
May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
May 22 08:15:05 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port up - port=1/6, type=ib4xTXP
May 22 08:15:07 topspin-120sc ib_sm.x[632]: %IB-6-INFO: Generate SM CREATE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
May 22 08:15:08 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
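To see how long the switch port was down, the port_mgr.x entries can be paired up. The sketch below uses a hypothetical function name, port_flap_seconds, and assumes that each log entry sits on a single unwrapped line, that the time is the third whitespace-separated field, and that a down/up pair falls on the same day.

```shell
#!/bin/sh
# Hypothetical helper: read switch log lines on stdin and print, for each
# "port up" entry, the number of seconds since the preceding "port down".
# Assumes one entry per line and HH:MM:SS in the third field.
port_flap_seconds() {
    awk '
        / port down - / { split($3, t, ":"); down = t[1]*3600 + t[2]*60 + t[3] }
        / port up - /   { split($3, t, ":"); up = t[1]*3600 + t[2]*60 + t[3]
                          print up - down }
    '
}

# Usage (the log file name is an assumption):
#   port_flap_seconds < switch.log
```

For the two port_mgr.x entries above (08:15:00 down, 08:15:05 up) this gives 5 seconds.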
RC3 has just been installed. Results will follow soon.
Regards,
Koen
________________________________
From: Ami Perlmutter [mailto:amip at dev.mellanox.co.il]
Sent: Tuesday, May 22, 2007 10:33 AM
To: Shirley Ma
Cc: SEGERS Koen; general-bounces at lists.openfabrics.org;
general at lists.openfabrics.org
Subject: Re: [ofa-general] GPFS node loses IB-connection
Does the application constantly open and close connections?
*** Disclaimer ***
Vlaamse Radio- en Televisieomroep
Auguste Reyerslaan 52, 1043 Brussel
Public-law limited company (nv van publiek recht)
VAT BE 0244.142.664
Register of Legal Entities (RPR) Brussels
http://www.vrt.be/disclaimer