[ofa-general] GPFS node loses IB-connection
SEGERS Koen
Koen.SEGERS at VRT.BE
Mon May 21 11:50:31 PDT 2007
The same as in dmesg.
The output for the failing node:
May 18 13:02:51 gpfswhbe1s1 mmfs: Error=MMFS_PHOENIX, ID=0xAB429E38, Tag=4997901: Reason code 668 Failure Reason Lost membership in cluster enterprise.universe. Unmounting file systems.
May 18 13:02:51 gpfswhbe1s1 mmfs: Error=MMFS_PHOENIX, ID=0xAB429E38, Tag=4997901:
May 18 13:03:36 gpfswhbe1s1 kernel: GPFS Deadman Switch timer [0] has expired; IOs in progress: 0
May 18 13:04:11 gpfswhbe1s1 kernel: Badness in do_exit at kernel/exit.c:807
May 18 13:04:11 gpfswhbe1s1 kernel:
May 18 13:04:11 gpfswhbe1s1 kernel: Call Trace: <ffffffff80133370>{do_exit+80} <ffffffff80133c17>{sys_exit_group+0}
May 18 13:04:11 gpfswhbe1s1 kernel: <ffffffff8010a7be>{system_call+126}
May 18 13:04:11 gpfswhbe1s1 kernel: Badness in do_exit at kernel/exit.c:807
May 18 13:04:11 gpfswhbe1s1 kernel:
May 18 13:04:11 gpfswhbe1s1 kernel: Call Trace: <ffffffff80133370>{do_exit+80} <ffffffff80133c17>{sys_exit_group+0}
May 18 13:04:11 gpfswhbe1s1 kernel: <ffffffff8010a7be>{system_call+126}
May 18 13:18:57 gpfswhbe1s1 sshd[15090]: Accepted publickey for root from 192.168.1.1 port 52281 ssh2
May 18 13:25:12 gpfswhbe1s1 syslog-ng[3705]: STATS: dropped 0
Today we also ran some tests with iperf over SDP. The tests worked fine as long as we didn't use the parallel option (-P <number>), which starts multiple client threads that connect to the server. As soon as we ran the command with that option, the interface died.
I find this very strange. Has anyone else run into this problem? Is it still present in RC3?
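For anyone who wants to try the same access pattern without iperf, here is a minimal sketch of what the -P option does as described above: several client threads each open their own connection to the server and push data at the same time. It uses plain TCP sockets from the Python standard library; the function names, chunk size, and byte counts are made up for illustration, and running it over SDP would still need whatever SDP setup was used for the iperf tests, which is not shown here.

```python
import socket
import threading


def send_stream(host, port, nbytes, results, idx):
    """One client thread: open its own connection and send nbytes."""
    payload = b"x" * 65536  # arbitrary chunk size, chosen for illustration
    sent = 0
    with socket.create_connection((host, port), timeout=10) as s:
        while sent < nbytes:
            chunk = payload[: min(len(payload), nbytes - sent)]
            s.sendall(chunk)
            sent += len(chunk)
    results[idx] = sent  # stays 0 if the connection failed


def parallel_clients(host, port, streams=4, nbytes=1 << 20):
    """Start `streams` concurrent connections, similar in spirit to iperf -P."""
    results = [0] * streams
    threads = [
        threading.Thread(target=send_stream,
                         args=(host, port, nbytes, results, i))
        for i in range(streams)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # bytes sent per stream
```

Calling parallel_clients(server, port, streams=1) and then again with a larger streams value against any listening sink would show whether the number of concurrent connections alone is enough to trigger the failure.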
Tomorrow we will do more tests to pinpoint the problem even further.
We will also build RPMs for RC3. Hopefully that helps.
Regards,
Koen
________________________________
From: Shirley Ma [mailto:xma at us.ibm.com]
Sent: Mon 21/05/2007 17:41
To: SEGERS Koen
CC: general at lists.openfabrics.org; general-bounces at lists.openfabrics.org; Tziporet Koren
Subject: RE: [ofa-general] GPFS node loses IB-connection
Hello,
What's the output of /var/log/messages when you hit this problem?
Shirley Ma