[ofa-general] [Bug 465] IPoIB CM HA fails after several hours of failures
Michael S. Tsirkin
mst at dev.mellanox.co.il
Tue Mar 27 01:59:00 PDT 2007
Please do not reply to this message.
I am copying the general list on this bug report so that
we can start discussion by mail.
I am then going to reply copying the bugzilla reflector
so that "reply all" will get tracked in bugzilla.
Subject: [Bug 465] New: IPoIB CM HA fails after several hours of failures
Date: Sun, 18 Mar 2007 08:45:48 +0200
From: bugzilla-daemon at lists.openfabrics.org
https://bugs.openfabrics.org/show_bug.cgi?id=465
Summary: IPoIB CM HA fails after several hours of failures
Product: OpenFabrics Linux
Version: 1.2beta1
Platform: X86-64
OS/Version: All
Status: NEW
Severity: critical
Priority: P2
Component: IPoIB
AssignedTo: mst at mellanox.co.il
ReportedBy: sweitzen at cisco.com
CC: tziporet at mellanox.co.il
I've been trying IPoIB CM HA for a few weeks, and can't get it to run
overnight. I've tried both SLES10 (LionCub DDR) and RHEL4 (LionMini SDR and
LionMini DDR).
I run netperf 2.4.1 with large socket buffers:
netperf241 -H 192.168.2.46 -D -l 36000 -- -s 349520 -S 349520 -m 65536
While netperf is running, I start flipping IB ports once every 10 seconds.
After a few hours, I sometimes see netperf throughput drop to almost zero:
Interim result: 1911.72 10^6bits/s over 2.52 seconds
Interim result: 4823.63 10^6bits/s over 1.00 seconds
Interim result: 4816.90 10^6bits/s over 1.00 seconds
Interim result: 4820.21 10^6bits/s over 1.00 seconds
Interim result: 4816.85 10^6bits/s over 1.00 seconds
Interim result: 4818.13 10^6bits/s over 1.00 seconds
Interim result: 324.99 10^6bits/s over 14.83 seconds
Interim result: 4811.39 10^6bits/s over 1.00 seconds
Interim result: 4817.64 10^6bits/s over 1.00 seconds
Interim result: 4812.06 10^6bits/s over 1.00 seconds
Interim result: 4809.26 10^6bits/s over 1.00 seconds
Interim result: 4817.21 10^6bits/s over 1.00 seconds
Interim result: 85.80 10^6bits/s over 56.14 seconds
Interim result: 1910.76 10^6bits/s over 2.52 seconds
Interim result: 4813.64 10^6bits/s over 1.00 seconds
Interim result: 4813.03 10^6bits/s over 1.00 seconds
Interim result: 4807.23 10^6bits/s over 1.00 seconds
Interim result: 4810.83 10^6bits/s over 1.00 seconds
Interim result: 4813.61 10^6bits/s over 1.00 seconds
Interim result: 272.39 10^6bits/s over 17.67 seconds
Interim result: 4816.57 10^6bits/s over 1.00 seconds
Interim result: 4810.02 10^6bits/s over 1.00 seconds
Interim result: 4809.88 10^6bits/s over 1.00 seconds
Interim result: 17.63 10^6bits/s over 278.01 seconds
Interim result: 0.21 10^6bits/s over 30.58 seconds
Interim result: 0.33 10^6bits/s over 14.20 seconds
Interim result: 0.45 10^6bits/s over 13.90 seconds
Interim result: 0.11 10^6bits/s over 56.20 seconds
Interim result: 0.34 10^6bits/s over 13.95 seconds
Interim result: 0.89 10^6bits/s over 14.21 seconds
Interim result: 0.11 10^6bits/s over 55.17 seconds
Interim result: 0.08 10^6bits/s over 56.20 seconds
Interim result: 0.20 10^6bits/s over 32.14 seconds
Interim result: 1.00 10^6bits/s over 6.30 seconds
Interim result: 0.37 10^6bits/s over 17.03 seconds
Interim result: 1.74 10^6bits/s over 7.25 seconds
Interim result: 0.02 10^6bits/s over 345.16 seconds
Interim result: 0.10 10^6bits/s over 112.83 seconds
Interim result: 0.45 10^6bits/s over 13.91 seconds
Interim result: 0.68 10^6bits/s over 6.91 seconds
Interim result: 0.06 10^6bits/s over 112.48 seconds
Interim result: 0.10 10^6bits/s over 60.32 seconds
Interim result: 0.43 10^6bits/s over 14.55 seconds
Other times netperf hangs or fails.
Restarting netperf with the same options never works. Sometimes I can restart
netperf with default socket buffer sizes.
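
For anyone trying to reproduce this, the stalls in the interim results above can be picked out of a long netperf log automatically. The following is only a minimal sketch, not part of the original report: the function name `find_stalls` and the 100 Mbit/s threshold are assumptions chosen for illustration, and the regular expression assumes the "Interim result:" line format shown in the log above.

```python
import re

# Matches lines such as:
#   Interim result: 4818.13 10^6bits/s over 1.00 seconds
INTERIM = re.compile(
    r"Interim result:\s+([\d.]+)\s+10\^6bits/s over\s+([\d.]+)\s+seconds"
)

def find_stalls(lines, threshold_mbps=100.0):
    """Return (throughput, duration) pairs for intervals below the threshold.

    threshold_mbps is a hypothetical cutoff, in 10^6 bits/s; tune it to the link.
    """
    stalls = []
    for line in lines:
        m = INTERIM.search(line)
        if m is None:
            continue  # skip anything that is not an interim result line
        mbps, secs = float(m.group(1)), float(m.group(2))
        if mbps < threshold_mbps:
            stalls.append((mbps, secs))
    return stalls

# A few lines copied from the log above.
sample = [
    "Interim result: 4818.13 10^6bits/s over 1.00 seconds",
    "Interim result: 324.99 10^6bits/s over 14.83 seconds",
    "Interim result: 85.80 10^6bits/s over 56.14 seconds",
    "Interim result: 0.21 10^6bits/s over 30.58 seconds",
]

for mbps, secs in find_stalls(sample):
    print(f"stall: {mbps:.2f} 10^6bits/s over {secs:.2f} seconds")
```

With the default threshold, the brief dips that coincide with a port flip (for example 324.99 over 14.83 seconds) are not flagged, while the sustained near-zero intervals are.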
----- End forwarded message -----
--
MST