[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
Scott Weitzenkamp (sweitzen)
sweitzen at cisco.com
Tue May 2 11:58:08 PDT 2006
Amit,
I haven't had any problems with this config.
Protest Run Results
Chassis Branch: TopspinOS-2.6.0
Chassis Build: build171
IB Hosts: svbu-qaclus-1 svbu-qaclus-2
IB Hosts OS: openib_i686_smp_rhel4_34 openib_i686_smp_rhel4_34
Host Branch: OFED-1.0
Host Build: buildrc3
Topology: /data/home/scott/cluster_switch.main/qa/tests/smoke/Lk3.topo
Logs:
/data/home/scott/builds/TopspinOS-2.6.0/build171/protest/Lk3/050206_115207
Logs URL:
http://svbu-borg.cisco.com/~releng/data/home/scott/builds/TopspinOS-2.6.0/build171/protest/Lk3/050206_115207
Scenario file: MpiOsuBenchmarks.scen
###TEST-D: MPI home dir is /usr/local/ibed/mpi/gcc/mvapich-0.9.7-mlx2.1.0.
###TEST-D: MVAPICH OpenIB osu_latency test on hosts svbu-qaclus-1 svbu-qaclus-2
# OSU MPI Latency Test (Version 2.1)
# Size Latency (us)
0 5.16
1 5.27
2 5.32
4 5.27
8 5.37
16 5.38
32 5.49
64 5.55
128 6.70
256 7.35
512 8.35
1024 10.33
2048 14.51
4096 19.64
8192 59.40
16384 72.00
32768 98.60
65536 152.97
131072 259.36
262144 472.56
524288 895.95
1048576 1748.12
2097152 3476.54
4194304 8703.27
###TEST-D: MVAPICH OpenIB osu_bw test on hosts svbu-qaclus-1 svbu-qaclus-2
# OSU MPI Bandwidth Test (Version 2.1)
# Size Bandwidth (MB/s)
1 0.115860
2 0.234141
4 0.468693
8 0.938176
16 1.801834
32 3.774280
64 7.574665
128 15.227615
256 30.406622
512 62.068835
1024 126.576020
2048 223.283704
4096 358.732809
8192 285.252288
16384 388.095564
32768 474.323906
65536 531.422345
131072 562.801189
262144 579.782217
524288 588.706779
1048576 593.214942
2097152 595.447328
4194304 594.820726
###TEST-D: MVAPICH OpenIB osu_bibw test on hosts svbu-qaclus-1 svbu-qaclus-2
# OSU MPI Bidirectional Bandwidth Test (Version 2.1)
# Size Bi-Bandwidth (MB/s)
1 0.143480
2 0.291655
4 0.579100
8 1.132267
16 2.310026
32 4.616616
64 9.075500
128 19.643435
256 39.267568
512 73.597916
1024 158.550363
2048 290.699402
4096 509.814370
8192 326.919911
16384 459.685017
32768 571.193714
65536 652.835363
131072 700.064301
262144 726.029468
524288 739.861749
1048576 747.233197
2097152 751.168876
4194304 698.340369
###TEST-D: MVAPICH OpenIB osu_bcast test on hosts svbu-qaclus-1 svbu-qaclus-2
# OSU MPI_Bcast Latency Test (Version 1.0)
# Size Latency (us)
1 5.899
2 5.884
4 5.887
8 5.882
16 5.987
32 5.860
64 5.917
128 7.202
256 7.981
512 9.371
1024 11.292
2048 15.053
4096 20.031
8192 63.100
16384 75.669
###TEST-D: MVAPICH OpenIB mpi_multibw test on hosts svbu-qaclus-1 svbu-qaclus-2
# PathScale Modified OSU MPI Bandwidth Test (Version 2.2)
# Running on 1 procs per node
# Size Aggregate Bandwidth (MB/s) Messages/s
1 0.116658 116658.464118
2 0.234265 117132.451180
4 0.471091 117772.625225
8 0.933336 116666.970487
16 1.808325 113020.290674
32 3.776229 118007.154184
64 7.559009 118109.509661
128 15.161947 118452.711457
256 30.493207 119114.088963
512 62.340430 121758.651523
1024 126.154498 123197.751641
2048 225.834353 110270.680060
4096 362.919482 88603.389080
8192 285.539695 34855.919788
16384 391.398443 23889.065154
32768 474.238097 14472.598171
65536 531.765959 8114.104596
131072 562.829510 4294.048382
262144 580.052829 2212.725940
524288 588.813185 1123.072023
1048576 593.293871 565.809127
2097152 595.487749 283.950686
4194304 594.906214 141.836694
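In the mpi_multibw results, the Messages/s column appears to equal the aggregate bandwidth divided by the message size, taking MB as 10^6 bytes (for example, 0.116658 MB/s at 1 byte is about 116658 messages/s). The awk sketch here recomputes the rate for a few rows copied from that table; `check_msg_rate` is a hypothetical helper, and the 0.1% tolerance is an arbitrary choice rather than anything the benchmark defines.

```shell
#!/bin/sh
# Hypothetical helper: recompute Messages/s from Size and Aggregate
# Bandwidth (assuming MB = 10^6 bytes) and flag any row whose reported
# Messages/s differs from the derived value by more than 0.1%.
check_msg_rate() {
    awk '{
        size = $1; mbps = $2; msgs = $3
        derived = mbps * 1000000 / size   # messages/s implied by the bandwidth
        rel = (derived - msgs) / msgs     # relative difference from the reported value
        if (rel < 0) rel = -rel
        printf "%s %.1f %s\n", size, derived, (rel < 0.001 ? "ok" : "MISMATCH")
    }'
}

# A few rows (Size, Aggregate Bandwidth, Messages/s) from the output above;
# each printed line should end in "ok".
check_msg_rate <<'EOF'
1 0.116658 116658.464118
65536 531.765959 8114.104596
4194304 594.906214 141.836694
EOF
```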
InfiniBand.Performance.MPI.OsuBenchmarks.MVAPICH.OpenIB --> PASS
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
________________________________
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Amit Mehrotra (amehrotr)
Sent: Tuesday, May 02, 2006 4:36 AM
To: Openfabrics-ewg at openib.org
Subject: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
Configuration: RHEL4U3, ia32, rc3, PCI-X
I have been seeing the following error when I try to run the MVAPICH
test programs:
----
[0] Abort: Couldn't modify SRQ limit
at line 999 in file viainit.c
-----
On debugging the issue, it seems that MVAPICH is being compiled for
PCI_EX cards rather than PCI_X cards. From the MPI code, it appears that
PCI-X cards do not support modifying SRQs. The source of the problem is
a bug in the mvapich.make script (a new IBED addition), which always
detects the card as PCI_EX. I have appended a diff with the fix. I am
not sure how the patch should be generated properly, since all of
MVAPICH is shipped as a zipped tarball.
There seems to be one more place where the script deviates from the
MVAPICH build: it treats the older PCI_EX cards (cards with the lspci
signature 15b3:6278) as PCI_X. Was this done because these cards also
don't support resizing SRQs?
-Amit
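The device-ID checks in the patched mvapich.make (the diff that follows) can be sketched as a hypothetical `classify_hca` shell function. It uses `grep -q` instead of counting matches with `wc -l`, keeps the script's Arbel, Sinai, Tavor order (the first match wins), and reads `lspci -n`-style text on stdin so it can be tried without the hardware; the sample line is made up. As in the script, 15b3:6278 lands in the PCI_X branch next to 15b3:5a44.

```shell
#!/bin/sh
# Hypothetical helper mirroring the device-ID checks in the patched
# mvapich.make: reads `lspci -n`-style lines on stdin and prints the
# define the script would append to CFLAGS (nothing if no ID matches).
classify_hca() {
    input=$(cat)
    if printf '%s\n' "$input" | grep -q "15b3:6282"; then
        printf '%s\n' "-D_PCI_EX_"    # Arbel
    elif printf '%s\n' "$input" | grep -qE "15b3:5e8c|15b3:6274"; then
        printf '%s\n' "-D_PCI_EX_"    # Sinai
    elif printf '%s\n' "$input" | grep -qE "15b3:5a44|15b3:6278"; then
        printf '%s\n' "-D_PCI_X_"     # Tavor; 15b3:6278 is also routed here
    fi
}

# On a real host this would be: /sbin/lspci -n | classify_hca
# Here a made-up sample line is used instead.
echo "02:00.0 0c06: 15b3:6278 (rev a0)" | classify_hca   # prints -D_PCI_X_
```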
-------------------
diff -u mvapich.make.old mvapich.make
--- mvapich.make.old 2006-05-02 15:32:11.000000000 +0530
+++ mvapich.make 2006-05-02 15:33:37.000000000 +0530
@@ -251,13 +251,13 @@
DEF_BUILDID="$DEF_BUILDID"
fi
fi
-if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
+if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
# Arbel
CFLAGS="$CFLAGS -D_PCI_EX_"
-elif (/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'); then
+elif (test `/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'` -gt 0); then
# Sinai
CFLAGS="$CFLAGS -D_PCI_EX_"
-elif (/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'); then
+elif (test `/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'` -gt 0); then
# Tavor
CFLAGS="$CFLAGS -D_PCI_X_"
fi
-----------
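Why the original checks always take the first (Arbel, PCI_EX) branch: `if` tests the exit status of a pipeline, which in a POSIX shell (without bash's `pipefail`) is the status of its last command, here `tr -d`. That succeeds whether or not `grep` matched, so the condition is always true. The sketch here reproduces this with hypothetical `old_check`/`new_check` functions that take one line of `lspci -n`-style output as an argument instead of running `/sbin/lspci -n`; the sample line is made up.

```shell
#!/bin/sh
# old_check/new_check are hypothetical wrappers around the two forms of
# the Arbel test in the diff; $1 stands in for `/sbin/lspci -n` output.

# Original form. The if tests the pipeline's exit status, i.e. that of
# `tr`, which is 0 even when grep matched nothing and wc printed "0".
# (The printed count is discarded here so only the branch taken is shown.)
old_check() {
    if (printf '%s\n' "$1" | grep "15b3:6282" | wc -l | tr -d '[:space:]') >/dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

# Patched form: compare the match count numerically with test -gt.
new_check() {
    if (test `printf '%s\n' "$1" | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
        echo "match"
    else
        echo "no match"
    fi
}

tavor='02:00.0 0c06: 15b3:5a44 (rev a1)'   # made-up PCI-X (Tavor) sample line
old_check "$tavor"   # prints "match": the Arbel/PCI_EX branch is taken anyway
new_check "$tavor"   # prints "no match"
```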