[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Tue May 2 11:58:08 PDT 2006


Amit,
 
I haven't had any problems with this config.
 
Protest Run Results
 
Chassis Branch: TopspinOS-2.6.0
Chassis Build: build171
IB Hosts: svbu-qaclus-1 svbu-qaclus-2
IB Hosts OS: openib_i686_smp_rhel4_34 openib_i686_smp_rhel4_34
Host Branch: OFED-1.0
Host Build: buildrc3
Topology: /data/home/scott/cluster_switch.main/qa/tests/smoke/Lk3.topo
Logs:
/data/home/scott/builds/TopspinOS-2.6.0/build171/protest/Lk3/050206_115207
Logs URL:
http://svbu-borg.cisco.com/~releng/data/home/scott/builds/TopspinOS-2.6.0/build171/protest/Lk3/050206_115207
 

Scenario file: MpiOsuBenchmarks.scen
 
###TEST-D: MPI home dir is
/usr/local/ibed/mpi/gcc/mvapich-0.9.7-mlx2.1.0.
 
###TEST-D: MVAPICH OpenIB osu_latency test on hosts svbu-qaclus-1
svbu-qaclus-2
# OSU MPI Latency Test (Version 2.1)
# Size          Latency (us)
0               5.16
1               5.27
2               5.32
4               5.27
8               5.37
16              5.38
32              5.49
64              5.55
128             6.70
256             7.35
512             8.35
1024            10.33
2048            14.51
4096            19.64
8192            59.40
16384           72.00
32768           98.60
65536           152.97
131072          259.36
262144          472.56
524288          895.95
1048576         1748.12
2097152         3476.54
4194304         8703.27
###TEST-D: MVAPICH OpenIB osu_bw test on hosts svbu-qaclus-1
svbu-qaclus-2
# OSU MPI Bandwidth Test (Version 2.1)
# Size          Bandwidth (MB/s)
1               0.115860
2               0.234141
4               0.468693
8               0.938176
16              1.801834
32              3.774280
64              7.574665
128             15.227615
256             30.406622
512             62.068835
1024            126.576020
2048            223.283704
4096            358.732809
8192            285.252288
16384           388.095564
32768           474.323906
65536           531.422345
131072          562.801189
262144          579.782217
524288          588.706779
1048576         593.214942
2097152         595.447328
4194304         594.820726
###TEST-D: MVAPICH OpenIB osu_bibw test on hosts svbu-qaclus-1
svbu-qaclus-2
# OSU MPI Bidirectional Bandwidth Test (Version 2.1)
# Size          Bi-Bandwidth (MB/s)
1               0.143480
2               0.291655
4               0.579100
8               1.132267
16              2.310026
32              4.616616
64              9.075500
128             19.643435
256             39.267568
512             73.597916
1024            158.550363
2048            290.699402
4096            509.814370
8192            326.919911
16384           459.685017
32768           571.193714
65536           652.835363
131072          700.064301
262144          726.029468
524288          739.861749
1048576         747.233197
2097152         751.168876
4194304         698.340369
###TEST-D: MVAPICH OpenIB osu_bcast test on hosts svbu-qaclus-1
svbu-qaclus-2
# OSU MPI_Bcast Latency Test (Version 1.0)
# Size          Latency (us)
    1      5.899
    2      5.884
    4      5.887
    8      5.882
   16      5.987
   32      5.860
   64      5.917
  128      7.202
  256      7.981
  512      9.371
 1024     11.292
 2048     15.053
 4096     20.031
 8192     63.100
16384     75.669
###TEST-D: MVAPICH OpenIB mpi_multibw test on hosts svbu-qaclus-1
svbu-qaclus-2
# PathScale Modified OSU MPI Bandwidth Test (Version 2.2)
# Running on 1 procs per node
# Size          Aggregate Bandwidth (MB/s)      Messages/s
1               0.116658                        116658.464118
2               0.234265                        117132.451180
4               0.471091                        117772.625225
8               0.933336                        116666.970487
16              1.808325                        113020.290674
32              3.776229                        118007.154184
64              7.559009                        118109.509661
128             15.161947                       118452.711457
256             30.493207                       119114.088963
512             62.340430                       121758.651523
1024            126.154498                      123197.751641
2048            225.834353                      110270.680060
4096            362.919482                      88603.389080
8192            285.539695                      34855.919788
16384           391.398443                      23889.065154
32768           474.238097                      14472.598171
65536           531.765959                      8114.104596
131072          562.829510                      4294.048382
262144          580.052829                      2212.725940
524288          588.813185                      1123.072023
1048576         593.293871                      565.809127
2097152         595.487749                      283.950686
4194304         594.906214                      141.836694
InfiniBand.Performance.MPI.OsuBenchmarks.MVAPICH.OpenIB --> PASS

 
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


________________________________

	From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Amit Mehrotra (amehrotr)
	Sent: Tuesday, May 02, 2006 4:36 AM
	To: Openfabrics-ewg at openib.org
	Subject: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
	
	
	Configuration: RHEL4U3, ia32, rc3, PCI-X
	 
	I have been seeing the following error when I try to run MVAPICH
test programs:
	----
	[0] Abort: Couldn't modify SRQ limit
	 at line 999 in file viainit.c
	-----
	 
	On debugging the issue, it appears that MVAPICH is being
incorrectly compiled for PCI_EX cards rather than PCI_X cards. From the
MPI code it seems that PCI-X cards do not support modifying SRQ limits.
The root cause is a bug in the mvapich.make script (a new IBED
addition), which always identifies the card as PCI_EX. I have appended
a diff with the fix; I am not sure how the patch can be generated
properly, since the whole MVAPICH source ships as a zipped tarball.
	 
	There also seems to be one more place where the script deviates
from the standard MVAPICH build: it treats the older PCI_EX cards
(those with the lspci signature 15b3:6278) as PCI-X. Was this done
because these cards also don't support resizing SRQs?
	 
	-Amit
	 
	-------------------
	diff -u mvapich.make.old mvapich.make
	--- mvapich.make.old    2006-05-02 15:32:11.000000000 +0530
	+++ mvapich.make        2006-05-02 15:33:37.000000000 +0530
	@@ -251,13 +251,13 @@
	                DEF_BUILDID="$DEF_BUILDID"
	        fi
	 fi
	-if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
	+if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
	        # Arbel
	        CFLAGS="$CFLAGS -D_PCI_EX_"
	-elif (/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'); then
	+elif (test `/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'` -gt 0); then
	        # Sinai
	        CFLAGS="$CFLAGS -D_PCI_EX_"
	-elif (/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'); then
	+elif (test `/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'` -gt 0); then
	        # Tavor
	        CFLAGS="$CFLAGS -D_PCI_X_"
	 fi
	-----------
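
[Editorial note, not part of the original mail] A minimal sketch of why
the unpatched test always takes the PCI_EX branch: in "if (pipeline); then"
the exit status is that of the last command in the pipeline, tr, which
returns 0 whether or not grep matched anything, so the first branch always
fires. The patched form compares the match count itself. Device ID and
flag names below are taken from the diff above; the echo messages are
illustrative only.
-------------------
#!/bin/sh
# Unpatched construct: the condition tests tr's exit status (always 0),
# so this branch is taken even when no matching device exists.
if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
    echo "always reached, even with no Arbel (15b3:6282) device present"
fi

# Patched construct: the condition compares the actual match count,
# so the branch is taken only when a matching device is present.
if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
    echo "reached only when an Arbel (15b3:6282) device is present"
fi
-------------------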
	 
	 

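[Editorial note, not part of the original mail] Since the classification
question above hinges on which device ID a host actually carries, here is a
small sketch that reports which of the IDs checked by mvapich.make are
visible locally, i.e. whether the script would pick -D_PCI_EX_ (15b3:6282,
15b3:5e8c, 15b3:6274) or -D_PCI_X_ (15b3:5a44, 15b3:6278). IDs and flags
come from the diff; the loop itself is an assumed helper.
-------------------
#!/bin/sh
# List how many devices with each ID tested by mvapich.make are present.
for id in 15b3:6282 15b3:5e8c 15b3:6274 15b3:5a44 15b3:6278; do
    count=`/sbin/lspci -n | grep -c "$id"`
    echo "$id: $count device(s)"
done
-------------------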
