[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
Amit Mehrotra (amehrotr)
amehrotr at cisco.com
Fri May 5 09:42:43 PDT 2006
Hi Amit,
A manual firmware upgrade fixed the problem. I was incorrectly expecting
that the ibed installer would have automatically flashed the correct
firmware.
The code is still being compiled with PCI_EX, but the only impact is
that the vbuf size (roughly the packet size) calculation in MVAPICH
becomes incorrect (MVAPICH calculates the vbuf size based on whether
the card is PCI_X or PCI_EX).
I believe MVAPICH needs to be changed so that it is built with SRQ
support when compiled for PCI_X, and the mvapich.make script needs to
be fixed so that it correctly identifies the IB card as PCI_X.
-Amit
________________________________
From: Amit Krig [mailto:amitk at mellanox.co.il]
Sent: Thursday, May 04, 2006 5:48 PM
To: Amit Mehrotra (amehrotr); Openfabrics-ewg at openib.org
Subject: RE: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort:
Couldn't modify SRQ limit
Hi Amit,
Please note that the SRQ limit event is supported starting from FW
version 3.4.0.
Can you specify the FW version you are using, the SRQ size, the number
of outstanding WRs in the SRQ when the modify SRQ command was executed,
the requested SRQ limit value, and the QP transport type?
Amit
________________________________
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Amit Mehrotra
(amehrotr)
Sent: Tuesday, May 02, 2006 2:36 PM
To: Openfabrics-ewg at openib.org
Subject: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort:
Couldn't modify SRQ limit
Configuration: RHEL4U3, ia32, rc3, PCI-X
I have been seeing the following error when I try to run MVAPICH test
programs:
----
[0] Abort: Couldn't modify SRQ limit
at line 999 in file viainit.c
-----
On debugging the issue, it seems that MVAPICH is being incorrectly
compiled for PCI_EX cards rather than PCI_X cards. From the MPI code,
it seems that PCI-X cards do not support modifying SRQs. The source of
the problem is a bug in the mvapich.make script (a new IBED addition),
which always reports the card as PCI_EX. I have appended a diff with
the fix. I am not sure how the patch can be correctly generated, as
the whole of MVAPICH is shipped as a zipped tarball.
There seems to be one more issue in the script, where it deviates from
the MVAPICH build. The script treats the older PCI_EX cards (cards
with the lspci signature 15b3:6278) as PCI-X. Was this done because
these cards also don't support resizing SRQs?
-Amit
-------------------
diff -u mvapich.make.old mvapich.make
--- mvapich.make.old 2006-05-02 15:32:11.000000000 +0530
+++ mvapich.make 2006-05-02 15:33:37.000000000 +0530
@@ -251,13 +251,13 @@
DEF_BUILDID="$DEF_BUILDID"
fi
fi
-if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
+if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
# Arbel
CFLAGS="$CFLAGS -D_PCI_EX_"
-elif (/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'); then
+elif (test `/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'` -gt 0); then
# Sinai
CFLAGS="$CFLAGS -D_PCI_EX_"
-elif (/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'); then
+elif (test `/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'` -gt 0); then
# Tavor
CFLAGS="$CFLAGS -D_PCI_X_"
fi
-----------
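For anyone reading the diff above, here is a minimal sketch of why the
original condition always selects the first branch. The condition of an
`if` on a pipeline is the exit status of the last command (`tr`), which
is 0 whether or not `grep` matched anything. The `fake_lspci` function
and its output line are hypothetical stand-ins for `/sbin/lspci -n` on a
host with only a Tavor card, and the `>/dev/null` in the old form is
added here only to keep the output clean; neither is part of
mvapich.make.

```shell
#!/bin/sh
# Hypothetical stand-in for `/sbin/lspci -n` on a Tavor-only host.
# The line below is illustrative, not captured from a real machine.
fake_lspci() {
    echo "03:00.0 Class 0c06: 15b3:5a44 (rev a1)"
}

# Original form: the condition is the exit status of `tr`, the last
# command in the pipeline, so it succeeds even when grep finds nothing.
old_check() {
    if (fake_lspci | grep "15b3:6282" | wc -l | tr -d '[:space:]' >/dev/null); then
        echo "PCI_EX"
    else
        echo "no match"
    fi
}

# Patched form: capture the match count and compare it against zero.
new_check() {
    if (test `fake_lspci | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
        echo "PCI_EX"
    else
        echo "no match"
    fi
}

echo "old: $(old_check)"   # prints "old: PCI_EX"
echo "new: $(new_check)"   # prints "new: no match"
```

This assumes the shell's default pipeline semantics (no `pipefail`
option set). Testing the exit status of `grep -q` directly would be
another way to write the same check, since grep itself returns non-zero
when nothing matches.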