[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit

Sayantan Sur surs at cse.ohio-state.edu
Fri May 5 14:33:52 PDT 2006


> I believe we would need to get MVAPICH changed so that when it compiles for
> PCI_X it compiles it with SRQ support and the mvapich.make script needs to be
> fixed so that it correctly identifies the IB card as PCI_X.

I upgraded the firmware on the PCI-X cards (MT23108) to version 3.4.
With this upgrade, MVAPICH can run using the SRQ limit event. However,
when I modified a Gen2-level performance test (perftest/send_lat.c) to
use SRQ instead of plain send/receive (SR), SRQ performed noticeably
worse on the PCI-X cards. Here is the data I collected, by message size:

# Msg     SRQ     SR
2         10.65   6.13
4         10.68   6.11
8         10.69   6.13
16        10.76   6.19
32        10.72   6.22
64        10.83   6.36
128       11.01   6.55
256       11.50   7.06
512       12.48   7.96
1024      13.76   9.27
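
To make the gap easier to see, here is a small sketch that computes the
difference and the ratio between the two columns above. The names DATA
and srq_overhead are only illustrative and are not part of perftest or
MVAPICH. On these numbers the absolute gap stays between 4.44 and 4.57
for every message size, which suggests a roughly fixed per-message cost
on the SRQ path rather than one that grows with message size.

```python
# Figures copied from the table above: (message size, SRQ, SR).
# Units are whatever the modified send_lat test reported.
DATA = [
    (2, 10.65, 6.13),
    (4, 10.68, 6.11),
    (8, 10.69, 6.13),
    (16, 10.76, 6.19),
    (32, 10.72, 6.22),
    (64, 10.83, 6.36),
    (128, 11.01, 6.55),
    (256, 11.50, 7.06),
    (512, 12.48, 7.96),
    (1024, 13.76, 9.27),
]


def srq_overhead(rows):
    """Return (size, SRQ - SR, SRQ / SR) per row, rounded to 2 decimals."""
    return [
        (size, round(srq - sr, 2), round(srq / sr, 2))
        for size, srq, sr in rows
    ]


if __name__ == "__main__":
    print(f"{'size':>5} {'gap':>6} {'ratio':>6}")
    for size, gap, ratio in srq_overhead(DATA):
        print(f"{size:>5} {gap:>6.2f} {ratio:>6.2f}")
```

Because the gap is nearly constant, the relative penalty is largest for
the smallest messages and shrinks as the message size grows.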

Given this degraded small-message performance, I am not sure that running
MVAPICH with SRQ on PCI-X based clusters is a good trade-off. The current
configuration for PCI-X clusters uses the adaptive RDMA path, which is
already quite scalable.


