[ofa-general] missing "balance" in aggregate bi-directional SDP bulk transfer
Rick Jones
rick.jones2 at hp.com
Thu Jul 12 14:42:44 PDT 2007
I've been trudging through a set of netperf tests with OFED 1.2, and
came to a point where I was running concurrent netperf bidirectional
tests through both ports of:
03:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
(Tavor compatibility mode) (rev 20)
I configured ib0 and ib1 into separate IP subnets and ran the
"bidirectional TCP_RR" test (netperf built with ./configure
--enable-bursts, large socket buffers, a large request/response size,
and a burst of 12 transactions in flight at a time). The results were
rather even - each connection achieved about the same performance.
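For concreteness, the TCP_RR form of the run would have looked
something like this - a sketch, assuming the same addresses, socket
buffer, request/response and burst settings as the SDP_RR commands
further down, with only the test type changed:

# hypothetical reconstruction of the TCP_RR runs; everything but
# -t TCP_RR is lifted from the SDP_RR command lines shown below
for j in 1 2 3 4; do echo; for i in 1 3 ; \
do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t TCP_RR -- \
-s 1M -S 1M -r 64K -b 12 & done; wait; done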
However, when I run the same test over SDP, some connections seem to get
much better performance than others. For example, with two concurrent
connections, one over each port, one will get a much higher result than
the other.
Here are four iterations of a pair of SDP_RR tests, one across each of
the two ports of the HCA (ie two concurrent netperfs, run four times in
a row). What the output calls port "1" is running over ib0, and what it
calls port "3" is running over ib1 (1 and 3 were the subnet numbers and
are simply convenient tags). The units are transactions per second;
process-completion notification messages have been trimmed for
readability:
[root@hpcpc106 netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; \
do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- \
-s 1M -S 1M -r 64K -b 12 & done; wait; done
2294.65 port 1
10003.66 port 3

398.63 port 1
11898.55 port 3

269.73 port 3
12025.79 port 1

478.29 port 3
11819.61 port 1
It doesn't seem that the favoritism is pegged to a specific port, since
the two traded places in the middle of the run.
Now, if I reload the ib_sdp module and set recv_poll and send_poll to 0,
I get this behaviour (the reload itself is sketched after the output):
[root@hpcpc106 netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; \
do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- \
-s 1M -S 1M -r 64K -b 12 & done; wait; done
6132.89 port 1
6132.79 port 3

6127.32 port 1
6127.27 port 3

6006.84 port 1
6006.34 port 3

6134.83 port 1
6134.29 port 3
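The reload itself was nothing exotic - a sketch, with recv_poll and
send_poll being the module parameters named above and the sysfs paths
being the standard location for module parameters, assuming ib_sdp
exports them there:

# sketch of the ib_sdp reload with both polling knobs set to 0
modprobe -r ib_sdp
modprobe ib_sdp recv_poll=0 send_poll=0
# the in-effect values should be visible via the usual sysfs paths,
# if the parameters were declared with sysfs visibility
cat /sys/module/ib_sdp/parameters/recv_poll
cat /sys/module/ib_sdp/parameters/send_poll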
I guess it is possible for one of the netperfs or netservers to spin in
such a way that it precludes the other from running, even though I have
four cores in the system. For additional grins I pinned each
netperf/netserver to its own CPU with netperf's -T option, with
send_poll and recv_poll put back to their defaults (I unloaded and
reloaded the ib_sdp module); a taskset-based alternative to the -T
pinning is sketched after the results:
[root@hpcpc106 netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; \
do netperf -T $i -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 \
-t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done; wait; done
10108.65 port 1
2187.80 port 3

7754.14 port 3
4541.81 port 1

7013.78 port 3
5282.01 port 1

6499.44 port 3
5796.42 port 1
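For completeness, the same local pinning could be done outside of
netperf with taskset - a sketch, with the CPU ids being arbitrary and
with the caveat that, unlike -T, this only binds the local netperf and
not the remote netserver:

# sketch: explicit local CPU binding via taskset instead of netperf -T;
# the remote netserver would still need to be pinned on its own system
for j in 1 2 3 4; do echo; for i in 1 3 ; \
do taskset -c $i netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 \
-t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done; wait; done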
And I still see this apparent starvation of one of the connections,
although it isn't (overall) as bad as without the binding, so I guess it
isn't something one can work around via CPU-binding trickery. Is this
behaviour expected?
rick jones