[ofa-general] missing "balance" in aggregate bi-directional SDP bulk transfer

Rick Jones rick.jones2 at hp.com
Thu Jul 12 14:42:44 PDT 2007


I've been trudging through a set of netperf tests with OFED 1.2, and 
came to a point where I was running concurrent netperf bidirectional 
tests through both ports of:

03:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex 
(Tavor compatibility mode) (rev 20)


I configured ib0 and ib1 into separate IP subnets and ran the 
"bidirectional TCP_RR" test (./configure --enable-bursts, large socket 
buffers, a large request/response size, and a burst of 12 transactions 
in flight at one time), and the results were rather even - each 
connection achieved about the same performance.
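
The TCP_RR invocations were of the same general form as the SDP_RR 
commands further below, just with -t TCP_RR - something along these 
lines, modulo the exact flags:

for i in 1 3 ; do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 \
  -t TCP_RR -- -s 1M -S 1M -r 64K -b 12 & done ; wait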

However, when I run the same test over SDP, some connections seem to get 
much better performance than others.  For example, with two concurrent 
connections, one over each port, one will get a much higher result than 
the other.

Here are four iterations of a pair of SDP_RR tests, one across each of 
the two ports of the HCA (ie two concurrent netperfs, run four times in 
a row).  What the output calls port "1" is running over ib0 and what it 
calls "3" is running over ib1 (1 and 3 were the subnet numbers and were 
simply convenient tags).  The units are transactions per second, and the 
process completion notification messages have been trimmed for 
readability:

[root@hpcpc106 netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; 
do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- 
-s 1M -S 1M -r 64K -b 12 & done;wait;done

2294.65 port 1
10003.66 port 3

  398.63 port 1
11898.55 port 3

  269.73 port 3
12025.79 port 1

  478.29 port 3
11819.61 port 1

It doesn't seem that the favoritism is pegged to a specific port, since 
the two traded places in the middle iterations.

Now, if I reload the ib_sdp module and set recv_poll and send_poll to 0, 
I get the behaviour below.
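
The reload amounted to something along these lines - the parameter names 
being as I recall them from the ib_sdp module options:

modprobe -r ib_sdp
modprobe ib_sdp recv_poll=0 send_poll=0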

[root@hpcpc106 netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; 
do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- 
-s 1M -S 1M -r 64K -b 12 & done;wait;done


6132.89 port 1
6132.79 port 3

6127.32 port 1
6127.27 port 3

6006.84 port 1
6006.34 port 3

6134.83 port 1
6134.29 port 3


I guess it is possible for one of the netperfs or netservers to spin in 
a way that precludes the other from running, even though I have four 
cores on the system.  For additional grins I pinned each 
netperf/netserver pair to its own CPU, with send_poll and recv_poll put 
back to their defaults (unloaded and reloaded the ib_sdp module).
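
The defaults were restored with a plain reload, something like:

modprobe -r ib_sdp
modprobe ib_sdp

and the CPU pinning was done via netperf's -T option, as can be seen in 
the command below.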

[root@hpcpc106 netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; 
do netperf -T $i -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t 
SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done

10108.65 port 1
2187.80 port 3

7754.14 port 3
4541.81 port 1

7013.78 port 3
5282.01 port 1

6499.44 port 3
5796.42 port 1


And I still see this apparent starvation of one of the connections.  
While it isn't (overall) as bad as without the binding, I guess it isn't 
something one can work around via CPU binding trickery alone.  Is this 
behaviour expected?

rick jones



