[ofa-general] Multicast Performance

Marcel Heinz marcel.heinz at informatik.tu-chemnitz.de
Fri May 23 08:26:41 PDT 2008


Hi,

I have ported an application to use InfiniBand multicast directly via
libibverbs. I am seeing very low multicast throughput, only ~250 MByte/s,
although we are using 4x DDR components. To rule out any effects of the
application, I've created a small benchmark (well, it's only a hack). It
just tries to keep the send/recv queues filled with work requests and
polls the CQ in an endless loop. In server mode, it creates/joins the
multicast group as FullMember, attaches the QP to the group and receives
any packets. The client joins as SendOnlyNonMember and sends datagrams
of full MTU size to the group.
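For reference, the receive side looks roughly like the sketch below. This is
not the actual benchmark code: the SA join as FullMember (which yields the
group's MGID, MLID and Q_Key) is assumed to have been done separately, e.g.
via librdmacm or umad, the port number, MTU and Q_Key are placeholders, and
error handling is omitted.

/* Receive-side sketch. Assumes the SA FullMember join has already been
 * performed and provided the group's MGID, MLID and Q_Key. */
#include <infiniband/verbs.h>
#include <stdint.h>

#define RQ_DEPTH 512
#define MTU_SIZE 2048               /* assumed active MTU */
#define GRP_QKEY 0x11111111         /* assumed Q_Key agreed on by both sides */

static void mcast_recv_loop(struct ibv_context *ctx, union ibv_gid *mgid,
                            uint16_t mlid)
{
	struct ibv_pd *pd = ibv_alloc_pd(ctx);
	struct ibv_cq *cq = ibv_create_cq(ctx, RQ_DEPTH, NULL, NULL, 0);

	struct ibv_qp_init_attr init = {
		.send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_UD,
		.cap = { .max_send_wr = 1, .max_recv_wr = RQ_DEPTH,
			 .max_send_sge = 1, .max_recv_sge = 1 },
	};
	struct ibv_qp *qp = ibv_create_qp(pd, &init);

	/* INIT -> RTR is enough for a receive-only UD QP */
	struct ibv_qp_attr attr = { .qp_state = IBV_QPS_INIT, .pkey_index = 0,
				    .port_num = 1, .qkey = GRP_QKEY };
	ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
				 IBV_QP_PORT | IBV_QP_QKEY);
	attr.qp_state = IBV_QPS_RTR;
	ibv_modify_qp(qp, &attr, IBV_QP_STATE);

	/* Attach the QP to the group; datagrams sent to (MGID, MLID) are
	 * now delivered to this QP. */
	ibv_attach_mcast(qp, mgid, mlid);

	/* Every UD receive buffer needs 40 extra bytes for the GRH. */
	static char buf[RQ_DEPTH][MTU_SIZE + 40];
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, sizeof(buf),
				       IBV_ACCESS_LOCAL_WRITE);

	for (int i = 0; i < RQ_DEPTH; i++) {
		struct ibv_sge sge = { .addr = (uintptr_t)buf[i],
				       .length = MTU_SIZE + 40,
				       .lkey = mr->lkey };
		struct ibv_recv_wr wr = { .wr_id = i, .sg_list = &sge,
					  .num_sge = 1 }, *bad;
		ibv_post_recv(qp, &wr, &bad);
	}

	for (;;) {	/* busy-poll the CQ, repost each completed buffer */
		struct ibv_wc wc;
		if (ibv_poll_cq(cq, 1, &wc) > 0 &&
		    wc.status == IBV_WC_SUCCESS) {
			struct ibv_sge sge = { .addr = (uintptr_t)buf[wc.wr_id],
					       .length = MTU_SIZE + 40,
					       .lkey = mr->lkey };
			struct ibv_recv_wr wr = { .wr_id = wc.wr_id,
						  .sg_list = &sge,
						  .num_sge = 1 }, *bad;
			ibv_post_recv(qp, &wr, &bad);
		}
	}
}

The client side is the mirror image: it posts IBV_WR_SEND work requests on a
UD QP through an address handle created for the group's MLID/MGID, with
remote QPN 0xFFFFFF and the group's Q_Key.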

The test setup is as follows:

Host A <---> Switch <---> Host B

We use Mellanox InfiniHost III Lx HCAs (MT25204) and a Flextronics
F-X430046 24-Port Switch, OFED 1.3 and a "vanilla" 2.6.23.9 Linux kernel.

The results are:

Host A		Host B		Throughput (MByte/sec)
client		server		262
client		2xserver	146
client+server	server		944
client+server	---		946

For reference: unicast ib_send_bw (in UD mode): 1146

I don't see any reason why it should become _faster_ when I additionally
start a server on the same host as the client. OTOH, the 944 MByte/s
sounds relatively sane compared to the unicast performance, given the
additional overhead of having to copy the data locally.

These 260 MByte/s seem relatively close to the 2 GBit/s effective
throughput of a 1x SDR connection. However, the created group has rate 6
(20 GBit/s), and the /sys/class/infiniband/mthca0/ports/1/rate file showed
20 Gb/sec during the whole test.
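The same check can also be done from inside the application via
ibv_query_port(); a minimal sketch (the encodings follow the PortInfo
attribute, so this only distinguishes the widths/speeds listed in the
comments):

/* Sketch: verify that the port really runs at 4x DDR, i.e. the
 * programmatic equivalent of reading the sysfs "rate" file. */
#include <infiniband/verbs.h>
#include <stdio.h>

int port_is_4x_ddr(struct ibv_context *ctx, uint8_t port)
{
	struct ibv_port_attr pa;

	if (ibv_query_port(ctx, port, &pa))
		return -1;
	/* active_width: 1 = 1x, 2 = 4x, 8 = 12x
	 * active_speed: 1 = 2.5 Gb/s (SDR), 2 = 5.0 Gb/s (DDR) */
	printf("active_width=%u active_speed=%u\n",
	       pa.active_width, pa.active_speed);
	return pa.active_width == 2 && pa.active_speed == 2;
}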

The error counters of all ports show nothing abnormal. Only the
RcvSwRelayErrors counter of the switch port facing the host running the
client is increasing very fast, but this seems to be normal for multicast
packets, since the switch does not relay these packets back to the source.

We were also able to test on another cluster with 6 nodes (also with
MT25204 HCAs; I don't know the OFED version or switch type) and got the
following results:

Host1	Host2	Host3	Host4	Host5	Host6	Throughput (MByte/s)
1s	1s	-	-	-	1c	255.15
1s	1s	1s	-	-	1c	255.22
1s	1s	1s	1s	-	1c	255.22
1s	1s	1s	1s	1s	1c	255.22

1s1c	1s	1s	-	-	-	738.64
1s1c	1s	1s	1s	-	-	695.08
1s1c	1s	1s	1s	1s	-	565.14
1s1c	1s	1s	1s	1s	1s	451.90

As long as no host runs both a server and the client, it at least behaves
like multicast. With both a client and a server on the same host,
performance decreases as the number of servers increases, which is
completely surprising to me.

Another test I did was to run an ib_send_bw (UD) benchmark while the
multicast benchmark was running between A and B. I got ~260 MByte/s for
the multicast and also ~260 MByte/s for ib_send_bw.

Does anyone have an idea of what is going on there, or a hint as to what I should check?

Regards,
Marcel


