[ofa-general] Multicast Performance
Marcel Heinz
marcel.heinz at informatik.tu-chemnitz.de
Wed Jun 18 09:20:37 PDT 2008
Hi,
Marcel Heinz wrote:
> [Multicast bandwidth of only ~250MByte/s]
I'm still fighting with this issue. I've played around with my code
to track down possible causes of the effects, but it looks like I know
even less than before.
These are the old results (client is always running on host A, the hosts
running the servers are listed right of the '->'):
A -> [none]: 258.867MB/s
A -> A: 951.288MB/s
A -> B: 258.863MB/s
A -> A,B: 952.234MB/s
A -> B,B: 143.878MB/s
I've tested with the client attaching its QP to the mcast group
(but not posting any receive WRs to it). And this already changed
something:
A -> [none]: 303.440MB/s
A -> A: 99.121MB/s
A -> B: 303.426MB/s
A -> A,B: 99.207MB/s
A -> B,B: 143.866MB/s
The same effect can also be reproduced by starting another instance on
host A which just attaches a new QP to the mc group and then does nothing
else, so it is not related to the particular QP I use for sending.
After that, I checked what happens if I post just one WR to the client's
receive queue (still attached to the mc group) and don't care about it
any more. I'd expect that for that first WR, the behavior would be the
same as in the scenario with having another server instance running on
that host, and after that, it should behave like in the "attached but no
WRs" scenario above. But this is not the case:
A -> [none]: 383.125MB/s
A -> A: 104.039MB/s
A -> B: 383.130MB/s
A -> A,B: 104.038MB/s
A -> B,B: 143.920MB/s
I don't know why, but the overall rate stays at 383MB/s instead of going
back to the 303MB/s from the last test. I've tried to confirm these
numbers by posting more recv WRs (this time also polling the recv
CQ), so that I could really benchmark the 2 different phases (before and
after the posted WRs are used up). I chose 1000000 WRs and got:
              Phase I        Phase II
A -> [none]:  975.614MB/s    382.953MB/s
A -> B:       975.614MB/s    382.953MB/s
A -> A,B:     144.615MB/s    144.615MB/s
A -> B,B:     143.911MB/s    143.852MB/s
A -> A:       144.701MB/s    103.944MB/s
At least my expectations for phase I were met. But I have no idea what
could cause such effects. The fact that performance increases when
there is a "local" QP attached to the group indicates that this is a
problem on the local side, not involving the switch at all, but I can't
be sure.
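To make the three single-phase tables easier to compare, the rates can be
expressed relative to the first run (client QP not attached). The following
is only a rough Python sketch: the dictionary layout, the scenario labels and
the function name relative_to_baseline are made up for illustration and are
not part of the benchmark code. The numbers are copied from the tables above.

```python
# Throughput in MB/s, copied from the tables above.
# Inner keys name the hosts running servers; the client always runs on host A.
RESULTS = {
    "not attached": {
        "none": 258.867, "A": 951.288, "B": 258.863,
        "A,B": 952.234, "B,B": 143.878,
    },
    "attached, no WRs": {
        "none": 303.440, "A": 99.121, "B": 303.426,
        "A,B": 99.207, "B,B": 143.866,
    },
    "attached, one WR": {
        "none": 383.125, "A": 104.039, "B": 383.130,
        "A,B": 104.038, "B,B": 143.920,
    },
}


def relative_to_baseline(results, baseline="not attached"):
    """Return each rate divided by the baseline rate of the same scenario."""
    base = results[baseline]
    return {
        config: {scenario: rate / base[scenario] for scenario, rate in rates.items()}
        for config, rates in results.items()
        if config != baseline
    }


if __name__ == "__main__":
    for config, ratios in relative_to_baseline(RESULTS).items():
        print(config)
        for scenario, ratio in ratios.items():
            print(f"  A -> {scenario:5s} {ratio:6.3f}x")
```

Ratios above 1 mark the scenarios that got faster once the client's QP was
attached, ratios below 1 the ones that got slower.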
Does anyone have an explanation for these numbers? Or any ideas what else I
should check?
Regards,
Marcel