[ofa-general] Multicast Performance

Olga Shern (Voltaire) olga.shern at gmail.com
Mon Jun 23 01:27:33 PDT 2008


On 6/18/08, Marcel Heinz <marcel.heinz at informatik.tu-chemnitz.de> wrote:
> Hi,
>
> Marcel Heinz wrote:
> > [Multicast bandwidth of only ~250MByte/s]
>
> I'm still fighting with this issue. I've played around with my code
> to track down possible causes of the effects, but it looks like I know
> even less than before.
>
> These are the old results (the client is always running on host A; the
> hosts running the servers are listed to the right of the '->'):
>
> A -> [none]     258.867MB/s
> A -> A:         951.288MB/s
> A -> B:         258.863MB/s
> A -> A,B:       952.234MB/s
> A -> B,B:       143.878MB/s
>
> I've tested with the client attaching its QP to the mcast group
> (but not posting any receive WRs to it). And this already changed
> something:
>
> A -> [none]     303.440MB/s
> A -> A:         99.121MB/s
> A -> B:         303.426MB/s
> A -> A,B:       99.207MB/s
> A -> B,B:       143.866MB/s
>
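For reference, the attach step described above is a single verbs call per
QP. A minimal sketch, assuming the group's MGID/MLID already came back
from the SA multicast join (e.g. via librdmacm's rdma_join_multicast) and
that the UD QP exists:

    #include <stdio.h>
    #include <infiniband/verbs.h>

    /* Sketch: attach an already-created UD QP to a multicast group so the
     * HCA delivers the group's packets to that QP's receive queue. */
    static int mcast_attach(struct ibv_qp *qp,
                            const union ibv_gid *mgid, uint16_t mlid)
    {
        if (ibv_attach_mcast(qp, mgid, mlid)) {
            perror("ibv_attach_mcast");
            return -1;
        }
        /* ibv_detach_mcast(qp, mgid, mlid) undoes this when finished. */
        return 0;
    }
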
> The same effect can also be reproduced by starting another instance on
> host A which just attaches a new QP to the mc group and then does nothing
> else, so it is not related to the particular QP I use for sending.
>
> After that, I checked what happens if I post just one WR to the client's
> receive queue (still attached to the mc group) and never care about it
> again. I'd expect that for that first WR, the behavior would be the same
> as in the scenario where another server instance is running on that
> host, and that after that, it should behave like the "attached but no
> WRs" scenario above. But this is not the case:
>
> A -> [none]:    383.125MB/s
> A -> A:         104.039MB/s
> A -> B:         383.130MB/s
> A -> A,B:       104.038MB/s
> A -> B,B:       143.920MB/s
>
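Posting that single, never-reaped receive WR is likewise one call. A
minimal sketch, where buf, buf_size and mr stand for an already-registered
receive buffer (UD receives need 40 extra bytes up front for the GRH):

    /* Sketch: post one receive WR to the UD QP and never look at it again. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t) buf,
        .length = buf_size,
        .lkey   = mr->lkey,
    };
    struct ibv_recv_wr wr = {
        .wr_id   = 1,
        .sg_list = &sge,
        .num_sge = 1,
    };
    struct ibv_recv_wr *bad_wr;

    if (ibv_post_recv(qp, &wr, &bad_wr))
        perror("ibv_post_recv");
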
> I don't know why, but the overall rate stays at 383MB/s instead of going
> back to the 300MB/s from the last test. I've tried to confirm these
> numbers by posting more recv WRs (this time also with polling the recv
> CQ), so that I really could benchmark the 2 different phases. I chose
> 1000000 and got:
>
>                Phase I         Phase II
> A -> [none]:    975.614MB/s     382.953MB/s
> A -> B:         975.614MB/s     382.953MB/s
> A -> A,B:       144.615MB/s     144.615MB/s
> A -> B,B:       143.911MB/s     143.852MB/s
> A -> A:         144.701MB/s     103.944MB/s
>
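The two phases above come from actually reaping the completions; the
receive side boils down to a polling loop roughly like this (a sketch,
assuming num_posted WRs were posted as above and recv_cq is the QP's
receive CQ):

    /* Sketch: busy-poll the receive CQ until all posted WRs completed. */
    struct ibv_wc wc;
    unsigned long done = 0;

    while (done < num_posted) {
        int n = ibv_poll_cq(recv_cq, 1, &wc);

        if (n < 0) {
            fprintf(stderr, "ibv_poll_cq failed\n");
            break;
        }
        if (n == 0)
            continue;   /* nothing completed yet, keep spinning */
        if (wc.status != IBV_WC_SUCCESS)
            fprintf(stderr, "recv completion error: %s\n",
                    ibv_wc_status_str(wc.status));
        done++;
    }
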
> At least my expectations for phase I were met. But I have no idea what
> could cause such effects. The fact that performance increases when
> there is a "local" QP attached to the group indicates that this is a
> problem on the local side, not involving the switch at all, but I can't
> be sure.
>
> Does anyone have an explanation for these numbers? Or any ideas about
> what else I should check?
>
> Regards,
> Marcel
>

Hi Marcel,

We also ran tests with multicast traffic and found that with Arbel HCAs
there is a performance penalty when:
1. Sending MC packets with NO QP attached on the local Arbel HCA.
2. Receiving MC packets with more than 1 QP attached on a single HCA,
   which causes back pressure that slows down the sender.
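
For reference, the send path that hits penalty 1 is an ordinary UD send
toward the group's address handle. A minimal sketch, where pd, qp, buf,
len and mr are placeholders for objects the application already set up,
the MGID/MLID come from the join, and the QKey is whatever value the
group members agreed on:

    /* Sketch: send one multicast datagram over a UD QP.  mgid/mlid come
     * from the SA join; pd, qp, buf, len and mr are set up elsewhere. */
    static int mcast_send(struct ibv_pd *pd, struct ibv_qp *qp,
                          union ibv_gid mgid, uint16_t mlid,
                          void *buf, uint32_t len, struct ibv_mr *mr)
    {
        struct ibv_ah_attr ah_attr = {
            .dlid          = mlid,          /* multicast LID */
            .is_global     = 1,
            .grh.dgid      = mgid,          /* multicast GID */
            .grh.hop_limit = 1,
            .port_num      = 1,             /* assumed HCA port */
        };
        struct ibv_ah *ah = ibv_create_ah(pd, &ah_attr);
        struct ibv_sge sge = {
            .addr   = (uintptr_t) buf,
            .length = len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
            .opcode     = IBV_WR_SEND,
            .sg_list    = &sge,
            .num_sge    = 1,
            .send_flags = IBV_SEND_SIGNALED,
            .wr.ud = {
                .ah          = ah,
                .remote_qpn  = 0xffffff,    /* well-known multicast QPN */
                .remote_qkey = 0x11111111,  /* placeholder group QKey */
            },
        };
        struct ibv_send_wr *bad_wr;

        if (!ah)
            return -1;
        return ibv_post_send(qp, &wr, &bad_wr);
    }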

Hope this helps
Olga


