[ofw] receive queue depth effect on pingpong latency
Fab Tillier
ftillier at microsoft.com
Mon Oct 18 15:58:46 PDT 2010
Hi Tzachi,
Thanks for the quick response!
Tzachi Dar wrote on Mon, 18 Oct 2010 at 15:28:53
> Hi Fab,
>
> I'm sorry that I don't have time to look at this thoroughly this week
> (maybe someone else will).
>
> In any case, this looks to me like "End of queue" for IB.
>
> What this means is that if there are no receive wqes, the card sends
> nacks and stops traffic. (probably not your case)
This shouldn't be the case since I don't send until I receive, and I always repost before sending.
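
A minimal sketch of that reasoning, as a toy model and not the actual ndpingpong code: each side starts with its RQ full, consumes one receive when a message arrives, and reposts it before sending the reply. The name `simulate_pingpong` and its counters are hypothetical, and the model ignores the HCA, firmware and wire entirely. It only tracks how many receives are posted when a message lands.

```python
def simulate_pingpong(rq_depth, iterations):
    """Toy bookkeeping model of a strict ping-pong (hypothetical helper).

    Returns (rnr_events, min_posted_at_arrival), where rnr_events counts
    messages that arrived with no receive posted.
    """
    # Both sides fill their receive queue before any traffic starts.
    posted = {"client": rq_depth, "server": rq_depth}
    rnr_events = 0
    min_posted_at_arrival = rq_depth
    receiver = "server"  # the client sends the first message

    for _ in range(2 * iterations):  # one ping plus one pong per iteration
        available = posted[receiver]
        min_posted_at_arrival = min(min_posted_at_arrival, available)
        if available == 0:
            rnr_events += 1          # nothing posted: receiver not ready
        else:
            posted[receiver] -= 1    # the arriving message consumes a receive
        posted[receiver] += 1        # repost before sending the reply
        # The side that just received now sends, so the roles swap.
        receiver = "client" if receiver == "server" else "server"

    return rnr_events, min_posted_at_arrival


if __name__ == "__main__":
    for depth in (1, 2, 8):
        print(depth, simulate_pingpong(depth, 1000))
```

Under these assumptions a message never arrives at an empty RQ, even with a depth of 1, which is why running out of receives seems unlikely to be the explanation.
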
> When there is only one receive packet IB also does not work efficiently,
> and fw flow is being used. (this is probably what you see).
Can you describe what fw flow is? Since I won't send the next message until I receive a response, any ACK should get piggybacked on the response, shouldn't it?
> This does not explain everything that you see, but it probably explains
> the first 3 lines.
Why the first 3? Why not just the first one? The other cases have more than 1 receive posted...
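
To put rough numbers on that question, here is a small sketch that takes the latencies quoted further down in this thread and subtracts the best observed value from each. The figures are copied from the original report. The name `excess_over_floor`, and treating the minimum as the steady-state floor, are assumptions made only for illustration.

```python
# 1-byte ping-pong latency in microseconds, keyed by RQ depth
# (values copied from the results quoted in this thread).
LATENCY_US = {
    1: 7.44, 2: 4.76, 4: 3.20, 6: 2.75, 8: 2.44, 16: 2.04, 32: 1.85,
    64: 1.76, 128: 1.71, 256: 1.71, 512: 1.68, 1024: 1.68,
    2048: 1.67, 4096: 1.67,
}


def excess_over_floor(latencies):
    """Return {depth: latency minus the best observed latency}, in us."""
    floor = min(latencies.values())
    return {depth: round(lat - floor, 2) for depth, lat in latencies.items()}


if __name__ == "__main__":
    for depth, extra in excess_over_floor(LATENCY_US).items():
        print(f"RQ {depth:>4}: +{extra:.2f}us over the floor")
```

By this measure the penalty does not stop after the first few rows: RQ 4 is still 1.53us above the floor and RQ 8 is 0.77us above it.
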
> By the way, can you post more receive packets and see if this helps?
I post as many items as I have space for in my RQ. So an RQ of 8 would have 8 receives posted. I have not looked to see what happens if I post fewer than the limit of the RQ.
Thanks,
-Fab
> Thanks
> Tzachi
>
>> -----Original Message-----
>> From: Fab Tillier [mailto:ftillier at microsoft.com]
>> Sent: Monday, October 18, 2010 10:07 PM
>> To: Tzachi Dar; Leonid Keller
>> Cc: ofw at lists.openfabrics.org
>> Subject: receive queue depth effect on pingpong latency
>>
>> Hi Tzachi, Leo,
>>
>> I've been playing with the ndpingpong test case, and noticed some
>> strange/unexpected behavior:
>>
>> What I see is a relationship between the RQ depth and the latency where
>> the larger the RQ depth, the lower the latency. This, despite the
>> program performing a pingpong: a send is only issued once a receive is
>> completed, so there should only be a single work request in transit at
>> a time.
>>
>> I changed the unit test to use an asymmetric queue depth, keeping the
>> SQ depth at 1, and varying only the RQ depth.
>>
>> Here are the reported results for the 1 byte message size, by RQ depth:
>>
>> RQ 1: 7.44us
>> RQ 2: 4.76us
>> RQ 4: 3.20us
>> RQ 6 (default): 2.75us
>> RQ 8: 2.44us
>> RQ 16: 2.04us
>> RQ 32: 1.85us
>> RQ 64: 1.76us
>> RQ 128: 1.71us
>> RQ 256: 1.71us
>> RQ 512: 1.68us
>> RQ 1024: 1.68us
>> RQ 2048: 1.67us
>> RQ 4096: 1.67us
>>
>> As you can see, things are reaching steady state as the queue depth
>> gets very large. But as this is a ping pong test, I would have
>> expected performance to be much closer to this for the smaller queue
>> depths.
>>
>> This is with ConnectX2, QDR, FW 2.07.9110
>>
>> Any idea why the low RQ depth tests perform so poorly?
>>
>> Thanks,
>> -Fab
>