[ofw] receive queue depth effect on pingpong latency

Fab Tillier ftillier at microsoft.com
Mon Oct 18 15:58:46 PDT 2010


Hi Tzachi,

Thanks for the quick response!

Tzachi Dar wrote on Mon, 18 Oct 2010 at 15:28:53

> Hi Fab,
> 
> I'm sorry that I don't have time to look at this thoroughly this week
> (maybe someone else will).
> 
> In any case, this looks to me like "End of queue" for IB.
> 
> What this means is that if there are no receive wqes, the card sends
> nacks and stops traffic. (probably not your case)

This shouldn't be the case since I don't send until I receive, and I always repost before sending.

> When there is only one receive packet IB also does not work efficiently,
> and fw flow is being used. (this is probably what you see).

Can you describe what fw flow is?  Since I won't send the next message until I receive a response, any ACK should get piggybacked on the response, shouldn't it?

> This does not explain everything that you see, but it probably explains
> the first 3 lines.

Why the first 3?  Why not just the first one?  The other cases have more than 1 receive posted...
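For reference, here is a quick sketch (plain Python, not part of ndpingpong) that normalizes the numbers from my original mail against the RQ 4096 result. The helper name is made up for illustration. RQ 4 still comes out at roughly 1.9x the RQ 4096 latency, and the ratio keeps shrinking well past the first three rows.

```python
# Latencies in microseconds for the 1 byte message size,
# copied from the results in the original mail quoted below.
LATENCY_US = {
    1: 7.44, 2: 4.76, 4: 3.20, 6: 2.75, 8: 2.44, 16: 2.04, 32: 1.85,
    64: 1.76, 128: 1.71, 256: 1.71, 512: 1.68, 1024: 1.68,
    2048: 1.67, 4096: 1.67,
}


def slowdown_vs_deepest(latency_us):
    """Return {rq_depth: latency / latency at the largest RQ depth}."""
    baseline = latency_us[max(latency_us)]
    return {depth: lat / baseline for depth, lat in sorted(latency_us.items())}


if __name__ == "__main__":
    for depth, ratio in slowdown_vs_deepest(LATENCY_US).items():
        print(f"RQ {depth:>4}: {LATENCY_US[depth]:.2f}us  ({ratio:.2f}x of RQ 4096)")
```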

> By the way, can you post more receive packets and see if this helps?

I post as many items as I have space for in my RQ.  So an RQ of 8 would have 8 receives posted.  I have not looked to see what happens if I post fewer than the limit of the RQ.
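To make the posting discipline concrete, here is a minimal model of one side of the pingpong (plain Python, not the NetworkDirect API; the function name and the counter are hypothetical). It assumes the RQ is filled up front, each incoming message consumes one posted receive, and the receive is reposted before the next send goes out.

```python
def run_pingpong(rq_depth, iterations):
    """Model one side of a pingpong.

    Returns the smallest number of receives that were posted at the
    moment any incoming message arrived.
    """
    posted = rq_depth                  # fill the RQ before the test starts
    min_posted_on_arrival = posted
    for _ in range(iterations):
        # The peer's message arrives and needs a posted receive.
        min_posted_on_arrival = min(min_posted_on_arrival, posted)
        if posted == 0:
            raise RuntimeError("no receive posted when a message arrived")
        posted -= 1                    # the arrival consumes one receive
        posted += 1                    # repost before sending
        # Send the reply here; the peer cannot send again until it
        # has received this reply, so at most one message is in flight.
    return min_posted_on_arrival


if __name__ == "__main__":
    for depth in (1, 8):
        print(depth, run_pingpong(depth, 1000))
```

Under these assumptions the queue is always full when a message arrives, even with an RQ of 1, which is why I don't expect to be running out of receives.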

Thanks,
-Fab

> Thanks
> Tzachi
> 
>> -----Original Message-----
>> From: Fab Tillier [mailto:ftillier at microsoft.com]
>> Sent: Monday, October 18, 2010 10:07 PM
>> To: Tzachi Dar; Leonid Keller
>> Cc: ofw at lists.openfabrics.org
>> Subject: receive queue depth effect on pingpong latency
>> 
>> Hi Tzachi, Leo,
>> 
>> I've been playing with the ndpingpong test case, and noticed some
>> strange/unexpected behavior:
>> 
>> What I see is a relationship between the RQ depth and the latency where
>> the larger the RQ depth, the lower the latency.  This, despite the
>> program performing a pingpong: a send is only issued once a receive is
>> completed, so there should only be a single work request in transit at
>> a time.
>> 
>> I changed the unit test to use an asymmetric queue depth, keeping the
>> SQ depth at 1, and varying only the RQ depth.
>> 
>> Here are the reported results for the 1 byte message size, by RQ depth:
>> 
>> RQ 1: 7.44us
>> RQ 2: 4.76us
>> RQ 4: 3.20us
>> RQ 6 (default): 2.75us
>> RQ 8: 2.44us
>> RQ 16: 2.04us
>> RQ 32: 1.85us
>> RQ 64: 1.76us
>> RQ 128: 1.71us
>> RQ 256: 1.71us
>> RQ 512: 1.68us
>> RQ 1024: 1.68us
>> RQ 2048: 1.67us
>> RQ 4096: 1.67us
>> 
>> As you can see, things are reaching steady state as the queue depth
>> gets very large.  But as this is a ping pong test, I would have
>> expected performance to be much closer to this at the smaller queue
>> depths as well.
>> 
>> This is with ConnectX2, QDR, FW 2.07.9110
>> 
>> Any idea why the low RQ depth tests perform so poorly?
>> 
>> Thanks,
>> -Fab
>


