[openib-general] Mellanox HCAs: outstanding RDMAs

Talpey, Thomas Thomas.Talpey at netapp.com
Mon Jun 5 05:31:11 PDT 2006


At 10:03 AM 6/3/2006, Rimmer, Todd wrote: 
>> Yes, the limit of outstanding RDMAs is not related to the send queue
>> depth.  Of course you can post many more than 4 RDMAs to a send queue
>> -- the HCA just won't have more than 4 requests outstanding at a time.
>
>To further clarify, this parameter only affects the number of concurrent
>outstanding RDMA Reads which the HCA will process.  Once it hits this
>limit, the send Q will stall waiting for issued reads to complete prior
>to initiating new reads.

It's worse than that: the send queue must stall for *all* operations.
Otherwise the hardware would have to track in-progress operations that are
queued behind the stalled ones, which really breaks the initiation model.

By the way, the provider is not semantically required to provide any such
flow control behavior. The Mellanox one apparently does, but it is not a
requirement of the verbs; it's a requirement on the upper layer. If more
RDMA Reads are posted than the remote peer supports, the connection
may break.
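
As a rough sketch of what that upper-layer obligation looks like (the names
below are made up, not from any real ULP), the initiator just counts reads
in flight and never exceeds the value it negotiated at connect time:

#include <errno.h>
#include <infiniband/verbs.h>

/* Illustrative only: the ULP tracks RDMA Reads it has posted but not yet
 * seen complete, and refuses to exceed the negotiated limit. */
struct read_throttle {
    unsigned int max_outstanding;   /* negotiated ORD/IRD, e.g. 4 */
    unsigned int in_flight;         /* reads posted, not yet completed */
};

static int post_rdma_read(struct ibv_qp *qp, struct read_throttle *t,
                          struct ibv_send_wr *wr)
{
    struct ibv_send_wr *bad;
    int rc;

    if (t->in_flight >= t->max_outstanding)
        return -EAGAIN;             /* caller queues the read for later */

    rc = ibv_post_send(qp, wr, &bad);
    if (rc == 0)
        t->in_flight++;
    return rc;
}

/* When the completion for a read's wr_id arrives, the ULP decrements
 * in_flight and retries anything it had queued. */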

>The number of outstanding RDMA Reads is negotiated by the CM during
>connection establishment, and the QP which is sending the RDMA Read must
>have a value configured for this parameter which is <= the remote end's
>capability.

In other words, we're probably stuck at 4. :-) I don't think there is any
Mellanox-based implementation that has ever supported > 4.
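
For reference, here is roughly where those negotiated numbers land on the
verbs side. This is only a sketch of the two relevant attributes (the
function name is made up, and all the other mandatory INIT->RTR->RTS
attributes are left out):

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Fragment only: these assignments happen as part of the usual
 * INIT->RTR->RTS ibv_modify_qp sequence; path, PSN, MTU, timeouts and
 * error handling are all omitted here. */
static void rd_atomic_example(struct ibv_context *ctx, struct ibv_qp *qp,
                              uint8_t peer_ird)
{
    struct ibv_device_attr dev;
    struct ibv_qp_attr rtr, rts;

    ibv_query_device(ctx, &dev);

    /* RTR: responder resources -- how many inbound RDMA Reads this QP
     * will service concurrently, capped by the HCA (the "4"). */
    memset(&rtr, 0, sizeof(rtr));
    rtr.qp_state = IBV_QPS_RTR;
    rtr.max_dest_rd_atomic = dev.max_qp_rd_atom;
    ibv_modify_qp(qp, &rtr,
                  IBV_QP_STATE | IBV_QP_MAX_DEST_RD_ATOMIC /* | ... */);

    /* RTS: initiator depth -- how many RDMA Reads this QP may have
     * outstanding, which must not exceed what the peer advertised. */
    memset(&rts, 0, sizeof(rts));
    rts.qp_state = IBV_QPS_RTS;
    rts.max_rd_atomic = peer_ird < dev.max_qp_init_rd_atom
                            ? peer_ird : dev.max_qp_init_rd_atom;
    ibv_modify_qp(qp, &rts,
                  IBV_QP_STATE | IBV_QP_MAX_QP_RD_ATOMIC /* | ... */);
}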

>In previous testing by Mellanox on SDR HCAs they indicated values beyond
>2-4 did not improve performance (and in fact required more RDMA
>resources to be allocated for the corresponding QP or HCA).  Hence I
>suspect a very large value like 128 would offer no improvement over
>values in the 2-8 range.

I am not so sure of that. For one thing, that result depends on VERY small
latencies; the presence of a switch, or of link extenders, will make a huge
difference. Second, heavy multi-QP firmware loads will increase the
latencies. Third, constants are pretty much never a good idea in
networking.
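
To put rough numbers on the latency point (illustrative figures, not
measurements): the number of reads you need in flight to keep the link busy
is about bandwidth * round-trip time / read size, so:

#include <stdio.h>

/* Back-of-the-envelope only: made-up RTTs, ~1 GB/s of SDR payload,
 * 4 KB RDMA Reads. */
int main(void)
{
    double bw = 1e9;                         /* bytes/sec */
    double read_sz = 4096;                   /* bytes per RDMA Read */
    double rtt[] = { 5e-6, 10e-6, 50e-6 };   /* back-to-back, switched, extended */

    for (int i = 0; i < 3; i++)
        printf("RTT %2.0f us -> ~%.1f reads in flight\n",
               rtt[i] * 1e6, bw * rtt[i] / read_sz);
    return 0;
}

With a 5-10 us round trip the 2-4 range is indeed enough, but at 50 us you
would want on the order of a dozen reads outstanding to stay at wire speed.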

The NFS/RDMA client tries to set the maximum IRD value it can obtain.
RDMA Read is used quite heavily by the server to fetch client data
segments for NFS writes.

Tom.



