[openib-general] Mellanox HCAs: outstanding RDMAs

somenath somenath at veritas.com
Mon Jun 5 10:50:55 PDT 2006


Talpey, Thomas wrote:

>At 10:03 AM 6/3/2006, Rimmer, Todd wrote: 
>  
>
>>>Yes, the limit of outstanding RDMAs is not related to the send queue
>>>depth.  Of course you can post many more than 4 RDMAs to a send queue
>>>-- the HCA just won't have more than 4 requests outstanding at a time.
>>>      
>>>
>>To clarify further, this parameter only affects the number of concurrent
>>outstanding RDMA Reads which the HCA will process.  Once it hits this
>>limit, the send Q will stall waiting for issued reads to complete prior
>>to initiating new reads.
>>    
>>
>
>It's worse than that - the send queue must stall for *all* operations.
>Otherwise the hardware has to track in-progress operations which are
>queued after stalled ones. It really breaks the initiation model.
>  
>
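(Aside: the per-QP ceiling being described above is visible in the HCA's
device attributes; a minimal sketch, assuming libibverbs, that just prints
what the first HCA found advertises:)

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
        struct ibv_device **list = ibv_get_device_list(NULL);
        struct ibv_context *ctx;
        struct ibv_device_attr attr;

        if (!list || !list[0])
                return 1;
        ctx = ibv_open_device(list[0]);
        if (!ctx || ibv_query_device(ctx, &attr))
                return 1;

        /* outstanding RDMA Reads/atomics per QP, as responder and as initiator */
        printf("max_qp_rd_atom      = %d\n", attr.max_qp_rd_atom);
        printf("max_qp_init_rd_atom = %d\n", attr.max_qp_init_rd_atom);

        ibv_close_device(ctx);
        ibv_free_device_list(list);
        return 0;
}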

The possibility of stalling is scary!
Is there any way one can figure out:

1. the number of outstanding sends in the send queue at a given point in time?
2. the maximum number of outstanding sends ever posted (over the lifetime
of the queue)?

It's possible to measure those in the ULPs (a rough sketch of what I mean
follows below), but that may not match exactly what is seen in the real
queue... so, is there any low-level tool to measure this?
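Here is the kind of ULP-side accounting I mean: a rough sketch, assuming
libibverbs, that every send WR is signaled, and with made-up names
(sq_stats etc.):

#include <infiniband/verbs.h>

struct sq_stats {
        int outstanding;        /* send WRs posted but not yet completed */
        int max_outstanding;    /* high-water mark over the QP's lifetime */
};

static int post_send_counted(struct ibv_qp *qp, struct ibv_send_wr *wr,
                             struct ibv_send_wr **bad_wr, struct sq_stats *st)
{
        struct ibv_send_wr *w;
        int rc = ibv_post_send(qp, wr, bad_wr);

        /* on failure, WRs ahead of *bad_wr may still have been posted;
         * that case is not handled in this sketch */
        if (rc == 0) {
                for (w = wr; w; w = w->next)    /* a post may carry a chain of WRs */
                        st->outstanding++;
                if (st->outstanding > st->max_outstanding)
                        st->max_outstanding = st->outstanding;
        }
        return rc;
}

/* call once for every send-side ibv_wc polled from the CQ */
static void send_completed(struct sq_stats *st)
{
        st->outstanding--;
}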

thanks, som.

>Semantically, the provider is not required to provide any such flow control
>behavior by the way. The Mellanox one apparently does, but it is not
>a requirement of the verbs, it's a requirement on the upper layer. If more
>RDMA Reads are posted than the remote peer supports, the connection
>may break.
>
>  
>
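(This is exactly the kind of limit a ULP has to police on its own. A minimal
sketch of the sort of gating I would expect, with purely illustrative names,
assuming the limit negotiated at connect time is peer_ird:)

/* Never keep more than peer_ird RDMA Reads in flight on one connection;
 * callers must queue any read that may_post_read() refuses. */
struct read_gate {
        int in_flight;  /* RDMA Reads posted and not yet completed */
        int peer_ird;   /* limit learned during connection establishment */
};

static int may_post_read(struct read_gate *g)
{
        if (g->in_flight >= g->peer_ird)
                return 0;       /* caller must queue the read for later */
        g->in_flight++;
        return 1;
}

static void read_completed(struct read_gate *g)
{
        g->in_flight--;         /* one queued read may now be released */
}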
>>The number of outstanding RDMA Reads is negotiated by the CM during
>>connection establishment and the QP which is sending the RDMA Read must
>>have a value configured for this parameter which is <= the remote end's
>>capability.
>>    
>>
>
>In other words, we're probably stuck at 4. :-) I don't think there is any
>Mellanox-based implementation that has ever supported > 4.
>
>  
>
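(For the verbs-level plumbing: the value in question is the max_rd_atomic /
max_dest_rd_atomic pair set when the QP is moved to RTR and RTS. A fragment,
assuming libibverbs, with the other mandatory transition attributes omitted
and peer_ord/peer_ird standing in for whatever the CM exchange produced:)

#include <string.h>
#include <infiniband/verbs.h>

static int set_rdma_read_depths(struct ibv_qp *qp,
                                uint8_t peer_ord, uint8_t peer_ird)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state           = IBV_QPS_RTR;
        attr.max_dest_rd_atomic = peer_ord;  /* reads the peer may have outstanding toward us */
        /* ...plus the other mandatory RTR attributes (AV, dest QPN, PSN, MTU, min RNR)... */
        if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_MAX_DEST_RD_ATOMIC /* | ... */))
                return -1;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state      = IBV_QPS_RTS;
        attr.max_rd_atomic = peer_ird;       /* must be <= what the remote end can accept */
        /* ...plus the other mandatory RTS attributes (timeout, retry counts, SQ PSN)... */
        if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_MAX_QP_RD_ATOMIC /* | ... */))
                return -1;

        return 0;
}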
>>In previous testing by Mellanox on SDR HCAs, they indicated that values beyond
>>2-4 did not improve performance (and in fact required more RDMA
>>resources be allocated for the corresponding QP or HCA).  Hence I
>>suspect a very large value like 128 would offer no improvement over
>>values in the 2-8 range.
>>    
>>
>
>I am not so sure of that. For one thing, it's dependent on VERY small
>latencies. The presence of a switch, or link extenders will make a huge
>difference. Second, heavy multi-QP firmware loads will increase the
>latencies. Third, constants are pretty much never a good idea in
>networking.
>
>The NFS/RDMA client tries to set the maximum IRD value it can obtain.
>RDMA Read is used quite heavily by the server to fetch client data
>segments for NFS writes.
>
>Tom.
>



