[openib-general] Re: Mellanox HCAs: outstanding RDMAs

Michael Krause krause at cup.hp.com
Fri Jun 9 06:59:30 PDT 2006


Whether iWARP or IB, there is a fixed number of RDMA Read Requests allowed to be
outstanding at any given time.  If one posts more RDMA Read Requests than
that number, the transmit queue is stalled.  This is documented in
both technology specifications.  It is something that all ULPs should be
aware of, and some go so far as to communicate the limit as part of the Hello /
login exchange.  This allows the ULP implementation to determine whether it
wants to stall or wants to wait until Read Responses complete before
sending another request.  This isn't something silent; this isn't something
new; it is up to the ULP implementation to decide how to deal
with the issue.
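
As a rough illustration of the "wait until Read Responses complete" option,
here is a minimal sketch of a ULP-side limiter.  It is not taken from any
real ULP or verbs library: the class name ReadThrottle, the issue callback,
and the max_outstanding parameter are all hypothetical, and the limit is
assumed to have been learned during connection setup.

```python
from collections import deque


class ReadThrottle:
    """ULP-side limiter for outstanding RDMA Reads (illustrative only)."""

    def __init__(self, max_outstanding, issue):
        if max_outstanding < 1:
            raise ValueError("max_outstanding must be at least 1")
        self.max_outstanding = max_outstanding  # assumed limit from connection setup
        self.issue = issue                      # hypothetical callback that posts one Read
        self.outstanding = 0                    # Reads posted but not yet completed
        self.pending = deque()                  # Reads held back inside the ULP

    def submit(self, request):
        """Post the Read now if a slot is free, otherwise hold it in the ULP."""
        if self.outstanding < self.max_outstanding:
            self.outstanding += 1
            self.issue(request)
            return True
        self.pending.append(request)
        return False

    def on_read_complete(self):
        """Call on each Read completion: free a slot and post one held Read."""
        if self.outstanding == 0:
            raise RuntimeError("completion without an outstanding Read")
        self.outstanding -= 1
        if self.pending:
            self.outstanding += 1
            self.issue(self.pending.popleft())


issued = []
throttle = ReadThrottle(max_outstanding=2, issue=issued.append)
for name in ("r1", "r2", "r3", "r4"):
    throttle.submit(name)
# Only two Reads reach the transmit queue; the other two wait in the ULP.
first_batch = list(issued)
# A completion frees one slot, so exactly one held Read is posted.
throttle.on_read_complete()
```

Because the excess Reads never reach the transmit queue in this sketch, other
work posted to that queue is not held up behind them.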

BTW, this is part of the hardware and associated specifications, so it is up
to software to deal with the limited hardware resources and the associated
consequences.  Please keep in mind that there are a limited number of RDMA
Read Request / Atomic resource "slots" at the receiving HCA / RNIC.  These are
kept in hardware, thus one must know the exact limit to avoid creating
protocol problems.  A ULP transmitter may post more requests to the transmit
queue than the allotted slots, but the transmitting (source) HCA / RNIC must not
issue the excess requests to the remote.  These requests do cause the source to
stall.  This is a well understood problem, and if people give the iSCSI /
iSER, DA, and SDP specs a good read they can see that this issue is
comprehended.  I agree with people that ULP designers / implementers must
pay close attention to this constraint, as it is in the iWARP / IB
specifications for a very good reason, and these semantics must be preserved
to maintain the ordering requirements that are used by the overall RDMA
protocols themselves.
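
To make the source-side stall concrete, here is a toy model of an in-order
transmit queue.  It is a simplification and does not describe any particular
HCA / RNIC: the function drain_send_queue, the ("read" / "send", tag) tuples,
and the read_limit parameter are all made up for illustration.

```python
from collections import deque


def drain_send_queue(queue, outstanding_reads, read_limit):
    """Issue work requests in order until a Read would exceed read_limit.

    Returns (issued, outstanding_reads).  Requests that were not issued
    remain in `queue`, which models the stalled transmit queue.
    """
    issued = []
    while queue:
        kind, _tag = queue[0]
        if kind == "read":
            if outstanding_reads >= read_limit:
                break  # stall: nothing behind this Read is issued either
            outstanding_reads += 1
        issued.append(queue.popleft())
    return issued, outstanding_reads


queue = deque([("read", "a"), ("read", "b"), ("read", "c"), ("send", "d")])
first, outstanding = drain_send_queue(queue, outstanding_reads=0, read_limit=2)
# Read "c" exceeds the limit and stalls the queue, so Send "d" waits behind it.
stalled = list(queue)
# One Read Response arrives and frees a slot; the queue drains again.
second, outstanding = drain_send_queue(queue, outstanding - 1, read_limit=2)
```

In this model the Send posted after the third Read cannot go out until a Read
Response frees a slot, which is the cost a ULP accepts when it posts past the
limit.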

Mike



At 05:24 AM 6/6/2006, Talpey, Thomas wrote:
>At 03:43 AM 6/6/2006, Michael S. Tsirkin wrote:
> >Quoting r. Talpey, Thomas <Thomas.Talpey at netapp.com>:
> >> Semantically, the provider is not required to provide any such flow control
> >> behavior by the way. The Mellanox one apparently does, but it is not
> >> a requirement of the verbs, it's a requirement on the upper layer. If more
> >> RDMA Reads are posted than the remote peer supports, the connection
> >> may break.
> >
> >This does not sound right. Isn't this the meaning of this field:
> >"Initiator Depth: Number of RDMA Reads & atomic operations
> >outstanding at any time"? Shouldn't any provider enforce this limit?
>
>The core spec does not require it. An implementation *may* enforce it,
>but is not *required* to do so. And as pointed out in the other message,
>there are repercussions of doing so.
>
>I believe the silent queue stalling is a bit of a time bomb for upper layers,
>whose implementers are quite likely unaware of the danger. I greatly
>prefer an implementation which simply sends the RDMA Read request,
>resulting in a failed (but unblocked!) connection. Silence is a very
>dangerous thing, no matter how helpful the intent.
>
>Tom.