[openib-general] Re: Mellanox HCAs: outstanding RDMAs
Michael Krause
krause at cup.hp.com
Fri Jun 9 06:59:30 PDT 2006
Whether iWARP or IB, there is a fixed number of RDMA Read Requests allowed to be
outstanding at any given time. If one posts more RDMA Read Requests than
the fixed number, the transmit queue is stalled. This is documented in
both technology specifications. It is something that all ULPs should be
aware of, and some go so far as to communicate the limit as part of the Hello /
login exchange. This allows the ULP implementation to determine whether it
wants to stall or wants to wait until Read Responses complete before
sending another request. This isn't something silent; this isn't something
new; it is up to the ULP implementation to decide how to deal
with the issue.
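As a rough illustration of the second choice (waiting for Read Responses rather than letting the transmit queue stall), a ULP can keep its own count of outstanding RDMA Reads and hold the excess back itself. The sketch below is plain Python, not verbs code. `negotiate_read_limit`, `ReadThrottle`, and `post_fn` are hypothetical names, and taking the smaller of the local limit and the value the peer advertised in its Hello / login message is an assumption about how a ULP might settle on the number.

```python
from collections import deque


def negotiate_read_limit(local_initiator_depth, peer_advertised_depth):
    """Pick the number of RDMA Reads this side may keep outstanding.

    Assumption: each side advertises how many inbound Reads it can service
    during the ULP's Hello / login exchange; we never exceed either value.
    """
    return min(local_initiator_depth, peer_advertised_depth)


class ReadThrottle:
    """ULP-side gate that keeps outstanding RDMA Reads at or below a limit."""

    def __init__(self, limit, post_fn):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.post_fn = post_fn      # hypothetical hook that posts one Read
        self.outstanding = 0        # Reads posted but not yet completed
        self.deferred = deque()     # Reads held back by the ULP, in order

    def submit(self, request):
        """Post the Read now if a slot is free, otherwise defer it."""
        if self.outstanding < self.limit and not self.deferred:
            self._post(request)
            return True
        self.deferred.append(request)
        return False

    def on_read_complete(self):
        """Call when a Read Response completes; releases one deferred Read."""
        if self.outstanding == 0:
            raise RuntimeError("completion without an outstanding Read")
        self.outstanding -= 1
        if self.deferred:
            self._post(self.deferred.popleft())

    def _post(self, request):
        self.outstanding += 1
        self.post_fn(request)
```

With this arrangement only the Reads are held back; anything else the ULP posts does not sit behind a Read that is waiting for a slot. Whether that reordering is acceptable is, again, for the ULP to decide.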
BTW, this is part of the hardware and associated specifications so it is up
to software to deal with the limited hardware resources and the associated
consequences. Please keep in mind that there is a limited number of RDMA
Read Request / Atomic resource "slots" at the receiving HCA / RNIC. These are
kept in hardware, so one must know the exact limit to avoid creating
protocol problems. A ULP transmitter may post more requests to the transmit queue
than there are allotted slots, but the transmitting (source) HCA / RNIC must not
issue the excess to the remote. Those requests do cause the source to
stall. This is a well-understood problem, and if people give the iSCSI /
iSER, DA, or SDP specs a good read they can see that this issue is
comprehended. I agree with people that ULP designers / implementers must
pay close attention to this constraint as it is in the iWARP / IB
specifications for a very good reason and these semantics must be preserved
to maintain the ordering requirements that are used by the overall RDMA
protocols themselves.
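To make the stall concrete, here is a toy model of the behavior described above: work requests leave the transmit queue strictly in order, and a Read at the head of the queue cannot be issued while all of the remote's slots are in use, so everything posted behind it waits too. `SendQueueModel` and its methods are invented for this sketch and do not correspond to any HCA / RNIC interface; Atomics are left out for brevity.

```python
from collections import deque


class SendQueueModel:
    """Toy model of an in-order transmit queue with limited Read slots."""

    def __init__(self, read_slots):
        self.read_slots = read_slots   # Read slots available at the remote
        self.reads_in_flight = 0
        self.queue = deque()           # posted but not yet issued
        self.issued = []               # order in which requests hit the wire

    def post(self, kind, tag):
        """Post a work request; kind is 'read', 'send', or 'write'."""
        self.queue.append((kind, tag))
        self._drain()

    def read_response(self):
        """A Read Response arrived, freeing one slot at the remote."""
        if self.reads_in_flight == 0:
            raise RuntimeError("no Read in flight")
        self.reads_in_flight -= 1
        self._drain()

    def stalled(self):
        """True while posted requests are waiting behind a blocked Read."""
        return bool(self.queue)

    def _drain(self):
        # Issue strictly from the head; stop at the first Read with no slot.
        while self.queue:
            kind, tag = self.queue[0]
            if kind == "read":
                if self.reads_in_flight >= self.read_slots:
                    return             # head-of-line stall
                self.reads_in_flight += 1
            self.queue.popleft()
            self.issued.append(tag)


sq = SendQueueModel(read_slots=1)
sq.post("read", "R1")
sq.post("read", "R2")    # no slot free: R2 waits at the head of the queue
sq.post("send", "S1")    # stuck behind R2 even though it needs no slot
print(sq.issued, sq.stalled())
sq.read_response()       # R1 completes; R2 and then S1 are issued
print(sq.issued, sq.stalled())
```

The point of the model is the Send: it consumes no Read slot, yet it does not go out until the Read ahead of it does, which is what preserves the ordering.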
Mike
At 05:24 AM 6/6/2006, Talpey, Thomas wrote:
>At 03:43 AM 6/6/2006, Michael S. Tsirkin wrote:
> >Quoting r. Talpey, Thomas <Thomas.Talpey at netapp.com>:
> >> Semantically, the provider is not required to provide any such flow control
> >> behavior by the way. The Mellanox one apparently does, but it is not
> >> a requirement of the verbs, it's a requirement on the upper layer. If more
> >> RDMA Reads are posted than the remote peer supports, the connection
> >> may break.
> >
> >This does not sound right. Isn't this the meaning of this field:
> >"Initiator Depth: Number of RDMA Reads & atomic operations
> >outstanding at any time"? Shouldn't any provider enforce this limit?
>
>The core spec does not require it. An implementation *may* enforce it,
>but is not *required* to do so. And as pointed out in the other message,
>there are repercussions of doing so.
>
>I believe the silent queue stalling is a bit of a time bomb for upper layers,
>whose implementers are quite likely unaware of the danger. I greatly
>prefer an implementation which simply sends the RDMA Read request,
>resulting in a failed (but unblocked!) connection. Silence is a very
>dangerous thing, no matter how helpful the intent.
>
>Tom.