[openib-general] Immediate data question
Michael Krause
krause at cup.hp.com
Thu Feb 15 09:42:37 PST 2007
At 09:37 PM 2/14/2007, Devesh Sharma wrote:
>On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
>>At 05:37 AM 2/13/2007, Devesh Sharma wrote:
>> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
>> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>> >> > > >
>> >> > > >Not for the receiver, but the sender will be severely slowed down by
>> >> > > >having to wait for the RNR timeouts.
>> >> > >
>> >> > > RNR = Receiver Not Ready, so by definition the data flow isn't
>> >> > > going to progress until the receiver is ready to receive data.
>> >> > > If a receive QP enters RNR for an RC, then it is likely not
>> >> > > progressing as desired. RNR was initially put in place to enable
>> >> > > a receiver to create back pressure on the sender without causing
>> >> > > a fatal error condition. It should rarely be entered and
>> >> > > therefore should have negligible impact on overall performance;
>> >> > > however, when an RNR occurs, no forward progress will occur, so
>> >> > > performance is essentially zero.
>> >> >
>> >> > Mike:
>> >> > I still do not quite understand this issue. I have two
>> >> > situations that have RNR triggered.
>> >> >
>> >> > 1. Process A and process B are connected with a QP. A first posts a
>> >> > send to B, and B does not post a receive. Then A and B do
>> >> > RDMA_WRITEs to each other for a long time; A and B just check memory
>> >> > for the RDMA_WRITE messages. Finally B will post a receive. Does the
>> >> > first pending send in A block all the later RDMA_WRITEs?
>> >>According to the IBTA spec, the HCA will process WR entries in the
>> >>strict order in which they are posted, so the send will block all WRs
>> >>posted after it. Even if the HCA has multiple processing elements, I
>> >>think the posting order will still be maintained by the HCA.
>> >> > If not, since RNR is triggered periodically till B posts a receive,
>> >> > does it affect the RDMA_WRITE performance between A and B?
>> >> >
>> >> > 2. Extend the above to three processes: A connects to B, and B
>> >> > connects to C, so B has two QPs but one CQ. A posts a send to B, and
>> >> > B does not post a receive,
>> >Post ordering across QPs is not guaranteed, hence the presence of the
>> >same CQ or different CQs will not affect anything.
>> >> > rather B and C are doing long-running RDMA_WRITEs or send/recv. But B
>> >If the RDMA WRITE is _on_ B, there is no effect on performance. If the
>> >RDMA WRITE is _on_ C, it _may_ affect the performance, since the load is
>> >on the same HCA. In the case of Send/Recv it again _may_ affect the
>> >performance, for the same reason.
>I am sorry, I had missed that in both cases the same DMA channel is in use.
>>
>>Seems orthogonal. Any time h/w is shared, multiple flows will have an
>>impact on one another. That is why we have the different arbitration
>>mechanisms to enable one to control that impact.
>Please, can you explain it more clearly?
Most I/O devices are shared by multiple applications / kernel
subsystems. Hence, the device acts as a serialization point for what goes
on the wire / link. Sharing means resource contention, and in order to add
any structure to that contention, a number of technologies provide
arbitration options. In the case of IB, the arbitration is confined to VL
arbitration, where a given data flow is assigned to a VL and that VL is
serviced at some particular rate. A number of years ago I wrote up how one
might also provide QP arbitration (not part of the IBTA specifications),
and I understand some implementations have incorporated that or a
variation of those mechanisms into their products.
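
To make the VL arbitration point concrete: an application does not pick a
VL directly; it picks a service level (SL) when connecting the QP, and the
subnet manager's SL-to-VL mapping plus the VL arbitration tables determine
the rate at which that flow is serviced. Below is a minimal libibverbs
sketch of the RTR transition where the SL is chosen; the SL value and the
connection parameters (remote_lid, remote_qpn, remote_psn) are
hypothetical placeholders that real code would exchange out of band.

/*
 * Sketch: assign an RC QP's traffic to service level 2 at the
 * INIT->RTR transition.  The SM's SL-to-VL mapping and VL arbitration
 * tables then decide how the flow is serviced.  remote_lid, remote_qpn
 * and remote_psn are hypothetical values exchanged out of band.
 */
#include <infiniband/verbs.h>

static int move_to_rtr(struct ibv_qp *qp, uint16_t remote_lid,
                       uint32_t remote_qpn, uint32_t remote_psn)
{
        struct ibv_qp_attr attr = {
                .qp_state           = IBV_QPS_RTR,
                .path_mtu           = IBV_MTU_1024,
                .dest_qp_num        = remote_qpn,
                .rq_psn             = remote_psn,
                .max_dest_rd_atomic = 1,
                .min_rnr_timer      = 12,          /* ~0.64 ms RNR timer */
                .ah_attr = {
                        .dlid     = remote_lid,
                        .sl       = 2,             /* SL 2 -> some VL */
                        .port_num = 1,
                },
        };

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                             IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                             IBV_QP_MAX_DEST_RD_ATOMIC |
                             IBV_QP_MIN_RNR_TIMER);
}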
In addition to IB link contention, there is also PCI link / bus
contention. For PCIe, given that most designs did not want to spend
resources on multiple VCs, there really isn't any standard arbitration
mechanism. However, many devices, especially a device like an HCA or an
RNIC, already have the concept of separate resource domains, e.g. QPs,
and they provide a mechanism to control how a QP's DMA requests or
interrupt requests are scheduled onto the PCIe link.
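
Coming back to the strict-ordering point raised earlier in the thread:
WRs posted to a single QP's send queue execute in posted order, so a send
that is being RNR NAKed holds up any RDMA_WRITE posted behind it on that
QP, while WRs on B's other QP are unaffected. A rough sketch, assuming an
already-connected RC QP and a registered MR; buf, len, mr, remote_addr
and rkey are hypothetical:

/*
 * Sketch: two WRs posted to one QP execute in posted order.  If the
 * SEND keeps getting RNR NAKed because the peer has no receive
 * posted, the RDMA_WRITE behind it on this QP will not execute.
 * buf, len, mr, remote_addr and rkey are assumed to exist.
 */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

static int post_send_then_write(struct ibv_qp *qp, void *buf, uint32_t len,
                                struct ibv_mr *mr, uint64_t remote_addr,
                                uint32_t rkey)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t) buf,
                .length = len,
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr send_wr, write_wr, *bad_wr;

        memset(&send_wr, 0, sizeof(send_wr));
        send_wr.wr_id      = 1;
        send_wr.opcode     = IBV_WR_SEND;   /* needs a receive at the peer */
        send_wr.sg_list    = &sge;
        send_wr.num_sge    = 1;
        send_wr.send_flags = IBV_SEND_SIGNALED;

        memset(&write_wr, 0, sizeof(write_wr));
        write_wr.wr_id               = 2;
        write_wr.opcode              = IBV_WR_RDMA_WRITE; /* no receive needed */
        write_wr.sg_list             = &sge;
        write_wr.num_sge             = 1;
        write_wr.send_flags          = IBV_SEND_SIGNALED;
        write_wr.wr.rdma.remote_addr = remote_addr;
        write_wr.wr.rdma.rkey        = rkey;

        /* The write is linked after the send, so it is posted after it. */
        send_wr.next = &write_wr;
        return ibv_post_send(qp, &send_wr, &bad_wr);
}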
>> >> > must send RNR NAKs periodically to A, right? So does the pending
>> >> > message from A affect B's overall performance between B and C?
>> >But the RNR NAK state does not last very long... possibly you will not
>> >even be able to observe this performance hit. The moment the RNR retry
>> >counter expires, the connection will be broken!
>>
>>Keep in mind the timeout can be infinite. RNR NAKs are not expected to be
>>frequent, so their performance impact was considered reasonable.
>Thanks, I missed that.
It is a subtlety within the specification that is easy to miss.
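
For reference, the knobs involved are the QP attributes min_rnr_timer
(set at the RTR transition; how long the requester backs off after an RNR
NAK) and rnr_retry (set at the RTS transition; how many times it retries,
where the encoding 7 means retry forever, i.e. the infinite case). A
minimal sketch of the RTS transition; the timeout and retry values are
illustrative placeholders, not recommendations:

/*
 * Sketch: RTS transition with rnr_retry = 7, the encoding for
 * "retry forever" after RNR NAKs, i.e. the infinite case above.
 * The other values are illustrative placeholders.
 */
#include <infiniband/verbs.h>

static int move_to_rts(struct ibv_qp *qp, uint32_t my_psn)
{
        struct ibv_qp_attr attr = {
                .qp_state      = IBV_QPS_RTS,
                .timeout       = 14,   /* local ACK timeout, ~67 ms */
                .retry_cnt     = 7,    /* transport-error retries */
                .rnr_retry     = 7,    /* 7 == infinite RNR retries */
                .sq_psn        = my_psn,
                .max_rd_atomic = 1,
        };

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_TIMEOUT |
                             IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY |
                             IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC);
}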
Mike