[openib-general] Immediate data question

Tue Feb 20 21:21:42 PST 2007

On 2/15/07, Michael Krause <krause at cup.hp.com> wrote:
> At 09:37 PM 2/14/2007, Devesh Sharma wrote:
> >On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
> >>At 05:37 AM 2/13/2007, Devesh Sharma wrote:
> >> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
> >> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
> >> >> > > >
> >> >> > > >Not for the receiver, but the sender will be severely slowed down by
> >> >> > > >having to wait for the RNR timeouts.
> >> >> > >
> >> >> > > RNR = Receiver Not Ready, so by definition the data flow isn't going to
> >> >> > > progress until the receiver is ready to receive data.   If a receive QP
> >> >> > > enters RNR for a RC, then it is likely not progressing as desired.   RNR
> >> >> > > was initially put in place to enable a receiver to create
> >> >> > > back pressure to the sender without causing a fatal error
> >> >> > > condition.  It should rarely be entered and therefore should
> >> >> > > have negligible impact on overall performance; however, when a
> >> >> > > RNR occurs, no forward progress will occur so performance is
> >> >> > > essentially zero.
> >> >> >
> >> >> > Mike:
> >> >> >         I still do not quite understand this issue. I have two
> >> >> > situations that have RNR triggered.
> >> >> >
> >> >> > 1. Process A and process B are connected with a QP. A first posts a send
> >> >> > to B, and B does not post a receive. Then A and B do RDMA_WRITE to each
> >> >> > other for a long time, and just check memory for the RDMA_WRITE
> >> >> > message. Finally B posts a receive. Does the first pending send in A
> >> >> > block all the later RDMA_WRITEs?
> >> >>According to the IBTA spec, an HCA processes WR entries in the strict
> >> >>order in which they are posted, so the send will block all WRs posted
> >> >>after it. Even if the HCA has multiple processing elements, I think
> >> >>the processing order will still be maintained by the HCA.
> >> >> > If not, since RNR is triggered
> >> >> > periodically till B post receive, does it affect the RDMA_WRITE
> >> >> > performance between A and B ?
> >> >> >
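To make the ordering argument above concrete, here is a toy model in Python. It is only a sketch of the in-order reading given in the reply, not HCA behaviour taken from the spec text; the function name and the tuple layout are made up for illustration. It assumes a plain RDMA_WRITE (no immediate data) needs no receive on the peer, while a SEND does.

```python
from collections import deque

def drain_send_queue(work_requests, receiver_has_recv_posted):
    """Process WRs in posting order and stop at the first one that cannot proceed.

    work_requests: list of (opcode, name) tuples in posting order (hypothetical layout).
    receiver_has_recv_posted: True if the peer has a receive WR posted.
    Returns (completed, still_queued).
    """
    queue = deque(work_requests)
    completed = []
    while queue:
        opcode, _name = queue[0]
        # A SEND needs a posted receive; without one the peer answers RNR NAK
        # and, under strict ordering, nothing queued behind it may pass.
        if opcode == "SEND" and not receiver_has_recv_posted:
            break
        completed.append(queue.popleft())
    return completed, list(queue)

wrs = [("SEND", "s0"), ("RDMA_WRITE", "w1"), ("RDMA_WRITE", "w2")]
blocked_done, blocked_pending = drain_send_queue(wrs, receiver_has_recv_posted=False)
ready_done, ready_pending = drain_send_queue(wrs, receiver_has_recv_posted=True)
```

In this model the two RDMA_WRITEs stay queued behind the SEND until B posts a receive, which is case 1 as described.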
> >> >> > 2. Extend the above to three processes: A connects to B, B connects to
> >> >> > C, so B has two QPs, but one CQ. A posts a send to B, B does not post a
> >> >> > receive,
> >> >Post ordering across QPs is not guaranteed, hence the presence of the same
> >> >CQ or a different CQ will not affect anything.
> >> >> > rather B and C are doing a long-running RDMA_WRITE, or send/recv. But B
> >> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C,
> >I am sorry I have missed that in both cases same DMA channel is in use.
> >> >_may_ affect the performance, since load is on same HCA. In case of
> >> >Send/Recv again _may_ affect the performance, with the same reason.
> >>
> >>Seems orthogonal.  Any time h/w is shared, multiple flows will have an
> >>impact on one another.  That is why we have the different arbitration
> >>mechanisms to enable one to control that impact.
> >Please, can you explain it more clearly?
>
> Most I/O devices are shared by multiple applications / kernel
> subsystems.   Hence, the device acts as a serialization point for what goes
> on the wire / link.   Sharing = resource contention and in order to add any
> structure to that contention, a number of technologies provide arbitration
> options.   In the case of IB, the arbitration is confined to VL arbitration
> where a given data flow is assigned to a VL and that VL is serviced at some
> particular rate.   A number of years ago I wrote up how one might also
> provide QP arbitration (not part of the IBTA specifications) and I
> understand some implementations have incorporated that or a variation of
> the mechanisms into their products.
Thanks, Mike, for the nice explanation, and sorry for the late reply.
Now I get it: Chang is trying to find out the performance hit due to
RNR NAK. A performance hit due to device sharing is going to be there
anyhow, so "load on same HCA" is not the proper explanation.
Am I correct now?
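
For my own understanding, the idea of servicing each VL at some particular rate can be sketched as a weighted round robin. This is a heavily simplified illustration and not the IBTA VL arbitration tables (which have separate high- and low-priority entries); the function name, the per-packet weights and the queue contents are all assumptions.

```python
def weighted_round_robin(queues, weights, rounds):
    """Serve packets from per-VL queues, up to weights[vl] packets per VL per round.

    queues: dict mapping VL number -> list of queued packets (consumed in place).
    weights: dict mapping VL number -> packets that VL may send per round.
    Returns the (vl, packet) pairs in the order they went on the wire.
    """
    wire_order = []
    for _ in range(rounds):
        for vl in sorted(queues):
            for _ in range(weights.get(vl, 0)):
                if not queues[vl]:
                    break  # this VL has nothing left to send in this round
                wire_order.append((vl, queues[vl].pop(0)))
    return wire_order

vl_queues = {0: ["a0", "a1", "a2"], 1: ["b0", "b1", "b2"]}
wire = weighted_round_robin(vl_queues, weights={0: 2, 1: 1}, rounds=2)
```

With these weights VL 0 gets two slots per round and VL 1 gets one, so both flows make progress but at different rates.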
>
> In addition to IB link contention, there is also PCI link / bus
> contention.   For PCIe, given most designs did not want to waste resources
> on multiple VC, there really isn't any standard arbitration
> mechanism.   However, many devices, especially a device like a HCA or a
> RNIC, already have the concept of separate resource domains, e.g. QP, and
> they provide a mechanism to associate how the QP's DMA requests or
> interrupts requests are scheduled to the PCIe link.
>
>
> >> >> > must send RNR periodically to A, right? So does the pending message
> >> >> > from A affect B's overall performance between B and C?
> >> >But RNR NAKs do not go on for very long; possibly you will not even be
> >> >able to observe this performance hit. The moment rnr_counter
> >> >expires, the connection will be broken!
> >>
> >>Keep in mind the timeout can be infinite.  RNR NAKs are not expected to be
> >>frequent so their performance impact was considered reasonable.
> >Thanks I missed that.
>
> It is a subtlety within the specification that is easy to miss.
>
> Mike
>
>
>
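
The finite versus infinite RNR retry point above can be illustrated with a small simulation. It is a sketch only: the function name and the step cap are made up, and the one thing taken from the verbs API is that an rnr_retry value of 7 means "retry forever", while 0 to 6 is a finite retry count.

```python
INFINITE_RNR_RETRY = 7  # verbs convention: rnr_retry == 7 means retry indefinitely

def send_with_rnr_retry(rnr_retry, naks_before_ready, max_steps=1000):
    """Simulate one SEND against a receiver that is not ready at first.

    rnr_retry: retries allowed after an RNR NAK (0-6), or 7 for infinite.
    naks_before_ready: RNR NAKs the receiver returns before a receive is posted.
    max_steps: cap so the simulation always terminates (an assumption of this model).
    Returns (outcome, naks_seen) with outcome "delivered", "error" or "stalled".
    """
    naks_seen = 0
    retries_left = rnr_retry
    for _ in range(max_steps):
        if naks_seen >= naks_before_ready:
            return "delivered", naks_seen  # receiver finally had a receive posted
        naks_seen += 1  # receiver answers this attempt with RNR NAK
        if rnr_retry != INFINITE_RNR_RETRY:
            if retries_left == 0:
                return "error", naks_seen  # retry count exhausted, QP goes to error
            retries_left -= 1
    return "stalled", naks_seen

finite = send_with_rnr_retry(rnr_retry=3, naks_before_ready=10)
infinite = send_with_rnr_retry(rnr_retry=INFINITE_RNR_RETRY, naks_before_ready=10)
```

With a finite count the connection breaks after the initial attempt plus the allowed retries; with the infinite setting the sender simply waits until the receiver posts a receive, which matches the remark that the timeout can be infinite.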