[openib-general] Immediate data question
Michael Krause
krause at cup.hp.com
Wed Feb 21 09:25:11 PST 2007
At 09:21 PM 2/20/2007, Devesh Sharma wrote:
>On 2/15/07, Michael Krause <krause at cup.hp.com> wrote:
>>At 09:37 PM 2/14/2007, Devesh Sharma wrote:
>> >On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
>> >>At 05:37 AM 2/13/2007, Devesh Sharma wrote:
>> >> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
>> >> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>> >> >> > > >
>> >> >> > > >Not for the receiver, but the sender will be severely slowed
>> >> >> > > >down by having to wait for the RNR timeouts.
>> >> >> > >
>> >> >> > > RNR = Receiver Not Ready so by definition, the data flow
>> >> >> > > isn't going to
>> >> >> > > progress until the receiver is ready to receive data. If a
>> >> >> > > receive QP
>> >> >> > > enters RNR for an RC, then it is likely not progressing as
>> >> >> > > desired. RNR
>> >> >> > > was initially put in place to enable a receiver to create
>> >> >> > > back pressure to the sender without causing a fatal error
>> >> >> > > condition. It should rarely be entered and therefore should
>> >> >> > > have negligible impact on overall performance; however, when an
>> >> >> > > RNR occurs, no forward progress will occur, so performance is
>> >> >> > > essentially zero.
>> >> >> >
>> >> >> > Mike:
>> >> >> > I still do not quite understand this issue. I have two
>> >> >> > situations that have RNR triggered.
>> >> >> >
>> >> >> > 1. Process A and process B are connected with a QP. A first posts a
>> >> >> > send to B; B does not post a receive. Then A and B do RDMA_WRITEs to
>> >> >> > each other for a long time, and A and B just check memory for the
>> >> >> > RDMA_WRITE messages. Finally B will post a receive. Does the first
>> >> >> > pending send in A block all the later RDMA_WRITEs?
>> >> >>According to the IBTA spec, the HCA processes WR entries in the strict
>> >> >>order in which they are posted, so the send will block all WRs posted
>> >> >>after it on that QP. Even if the HCA has multiple processing elements,
>> >> >>I think the posting order will still be maintained by the HCA.
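For illustration only, a minimal libibverbs sketch of that ordering, assuming
an RC QP already in RTS and a registered buffer (qp, mr, buf, remote_addr and
rkey are placeholders, not from this thread). The SEND is queued ahead of the
RDMA WRITE, so if the SEND stalls on RNR NAK the WRITE waits behind it:

#include <stddef.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Post a SEND followed by an RDMA WRITE on the same send queue.
 * WQEs on one QP's send queue start execution in posted order. */
static int post_send_then_write(struct ibv_qp *qp, struct ibv_mr *mr,
                                void *buf, size_t len,
                                uint64_t remote_addr, uint32_t rkey)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t)buf,
                .length = (uint32_t)len,
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr send_wr = {
                .wr_id      = 1,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND,       /* needs a receive on the peer */
                .send_flags = IBV_SEND_SIGNALED,
        };
        struct ibv_send_wr write_wr = {
                .wr_id      = 2,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_RDMA_WRITE, /* no receive needed, but queued behind the SEND */
                .send_flags = IBV_SEND_SIGNALED,
        };
        struct ibv_send_wr *bad_wr = NULL;

        write_wr.wr.rdma.remote_addr = remote_addr;
        write_wr.wr.rdma.rkey        = rkey;
        send_wr.next = &write_wr;                /* one post, strict queue order */

        return ibv_post_send(qp, &send_wr, &bad_wr);
}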
>> >> >> > If not, since RNR is triggered periodically till B posts a receive,
>> >> >> > does it affect the RDMA_WRITE performance between A and B?
>> >> >> >
>> >> >> > 2. Extend the above to three processes: A connects to B, B connects
>> >> >> > to C, so B has two QPs but one CQ. A posts a send to B, B does not
>> >> >> > post a receive,
>> >> >Post ordering across QPs is not guaranteed, hence having the same CQ
>> >> >or different CQs will not affect anything.
>> >> >> > rather B and C are doing long-running RDMA_WRITEs or send/recv. But B
>> >> >If the RDMA WRITE is _on_ B, there is no effect on performance. If the
>> >> >RDMA WRITE is _on_ C, it
>> >I am sorry, I missed that in both cases the same DMA channel is in use.
>> >> >_may_ affect the performance, since the load is on the same HCA. In the
>> >> >case of Send/Recv it again _may_ affect the performance, for the same
>> >> >reason.
>> >>
>> >>Seems orthogonal. Any time h/w is shared, multiple flows will have an
>> >>impact on one another. That is why we have the different arbitration
>> >>mechanisms to enable one to control that impact.
>> >Please, can you explain it more clearly?
>>
>>Most I/O devices are shared by multiple applications / kernel
>>subsystems. Hence, the device acts as a serialization point for what goes
>>on the wire / link. Sharing = resource contention and in order to add any
>>structure to that contention, a number of technologies provide arbitration
>>options. In the case of IB, the arbitration is confined to VL arbitration
>>where a given data flow is assigned to a VL and that VL is serviced at some
>>particular rate. A number of years ago I wrote up how one might also
>>provide QP arbitration (not part of the IBTA specifications) and I
>>understand some implementations have incorporated that or a variation of
>>the mechanisms into their products.
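For illustration, a minimal libibverbs sketch of steering a flow by service
level, assuming an RC QP in INIT with path information already resolved (dlid,
dest_qpn, rq_psn, sl and port are placeholders). The SM's SL-to-VL mapping and
VL arbitration tables then determine how that VL is serviced on the link:

#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Move an RC QP to RTR, tagging its traffic with a service level (SL).
 * The fabric maps SL to a VL; VL arbitration controls link contention. */
static int rtr_with_sl(struct ibv_qp *qp, uint16_t dlid, uint32_t dest_qpn,
                       uint32_t rq_psn, uint8_t sl, uint8_t port)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state           = IBV_QPS_RTR;
        attr.path_mtu           = IBV_MTU_2048;
        attr.dest_qp_num        = dest_qpn;
        attr.rq_psn             = rq_psn;
        attr.max_dest_rd_atomic = 1;
        attr.min_rnr_timer      = 12;   /* RNR NAK timer advertised to the peer */
        attr.ah_attr.dlid       = dlid;
        attr.ah_attr.sl         = sl;   /* service level -> VL via the SM's SL2VL table */
        attr.ah_attr.port_num   = port;

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                             IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                             IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER);
}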
>Thanks Mike for a nice explanation. I am sorry for the late reply.
>Now I get it: here Chang is trying to find out the performance hit due to
>RNR NAK; the performance hit due to device sharing is going to be there
>anyhow, so "load on same HCA" is not the proper explanation.
>Am I correct now?
Yes. You need to treat RNR NAK performance impacts as distinct from the
impacts of multiple applications sharing the device.
Mike
>>In addition to IB link contention, there is also PCI link / bus
>>contention. For PCIe, given most designs did not want to waste resources
>>on multiple VCs, there really isn't any standard arbitration
>>mechanism. However, many devices, especially a device like an HCA or an
>>RNIC, already have the concept of separate resource domains, e.g. the QP,
>>and they provide a mechanism to control how a QP's DMA requests or
>>interrupt requests are scheduled onto the PCIe link.
>>
>>
>> >> >> > must send RNR NAKs periodically to A, right? So does the pending
>> >> >> > message from A affect B's overall performance between B and C?
>> >> >But RNR NAK does not last for a very long time... possibly you will not
>> >> >even be able to observe this performance hit. The moment rnr_counter
>> >> >expires the connection will be broken!
>> >>
>> >>Keep in mind the RNR retry count can be infinite. RNR NAKs are not
>> >>expected to be frequent, so their performance impact was considered
>> >>reasonable.
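For illustration, a minimal libibverbs sketch of that "infinite" setting,
assuming an RC QP already in RTR (sq_psn is a placeholder):

#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Move an RC QP to RTS. rnr_retry = 7 means "retry indefinitely" on RNR NAK,
 * so the connection is not torn down while the peer has no receive posted;
 * the sender simply waits and retries. */
static int rts_with_infinite_rnr_retry(struct ibv_qp *qp, uint32_t sq_psn)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state      = IBV_QPS_RTS;
        attr.sq_psn        = sq_psn;
        attr.timeout       = 14;   /* transport ACK timeout exponent */
        attr.retry_cnt     = 7;    /* transport error retries */
        attr.rnr_retry     = 7;    /* 7 == infinite RNR retries */
        attr.max_rd_atomic = 1;

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_SQ_PSN | IBV_QP_TIMEOUT |
                             IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY |
                             IBV_QP_MAX_QP_RD_ATOMIC);
}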
>> >Thanks I missed that.
>>
>>It is a subtlety within the specification that is easy to miss.
>>
>>Mike
>>
>>