[ofa-general] post_recv question

Thu Feb 21 11:32:46 PST 2008

Ralph Campbell wrote:
> On Thu, 2008-02-21 at 10:09 -0800, Shirley Ma wrote:
>> Hello Steve,
>>
>>> Here is the timing sequence:
>>>
>>> t0: app calls post_recv
>>> t1: post_recv code builds a hw-specific WR in the hw work queue
>>> t2: post_recv code rings a doorbell (write to adapter mem or register)
>>> t3: post_recv returns
>>> t4: <app assumes the buffer is ready>
> 
> This is wrong. The HCA has control of the receive buffer
> until poll_cq() returns a CQE saying the posted buffer
> is completed (either OK or error).
> Think about it. The application can do a post_recv() and
> it could be days or nanoseconds before a packet is sent to
> that buffer. The application can't assume anything about
> the contents until the HCA says something is there.
> 
> Oh, I see. You are saying the application thinks the buffer
> is available for the HCA to use.
> 
>>> t5: device HW dma engine moves the WR to adapter memory
>>> t6: device FW prepares the HW RQ entry making the buffer available.
>>>
>>> Note that at time t4 the application thinks it's ready, but it's
>>> really not ready until t6.
>>> This is clearly an implementation-specific issue.  But I was under the
>>> assumption that all RDMA HW behaves this way.  Maybe not?
> 
> Not all hardware works the same.  You can't make assumptions
> beyond what the library API guarantees without building
> hardware specific dependencies into your program.
>
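Ralph's point about buffer ownership can be sketched as a tiny state model. This is a hypothetical illustration, not the libibverbs API: model_post_recv(), model_hca_deliver(), model_poll_cq() and model_app_may_read() are invented names standing in for ibv_post_recv(), the HCA writing an incoming message, and ibv_poll_cq() returning a CQE. The only rule it encodes is the one stated above: the application may look at the buffer only after a completion has been polled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not the verbs API): who owns a receive buffer. */
enum owner { OWNER_APP, OWNER_HCA };

struct model_buf {
    char data[64];
    size_t len;          /* bytes written by the "HCA" */
    enum owner owner;
    bool completed;      /* a CQE exists but has not been polled yet */
};

/* post_recv(): ownership passes to the HCA as soon as the WR is posted. */
static void model_post_recv(struct model_buf *b)
{
    b->owner = OWNER_HCA;
    b->completed = false;
    b->len = 0;
}

/* The HCA places an incoming message in the buffer and queues a CQE. */
static bool model_hca_deliver(struct model_buf *b, const char *msg, size_t len)
{
    if (b->owner != OWNER_HCA || b->completed || len > sizeof b->data)
        return false;
    memcpy(b->data, msg, len);
    b->len = len;
    b->completed = true;
    return true;
}

/* poll_cq(): only a polled completion hands the buffer back to the app. */
static bool model_poll_cq(struct model_buf *b)
{
    if (b->owner != OWNER_HCA || !b->completed)
        return false;    /* nothing to report; the HCA still owns it */
    b->owner = OWNER_APP;
    b->completed = false;
    return true;
}

/* The app may read the contents only while it owns the buffer. */
static bool model_app_may_read(const struct model_buf *b)
{
    return b->owner == OWNER_APP;
}
```

Note that returning from model_post_recv() says nothing about when, or whether, data will arrive; only model_poll_cq() returning true transfers the buffer back.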

I'm asking this from a device driver developer's perspective.  I'm not
writing an application.  I'm trying to understand and define exactly
what the device/driver must guarantee upon returning from
post_recv().
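The window in the t0..t6 timeline (post_recv() has returned at t3, but the device only makes the entry usable at t6) can be modeled from the driver side with two counters. This is a hypothetical sketch with invented names; it assumes an arriving SEND that finds no usable entry is RNR NAKed, which is what an RC QP does, but the exact firmware behavior is device specific.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the t0..t6 sequence; names are illustrative. */
struct model_rq {
    unsigned posted;    /* WRs written and doorbell rung (t1..t2) */
    unsigned ready;     /* entries the device has made usable (t6) */
    unsigned consumed;  /* entries used by arriving SENDs */
    unsigned rnr_naks;  /* SENDs that found no usable entry */
};

/* t0..t3: post_recv writes the WR, rings the doorbell, and returns. */
static void model_post_recv(struct model_rq *rq)
{
    rq->posted++;
}

/* t5..t6: the device fetches one posted WR and makes it usable. */
static void model_fw_process_one(struct model_rq *rq)
{
    if (rq->ready < rq->posted)
        rq->ready++;
}

/* A SEND arrives from the peer; it needs an entry that is already usable. */
static bool model_send_arrives(struct model_rq *rq)
{
    if (rq->consumed < rq->ready) {
        rq->consumed++;
        return true;
    }
    rq->rnr_naks++;     /* receiver not ready */
    return false;
}
```

A SEND that arrives at t4 (after post_recv() returned, before model_fw_process_one() ran) is counted as an RNR NAK; the same SEND after t6 succeeds.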

> It can even change between different versions of microcode or
> kernel software for the same HCA.
> 
>>> To further complicate things, this race condition is never seen _if_
>>> the application uses the same QP to advertise the RECV buffer
>>> availability (i.e., to send a credit allowing the peer to SEND).  So if
>>> the app posts a SEND after the RECV is posted, and that SEND allows the
>>> peer access to the RECV buffer, then everything is OK.  This is because
>>> the FW/HW will process the SEND only after processing the RECV.  If the
>>> app uses a different QP to post the SEND advertising the RECV, then the
>>> race condition exists, allowing the peer to SEND into that RECV buffer
>>> before the HW makes it ready.
> 
> Well, there is no guarantee that the HCA processes the post_recv()
> before the post_send() even on the same QP. Send and receive are
> unordered with respect to each other. The fact that it works is
> an HCA specific implementation artifact.
> 
>>> This all assumes a specific design of RDMA HW.  Maybe nobody else has
>>> this issue?
>>>
>>> Maybe I'm not making sense. :)
>> I think your description here matches the RNR problem Ralph found in
>> IPoIB-CM.
>>
>> Ralph,
>>
>> Does this make sense?
>>
>> Thanks
>> Shirley
> 
> I think you are making sense.  There is an indeterminate race
> between post_recv() returning to the application and the point at which
> a packet received by the HCA can actually use that buffer. There are no
> ordering guarantees between messages sent on one QP and another, so the
> application can't easily use a different QP to advertise posted buffers
> (credits).
> That is why the IB RC protocol does this for you in-band if the RC QP
> uses a dedicated receive queue, but not if it uses a shared receive queue.
> 
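The same-QP versus different-QP behavior Steve describes, and Ralph's caveat that it is only an implementation artifact, can be sketched with a hypothetical firmware model: each QP's work is processed in posting order, but nothing orders one QP's work against another's. All names are invented, and the per-QP ordering reflects the specific hardware described in this thread, not a guarantee of the verbs API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical firmware model: each QP has its own in-order work FIFO,
 * but the firmware may service QPs in any order. This mirrors the
 * hardware described above; the verbs API does not promise it. */
enum model_op { OP_RECV, OP_SEND_CREDIT };

struct model_qp {
    enum model_op ops[8];   /* no overflow check: at most 8 ops here */
    unsigned head, tail;
};

struct model_world {
    unsigned rq_ready;      /* receive entries usable by peer SENDs */
    unsigned rnr_naks;      /* peer SENDs that found no usable entry */
};

static void model_post(struct model_qp *qp, enum model_op op)
{
    qp->ops[qp->tail++ % 8] = op;
}

/* Process the oldest pending op on one QP. A credit SEND prompts the
 * peer to SEND straight back into the data QP's receive queue. */
static void model_fw_step(struct model_world *w, struct model_qp *qp)
{
    if (qp->head == qp->tail)
        return;
    enum model_op op = qp->ops[qp->head++ % 8];
    if (op == OP_RECV) {
        w->rq_ready++;
    } else if (w->rq_ready > 0) {
        w->rq_ready--;      /* peer's SEND lands in a ready buffer */
    } else {
        w->rnr_naks++;      /* peer's SEND arrives too early */
    }
}
```

With both ops on one QP the RECV is always handled first; with the credit on a second QP, a firmware that happens to service that QP first lets the peer's SEND beat the RECV.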

Do you mean the IB RC protocol advertises credits as part of the 
transport protocol?
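For a QP with its own receive queue, RC end-to-end flow control carries a credit count in the responder's ACK headers. The sketch below is a simplified, hypothetical model of that idea only (invented names, and the real protocol has more rules than shown): the requester holds SENDs until an ACK tells it receives are available, so a credit is never acted on before the responder's receive is in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of RC credits for a non-SRQ QP.
 * The real protocol reports credits in ACKs and has additional rules. */
struct model_responder {
    unsigned posted_recvs;   /* receive WQEs the responder has ready */
};

struct model_requester {
    unsigned credits;        /* last credit count learned from an ACK */
};

/* The responder posts receives; its next ACK will reflect them. */
static void model_responder_post(struct model_responder *r, unsigned n)
{
    r->posted_recvs += n;
}

/* An ACK refreshes the requester's view of available credits. */
static void model_ack(struct model_requester *q, const struct model_responder *r)
{
    q->credits = r->posted_recvs;
}

/* Send only while holding a credit. Credits never exceed posted_recvs,
 * so the decrement below cannot underflow. */
static bool model_try_send(struct model_requester *q, struct model_responder *r)
{
    if (q->credits == 0)
        return false;        /* hold the SEND until credits arrive */
    q->credits--;
    r->posted_recvs--;       /* the SEND consumes one receive WQE */
    return true;
}
```

Because the credit travels in-band on the same connection, the ordering problem of advertising credits on a separate QP does not arise in this model.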

> The problem with shared receive queues is that the application
> would have to pick an endpoint and tell it there is a buffer
> available for the endpoint to send to. Obviously, if you have
> two endpoints, they both can't send to the same receive buffer.
> 
> ib_ipoib uses shared receive queues and doesn't try to manage
> posted buffer credits, so the RNR NAK issue isn't the same
> as what Steve is trying to do.
>