[ofa-general] post_recv question

Tom Tucker tom at opengridcomputing.com
Thu Feb 21 14:47:59 PST 2008


On Thu, 2008-02-21 at 12:22 -0800, Roland Dreier wrote:
> > OpenMPI can be configured to send credit updates over a different QP. I'll
> > try to stress it next week to see what happens.
> 
> It seems that it would be pretty hard to hit this race in practice.

> And I don't think mem-free Mellanox hardware has any race -- not
> positive about Tavor/non-mem-free Arbel.  (On IB you also need to set RNR
> retries to 0 for the missing receive to be detectable, even if the
> race exists.)

Well... consider the case of two adapters on two different PCI buses.
One is busy; one is not. Specifically, the post_recv QP is on an HCA on a
busy bus, and the post_send (of the credit) is on a QP on an HCA on a
dedicated bus.

I think we can assume that ringing the doorbell is synchronous,
i.e. when the processor completes its write, the card knows there are
RQ WQEs available in host memory, but whether and when the WQEs are
fetched, relative to the processor, is asynchronous. The card will have to
get on the bus again and read host memory. Meanwhile the processor runs
off and posts the credit as a send on the other QP, on a different HCA.
The peer responds with a send to the "data QP". The receiving adapter
knows the WQE is there, but it may not have fetched it yet.
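The race can be made concrete with a small C model of one receiving adapter. This is a sketch under stated assumptions, not real driver or verbs code: `struct rq_model`, `post_recv_ring_doorbell()`, `adapter_fetch_one()`, `send_arrives()` and the two policy values are hypothetical names. The model counts WQEs announced by the doorbell separately from WQEs the adapter has actually fetched, so a send that arrives between the two is placed only if the adapter is required to fetch on demand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one adapter's receive queue; not a real verbs API. */
enum arrival_policy {
    POLICY_MUST_FETCH, /* adapter must fetch a known-but-unread WQE and place */
    POLICY_MAY_DROP    /* adapter may drop if no fetched WQE is available */
};

struct rq_model {
    unsigned posted;   /* WQEs announced by doorbell writes (host side) */
    unsigned fetched;  /* WQEs the adapter has read from host memory */
    unsigned consumed; /* WQEs used to place incoming sends */
};

/* post_recv: the doorbell write is synchronous, so the adapter learns at
 * once that another WQE exists, but nothing has been fetched yet. */
static void post_recv_ring_doorbell(struct rq_model *rq)
{
    rq->posted++;
}

/* Asynchronous DMA read of one WQE; in the scenario above it may run
 * arbitrarily late because the adapter sits on a busy bus. */
static void adapter_fetch_one(struct rq_model *rq)
{
    if (rq->fetched < rq->posted)
        rq->fetched++;
}

/* An incoming send arrives. Returns true if placed, false if dropped. */
static bool send_arrives(struct rq_model *rq, enum arrival_policy policy)
{
    if (rq->consumed == rq->fetched) {
        /* No fetched WQE: drop under MAY_DROP, or when none was posted. */
        if (policy == POLICY_MAY_DROP || rq->fetched == rq->posted)
            return false;
        /* MUST_FETCH: hold the packet and read the pending WQE first,
         * which is what forces the adapter to buffer incoming data. */
        adapter_fetch_one(rq);
    }
    assert(rq->consumed < rq->fetched);
    rq->consumed++;
    return true;
}
```

On a fresh model with one `post_recv_ring_doorbell()` and no fetch yet, `send_arrives(&rq, POLICY_MAY_DROP)` returns false, while `send_arrives(&rq, POLICY_MUST_FETCH)` returns true at the cost of holding the packet until the WQE read completes.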

The crux of the question is whether the adapter MUST fetch the
WQE and place the packet, or whether it can simply drop it. If you say it
MUST, then the adapter must have enough buffering to handle worst-case
delayed placement. If the post guarantee applies only within the same QP or
an affiliated QP (SRQ), then all the adapter must do is ensure that, when it
processes an SQ request AND the associated RQ (SRQ) is empty, it fetches any
outstanding, unread RQ WQEs before processing the SQ WQE. This provides the
post_recv guarantee without the HCA buffering requirement.
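The same-QP rule can be sketched in the same style. Again this is a hypothetical model (`struct qp_model`, `qp_post_recv()`, `qp_process_send()` and `qp_send_arrives()` are illustrative names only, not an existing API), assuming an adapter with no packet buffering: the only extra work is fetching pending RQ WQEs whenever an SQ WQE is processed while the RQ has no fetched WQE available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical adapter-side model of one QP; none of these names come
 * from a real driver or from the verbs API. */
struct qp_model {
    unsigned rq_posted;   /* RQ WQEs announced via doorbell */
    unsigned rq_fetched;  /* RQ WQEs read into adapter memory */
    unsigned rq_consumed; /* RQ WQEs used by incoming sends */
    unsigned sq_sent;     /* SQ WQEs transmitted */
};

static void qp_post_recv(struct qp_model *qp)
{
    qp->rq_posted++;
}

/* Proposed rule: before processing an SQ WQE, if the associated RQ has
 * no fetched WQE available, fetch every outstanding unread RQ WQE. */
static void qp_process_send(struct qp_model *qp)
{
    if (qp->rq_consumed == qp->rq_fetched)
        qp->rq_fetched = qp->rq_posted;
    assert(qp->rq_fetched <= qp->rq_posted);
    qp->sq_sent++;
}

/* A peer's send arrives; with no buffering it is placed only if a
 * fetched WQE is available, otherwise it is dropped. */
static bool qp_send_arrives(struct qp_model *qp)
{
    if (qp->rq_consumed == qp->rq_fetched)
        return false;
    qp->rq_consumed++;
    return true;
}
```

Under this model a credit posted on the same QP cannot leave before the matching receive WQE has been fetched, whereas a credit sent through a different QP (or HCA) triggers no fetch on the data QP, which is exactly the cross-adapter case described earlier.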

I seem to recall that "the specs" say something about ordering and
synchronization between unaffiliated QPs and/or between adapters, but the
specific reference fell off my LRU list long ago.

Tom
