[ofa-general] Re: [PATCH] ib_core: Use weak ordering for data registered memory

Arnd Bergmann arnd at arndb.de
Mon Oct 20 14:41:06 PDT 2008


On Monday 20 October 2008, Roland Dreier wrote:
>  > Some architectures support weak ordering in which case better
>  > performance is possible. IB registered memory used for data can be
>  > weakly ordered because the completion queues' buffers are
>  > registered as strongly ordered. This will result in flushing all data
>  > related outstanding DMA requests by the HCA when a completion is DMAed
>  > to a completion queue buffer.
> 
> This would break the Mellanox HW's guarantee of writing the last byte of
> an RDMA last, right?  So on platforms where this has an effect (only
> Cell at the moment) some applications could be subtly broken?

Yes, that is true. In our testing with openmpi, we had to disable eager RDMA.
However, without this patch, InfiniBand RDMA performance on some of our machines
sucks so badly that we would not want to advertise support for it. I
would really love to see this patch make it into OFED-1.4 and Linux-2.6.28.

We (IBM and Mellanox) have discussed adding a module parameter for
whether or not this should be enabled at runtime, and possibly extending
the ibverbs interface to allow the application to choose. The question
there remains what the default should be. AFAIU, the IB specification does
not give any such guarantees about the ordering within RDMA transfers,
right? If that is so, applications relying on the ordering would be
broken to start with.
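
To illustrate why an application that polls the end of an RDMA buffer depends
on the "last byte last" behaviour, here is a minimal userspace sketch. It is
only a model: struct eager_buf, the two-chunk delivery and the function name
are assumptions made up for this illustration, not code from openmpi, the
HCA firmware or the kernel.

```c
#include <stdbool.h>
#include <string.h>

#define MSG_LEN 8

/* Hypothetical receive buffer: payload followed by a one-byte tail flag. */
struct eager_buf {
	unsigned char payload[MSG_LEN];
	unsigned char tail;	/* the receiver polls this byte */
};

/*
 * Model one RDMA write that reaches memory as two chunks.
 * in_order == true:  payload lands first, tail flag last.
 * in_order == false: the tail flag may land before the payload.
 * Returns true if a receiver polling between the two chunks would
 * see the tail flag set while the payload is still incomplete.
 */
static bool receiver_sees_torn_message(bool in_order)
{
	static const unsigned char msg[MSG_LEN] = "RDMAmsg";
	struct eager_buf buf;
	bool torn;

	memset(&buf, 0, sizeof(buf));

	/* first chunk lands */
	if (in_order)
		memcpy(buf.payload, msg, MSG_LEN);
	else
		buf.tail = 1;

	/* the receiver polls here, between the two chunks */
	torn = buf.tail == 1 && memcmp(buf.payload, msg, MSG_LEN) != 0;

	/* second chunk lands */
	if (in_order)
		buf.tail = 1;
	else
		memcpy(buf.payload, msg, MSG_LEN);

	return torn;
}
```

In this model, receiver_sees_torn_message(true) is false and
receiver_sees_torn_message(false) is true. Waiting for a completion queue
entry instead of polling the buffer avoids the problem, because the
completion is written to strongly ordered memory.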

Also, the existing code evidently does not use DMA_ATTR_WRITE_BARRIER
for the allocation where we would use DMA_ATTR_WEAK_ORDERING. Because
of what I can only explain as a lack of coordination, DMA_ATTR_WRITE_BARRIER
seems to be an almost exact opposite of DMA_ATTR_WEAK_ORDERING, which
means that on platforms that use the former (SGI Altix so far), not passing
any DMA attribute would break these applications in exactly the same way
that weak ordering on Cell is breaking them.
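
The relationship between the two attributes can be sketched as a small
userspace model. Everything named model_* or MODEL_* below is a made-up
stand-in for this illustration and is not the kernel DMA API; the platform
defaults (strong unless weakened on Cell, weak unless a barrier is requested
on Altix) are taken from the description above, and allow_weak stands for the
runtime switch under discussion.

```c
#include <stdbool.h>

/* Stand-ins for the two DMA attributes discussed above. */
enum model_dma_attr {
	MODEL_ATTR_NONE          = 0,
	MODEL_ATTR_WRITE_BARRIER = 1 << 0,	/* opt in to strict ordering */
	MODEL_ATTR_WEAK_ORDERING = 1 << 1,	/* opt out of strict ordering */
};

/* What a mapping gets when no attribute is passed. */
enum model_platform { MODEL_DEFAULT_STRONG, MODEL_DEFAULT_WEAK };

enum model_buf_kind { MODEL_BUF_DATA, MODEL_BUF_CQ };

/* Choose attributes so that completion queues are always strongly ordered. */
static unsigned int model_pick_attrs(enum model_platform plat,
				     enum model_buf_kind kind,
				     bool allow_weak)
{
	if (plat == MODEL_DEFAULT_STRONG)
		return (kind == MODEL_BUF_DATA && allow_weak) ?
			MODEL_ATTR_WEAK_ORDERING : MODEL_ATTR_NONE;

	/* Default-weak platform: only the CQ asks for the barrier. */
	return kind == MODEL_BUF_CQ ?
		MODEL_ATTR_WRITE_BARRIER : MODEL_ATTR_NONE;
}

/* Effective ordering of a mapping under this model. */
static bool model_is_strongly_ordered(enum model_platform plat,
				      unsigned int attrs)
{
	if (attrs & MODEL_ATTR_WRITE_BARRIER)
		return true;
	if (attrs & MODEL_ATTR_WEAK_ORDERING)
		return false;
	return plat == MODEL_DEFAULT_STRONG;
}
```

Under this model, a data buffer on the default-weak platform is weakly ordered
even though no attribute was passed, which is the same situation as a data
buffer on the default-strong platform with allow_weak set.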

	Arnd <><


