[openib-general] How to support IOMMUs for ipath driver

Or Gerlitz ogerlitz at voltaire.com
Wed Sep 13 02:00:50 PDT 2006


Ralph Campbell wrote:
> Problem:
> 
> The IB kernel to IB device driver interface uses dma_map_single()
> and dma_map_sg() to allocate device bus addresses for HW DMA.
> These bus addresses are passed to the IB device driver via ib_post_send()
> and ib_post_recv().
> 
> The ib_ipath driver needs kernel virtual addresses in order to be able
> to copy data to/from the posted work requests since it does not
> use HW DMA. It currently relies on the mapping being one-to-one
> and cannot reasonably reverse the mapping when an IOMMU is present.

Oops, please note that the DMA API can hand you a DMA address for a 
page which is currently **not** mapped into the kernel virtual address 
space (that is, page_address(p) is NULL), so you would have to add 
kmap() and kunmap() to your fast RX/TX code path.
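
A minimal sketch of what such a copy could look like, assuming a 
hypothetical helper sw_copy_from_page() (the name is made up here and is 
not taken from the ipath driver). The non-kernel branch only supplies toy 
stand-ins for struct page, kmap() and kunmap() so that the sketch builds 
in userspace. Note that kmap() may sleep, so a path that runs in atomic 
context would need the kmap_atomic() variant instead.

```c
#ifdef __KERNEL__
#include <linux/highmem.h>
#include <linux/string.h>
#else
/* Userspace stand-ins (assumptions, for illustration only). */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct page {
	void *mem;	/* backing memory of the toy page */
	int mapped;	/* number of outstanding kmap() calls */
};

static void *kmap(struct page *p)
{
	p->mapped++;
	return p->mem;
}

static void kunmap(struct page *p)
{
	assert(p->mapped > 0);
	p->mapped--;
}
#endif

/*
 * Hypothetical helper: copy len bytes, starting at offset off inside
 * page pg, into dst. The page is mapped only for the duration of the
 * copy, so it works even when page_address(pg) would be NULL.
 */
static void sw_copy_from_page(void *dst, struct page *pg,
			      size_t off, size_t len)
{
	char *vaddr = kmap(pg);		/* temporary kernel mapping */

	memcpy(dst, vaddr + off, len);
	kunmap(pg);			/* drop the mapping again */
}
```

The same map/copy/unmap pattern would apply in the other direction (RX), 
with the memcpy() arguments swapped.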

Scenarios where this happens that I can think of are Direct I/O and 
some sort of prefetching done by the file system. Some pages present in 
a kernel SG list which need to be sent/received/RDMA-ed over IB need not 
be mapped into the kernel virtual address space.

As for RDMA, please note that the problem has two faces: first, the 
remote device, which either does the RDMA or is the one the local device 
does RDMA from/to, and second, the local device.

Since you need to be able to interoperate between devices that support 
DMA mappings and ones which do not, how do you suggest managing the 
addresses for the following schemes (1 stands for a device supporting 
DMA addresses and 0 for a device which does not)?

<1,1>
<1,0>
<0,1>
<0,0>

Please assume, for the purpose of discussion, that each side knows the 
polarity of the remote side.

After writing the part on RDMA, I think I might have gone in the wrong 
direction, since ipath emulates RDMA in SW. Can you shed some light on 
this?

> I also tried proposing adding a flag to the ib_device structure
> and modifying the kernel IB code to check the flag and pass
> either the dma_*() mapped address or a kernel virtual address.
> This works OK for kmalloc() buffers where dma_map_single() is
> being called but doesn't work well for SRP which has lists
> of physical pages and calls dma_map_sg().
> It also means that the kernel IB layer needs to explicitly handle
> two different kinds of addresses.

Just a note: it's not just SRP. It's any ULP which needs to move over 
IB data held in a bunch of pages (e.g. packed in a kernel SG list), 
namely iSER, NFSoRDMA, Lustre, an IB-native implementation of 
send_page(), etc.

Or.




