[openib-general] How to support IOMMUs for ipath driver

Ralph Campbell ralphc at pathscale.com
Wed Sep 13 11:30:57 PDT 2006


On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote:
> Ralph Campbell wrote:
> > Problem:
> > 
> > The IB kernel to IB device driver interface uses dma_map_single()
> > and dma_map_sg() to allocate device bus addresses for HW DMA.
> > These bus addresses are passed to the IB device driver via ib_post_send()
> > and ib_post_recv().
> > 
> > The ib_ipath driver needs kernel virtual addresses in order to be able
> > to copy data to/from the posted work requests since it does not
> > use HW DMA. It currently relies on the mapping being one-to-one
> > and cannot reasonably reverse the mapping when an IOMMU is present.
> 
> Oops, please note that one can obtain through the DMA API a DMA address
> for a page which is currently **not** mapped into the kernel virtual
> address space (that is, page_address(p) is NULL), so you must add kmap
> and kunmap to your fast RX/TX code path.

Yes, these are called "high pages".
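
To make that concrete, here is a minimal sketch of what a software-copy
driver has to do once highmem pages can show up in a work request. This
is not the actual ib_ipath code; ipath_copy_from_page() and its
parameters are made up for illustration.

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

/*
 * Hypothetical helper: copy data out of a page that may be a highmem
 * page and therefore has no permanent kernel virtual address.
 */
static void ipath_copy_from_page(void *dst, struct page *page,
				 unsigned int offset, unsigned int len)
{
	/* kmap() creates a temporary kernel mapping for a highmem page;
	 * for a lowmem page it simply returns page_address(page). */
	void *vaddr = kmap(page);

	memcpy(dst, vaddr + offset, len);
	kunmap(page);
}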

> Examples of scenarios where this happens that I can think of are direct
> I/O and some kinds of pre-fetching done by the file system. Some pages
> present in a kernel SG list which needs to be sent/received/RDMA-ed over
> IB need not be mapped into the kernel virtual address space.

Well, other parts of the kernel might not need a kernel virtual
address, but the ib_ipath driver still does.

> As for RDMA, please note that the problem has two faces: first, the
> remote device which performs the RDMA (or which the local device does
> RDMA from/to), and second, the local device itself.
> 
> Since you need to be able to interoperate between devices that support
> DMA mappings and ones which do not, how do you suggest managing the
> addresses for the following schemes (1 stands for a device supporting
> DMA addresses and 0 for a device which does not):
> 
> <1,1>
> <1,0>
> <0,1>
> <0,0>
> 
> Please assume, for the purpose of this discussion, that each side knows
> the polarity of the remote side.
> 
> After writing the section on RDMA, I think I might have gone in the wrong
> direction since ipath emulates RDMA in SW; can you shed some light on this?

I don't understand what you are talking about. There is an IB
wire protocol for RDMA, SEND, etc. That doesn't change depending
on the HCA.
The InfiniPath HCA has a ring buffer of receive buffers and all
incoming IB packets are DMA'ed into one of these buffers.
The ib_ipath software driver examines the packet and
copies it to the appropriate address. For a packet received with
an RC_RDMA_WRITE_FIRST opcode, the RKEY and IB address are used to convert
that into a kernel virtual address and the data is copied.
The same happens for RC_SEND_FIRST, but the kernel virtual address comes
from the LKEY and address in the work request posted by ib_post_recv().
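
As a rough illustration of that receive path (the struct and the
ipath_find_mr_by_rkey() lookup below are hypothetical stand-ins for the
driver's memory region bookkeeping, not the real code), the RDMA WRITE
handling amounts to something like:

#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>

struct ipath_mregion {
	u32	rkey;		/* key carried in the packet's RETH */
	u64	iova;		/* IB virtual address the region was registered at */
	u64	length;		/* length of the registered region */
	void	*kvaddr;	/* kernel virtual address of the backing buffer */
};

/* Hypothetical lookup from RKEY to the registered memory region. */
struct ipath_mregion *ipath_find_mr_by_rkey(u32 rkey);

static int ipath_rdma_write_copy(u32 rkey, u64 remote_addr,
				 const void *payload, u32 len)
{
	struct ipath_mregion *mr = ipath_find_mr_by_rkey(rkey);

	/* Validate the RKEY and the address range it covers. */
	if (!mr || remote_addr < mr->iova ||
	    remote_addr + len > mr->iova + mr->length)
		return -EINVAL;

	/* Translate the IB address into a kernel virtual address and copy. */
	memcpy(mr->kvaddr + (remote_addr - mr->iova), payload, len);
	return 0;
}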

Sending data is similar: the driver constructs a packet with the
appropriate opcode and writes it to the chip, which puts it on
the wire.
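
Sketched with the same caveats (the header layout and ipath_write_pio()
below are invented for this example; the real driver builds the full
LRH/BTH/RETH headers and writes into the chip's PIO send buffers), the
send side looks roughly like:

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>
#include <asm/byteorder.h>

/* Hypothetical, drastically simplified packet header. */
struct ipath_hdr {
	u8	opcode;		/* IB opcode, e.g. RC SEND FIRST */
	__be32	psn;		/* packet sequence number */
} __attribute__((packed));

/* Stand-in for however the driver hands a finished packet to the chip. */
void ipath_write_pio(const void *pkt, size_t len);

static int ipath_send_packet(u8 opcode, u32 psn,
			     const void *payload, size_t len)
{
	struct ipath_hdr *hdr;
	void *pkt = kmalloc(sizeof(*hdr) + len, GFP_ATOMIC);

	if (!pkt)
		return -ENOMEM;

	hdr = pkt;
	hdr->opcode = opcode;
	hdr->psn = cpu_to_be32(psn);
	memcpy(pkt + sizeof(*hdr), payload, len);

	ipath_write_pio(pkt, sizeof(*hdr) + len);
	kfree(pkt);
	return 0;
}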

> > I also tried proposing adding a flag to the ib_device structure
> > and modifying the kernel IB code to check the flag and pass
> > either the dma_*() mapped address or a kernel virtual address.
> > This works OK for kmalloc() buffers, where dma_map_single() is
> > being called, but doesn't work well for SRP, which has lists
> > of physical pages and calls dma_map_sg().
> > It also means that the kernel IB layer needs to explicitly handle
> > two different kinds of addresses.
> 
> Just a note: it's not just SRP there... it's any ULP which needs to move
> data present in a bunch of pages (e.g. packed in a kernel SG list) over
> IB, namely iSER, NFSoRDMA, Lustre, a native IB implementation of
> send_page(), etc.

Sure. In each such case, the code would need to be modified to
use the ib_dma_*() routines instead of dma_*() for addresses used
with the LKEY/RKEY returned from ib_get_dma_mr().
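
For concreteness, here is a minimal sketch of the kind of ib_dma_*()
wrapper being discussed. ib_device_uses_dma() stands in for the
proposed per-device flag; neither it nor this wrapper is an existing
kernel interface.

#include <linux/dma-mapping.h>
#include <rdma/ib_verbs.h>

/* Hypothetical predicate for the flag proposed above: does this device
 * do HW DMA, or does its driver copy data in software? */
int ib_device_uses_dma(struct ib_device *dev);

static inline u64 ib_dma_map_single(struct ib_device *dev, void *cpu_addr,
				    size_t size,
				    enum dma_data_direction direction)
{
	/* SW devices such as ib_ipath get the kernel virtual address back
	 * unchanged, so the driver can memcpy() directly. */
	if (!ib_device_uses_dma(dev))
		return (u64) (unsigned long) cpu_addr;

	/* HW DMA devices go through the normal DMA API (and any IOMMU). */
	return dma_map_single(dev->dma_device, cpu_addr, size, direction);
}

ULPs would then call ib_dma_map_single() wherever they call
dma_map_single() today, without caring which kind of device is
underneath.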




