[openib-general] DMA mapping abuses in MAD layer

Roland Dreier rolandd at cisco.com
Tue Oct 11 20:39:26 PDT 2005


I recently got a chance to play with an eval board for the PowerPC
440SPe -- an embedded system with PCI Express support where the PCI
bus is not cache coherent with the CPU.  Of course I plugged an HCA in
and tried out our current drivers.

It turns out that everything works pretty well, except the HCA's ports
never make it past INIT.  I did some debugging, and the reason for
this is that the MAD layer doesn't quite use the DMA mapping API
properly.  Once we call dma_map_single() on a buffer, the CPU may not
touch that buffer until after the corresponding dma_unmap_single().

On mainstream architectures, it turns out that we can get away with
violating this rule.  However, on non-cache-coherent architectures
like PowerPC 4xx, dma_map_single(..., DMA_TO_DEVICE) does a cache
flush, which makes sure that the contents of the CPU's cache are
really written to memory.  If a driver then changes the contents of
the buffer after the call to dma_map_single(), then it's quite likely
that the change will be made only in the CPU's cache and the device
will end up DMA-ing the old data.
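
To spell the rule out, here is a generic sketch (not MAD code -- dev,
buf and len are just placeholders) of what the CPU may and may not do
around a DMA_TO_DEVICE mapping:

#include <linux/dma-mapping.h>
#include <linux/string.h>
#include <linux/types.h>

/* Generic sketch of the DMA-API ownership rule -- not MAD code. */
static void dma_ownership_example(struct device *dev, void *buf, size_t len)
{
	dma_addr_t mapping;

	/* The CPU may freely fill in the buffer before it is mapped... */
	memset(buf, 0, len);

	mapping = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	/*
	 * ...but from here until dma_unmap_single(), the buffer belongs
	 * to the device.  On PPC 4xx the map call flushed the cache, so
	 * a CPU store at this point, e.g.
	 *
	 *	((u8 *) buf)[0] = 1;
	 *
	 * may land only in the cache, and the device will DMA the old
	 * contents of memory.
	 */

	/* ... hand "mapping" to the device and let it do the DMA ... */

	dma_unmap_single(dev, mapping, len, DMA_TO_DEVICE);
}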

The problem I hit is in ib_post_send_mad(), specifically:

		smp = (struct ib_smp *)send_wr->wr.ud.mad_hdr;
		if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
			ret = handle_outgoing_dr_smp(mad_agent_priv, smp,
						     send_wr);

Basically, when the MAD layer goes to send a directed route reply, it
changes the MAD buffer after the DMA mapping has already been done.
The HCA doesn't see the change, so the wrong packet gets sent and the
SM never sees replies to its queries.

Adding a PPC-specific cache flush call after the call to
handle_outgoing_dr_smp() fixes things to the point that the port can
be brought to ACTIVE, and in fact IPoIB works as well.  However, this
is just a kludge -- the real fix will need to be more invasive.  It
seems that the whole interface to the MAD layer may need to be
reorganized so that nothing touches a MAD buffer once it has been
mapped.
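
For what it's worth, a slightly less arch-specific stopgap than the
raw PPC cache flush would be to let the DMA API do the flushing, by
syncing the buffer back to the device right after it has been
modified.  Roughly (the "mapping" variable and the exact field names
below are illustrative, not copied from the tree):

		ret = handle_outgoing_dr_smp(mad_agent_priv, smp,
					     send_wr);

		/*
		 * handle_outgoing_dr_smp() may have modified the SMP
		 * buffer after it was DMA-mapped, so push the CPU's
		 * cached copy out to memory before the HCA DMAs it.
		 * "mapping" stands for the dma_addr_t that came back
		 * from the earlier dma_map_single().
		 */
		dma_sync_single_for_device(mad_agent_priv->agent.device->dma_device,
					   mapping, sizeof (struct ib_smp),
					   DMA_TO_DEVICE);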

It looks like there is a similar problem with ib_create_send_mad(): it
does DMA mapping on a buffer that is then returned for the caller to modify.
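
One way to reorganize things (just a sketch of the "allocate early,
map late" idea, with made-up names -- not the real MAD API) would be
to hand the caller an unmapped buffer from the create call and only
do the dma_map_single() once the buffer has been posted and the CPU
is done with it:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/*
 * "Allocate early, map late" sketch with made-up names -- not the
 * real ib_create_send_mad()/ib_post_send_mad() interface.
 */
struct send_buf {
	void		*vaddr;		/* caller fills this in     */
	dma_addr_t	mapping;	/* valid only after posting */
	size_t		len;
};

static struct send_buf *create_send_buf(size_t len)
{
	struct send_buf *buf;

	buf = kmalloc(sizeof *buf, GFP_KERNEL);
	if (!buf)
		return NULL;

	/* The buffer is returned unmapped, so the caller may modify it. */
	buf->vaddr = kmalloc(len, GFP_KERNEL);
	if (!buf->vaddr) {
		kfree(buf);
		return NULL;
	}
	buf->len = len;
	return buf;
}

static int post_send_buf(struct device *dev, struct send_buf *buf)
{
	/*
	 * The caller has finished writing the MAD (including any
	 * directed route fixups), so only now hand it to the device.
	 */
	buf->mapping = dma_map_single(dev, buf->vaddr, buf->len,
				      DMA_TO_DEVICE);

	/* ... build the send work request around buf->mapping and post it ... */
	return 0;
}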

Finally, some of the MAD structures like struct ib_mad_private look
risky to me, since kernel data might potentially share a cache line
with DMA buffers.  See <http://lwn.net/Articles/2265/> for a nice
writeup of the class of bug that might be lurking.
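
The usual way to avoid that class of bug is to make sure nothing the
CPU owns shares a cache line with a DMA buffer, either by giving the
buffer its own kmalloc() allocation or by aligning it explicitly.  A
toy illustration (not the actual ib_mad_private layout):

#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/*
 * Toy illustration of the cache-line-sharing hazard -- not the real
 * ib_mad_private layout.
 */
struct bad_layout {
	spinlock_t	lock;		/* CPU-owned state...              */
	u8		dma_buf[256];	/* ...can share a cache line with
					 * the start of this buffer while
					 * the device is DMA-ing into it. */
};

struct better_layout {
	spinlock_t	lock;
	/*
	 * Start the DMA buffer on its own cache line; since 256 bytes
	 * is a multiple of common line sizes, nothing after it shares
	 * a line either.  A separate kmalloc()ed buffer for the DMA
	 * data achieves much the same thing.
	 */
	u8		dma_buf[256] ____cacheline_aligned;
};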

Sorry for missing all of this when the MAD layer was first being
developed and reviewed.

 - R.


