[ofa-general] Re: InfiniBand card (mthca) in Linux

Roland Dreier rdreier at cisco.com
Sun Jul 8 08:54:53 PDT 2007


 > 000: 17 00 00 00 17 00 00 00 18 00 00 00 18 00 00 00
 > 010: 19 00 00 00 19 00 00 00 1a 00 00 00 1a 00 00 00
 > 020: 1b 00 00 00 1b 00 00 00 1c 00 00 00 1c 00 00 00
 > 030: 1d 00 00 00 1d 00 00 00 1e 00 00 00 1e 00 00 00
 > 040: 1f 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
 > 050: 01 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00

OK, my guess right now would be that when the driver is trying to give
memory to the HCA to use for its internal hardware data structures,
the bus addresses given to the HCA end up being wrong for some reason.
There could be a bug in mthca, but since this code is working fine on
lots of non-Xen systems (and not just i386/x86-64 but also ppc and
ia64 at least) right now I would be more suspicious of a bug in the
Xen domU's pci_map_sg() or something like that.
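
To make the suspected failure mode concrete, here is a toy model, not mthca or Xen code. It assumes a paravirtualized guest whose "pseudo-physical" page numbers differ from the real machine frame numbers, and a device that uses whatever bus address it is handed without any translation. The class ToyGuest, the function demo() and the page table contents are all hypothetical, made up for illustration only.

```python
PAGE_SIZE = 4096


class ToyGuest:
    """Hypothetical guest: pseudo-physical pages backed by scattered machine frames."""

    def __init__(self, p2m):
        # p2m maps guest page number -> machine frame number (made-up values).
        self.p2m = dict(p2m)
        frames = max(self.p2m.values()) + 1
        self.machine_mem = bytearray(PAGE_SIZE * frames)

    def to_machine(self, guest_addr):
        """Translate a guest pseudo-physical address to a machine address."""
        page, offset = divmod(guest_addr, PAGE_SIZE)
        return self.p2m[page] * PAGE_SIZE + offset

    def cpu_read(self, guest_addr, length):
        """The CPU always goes through the guest-to-machine translation."""
        start = self.to_machine(guest_addr)
        return bytes(self.machine_mem[start:start + length])

    def device_write(self, bus_addr, payload):
        """The device writes straight to machine memory at the bus address."""
        self.machine_mem[bus_addr:bus_addr + len(payload)] = payload


def demo(translate):
    """Return (buffer contents, bystander contents) after one device write."""
    guest = ToyGuest({0: 2, 1: 3, 2: 1})
    icm_buf = 1 * PAGE_SIZE      # buffer the driver wants the device to fill
    bystander = 2 * PAGE_SIZE    # unrelated guest memory
    # A correct DMA mapping hands out the machine address; the buggy one
    # hands out the untranslated pseudo-physical address.
    bus_addr = guest.to_machine(icm_buf) if translate else icm_buf
    guest.device_write(bus_addr, b"\x17\x00\x00\x00")
    return guest.cpu_read(icm_buf, 4), guest.cpu_read(bystander, 4)


if __name__ == "__main__":
    print("correct mapping:", demo(translate=True))
    print("buggy mapping:  ", demo(translate=False))
```

With the correct mapping the device's bytes show up in the intended buffer. With the untranslated address they land in the bystander page instead, which is the kind of stray write I have in mind.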

You can look in mthca_memfree.c, specifically mthca_alloc_icm() to see
how the memory to give to the HCA is allocated and mapped.  I gave it
a quick look over and the way the DMA mapping API is used looks OK to
me, but perhaps there is a subtle problem that is exposed by Xen.
Still, as I said above, right now I think it's more likely that we
are hitting a bug in the Xen domU implementation of DMA mapping.


Michael, does my guess about the source of corruption make sense?  Is
that pattern of every fourth byte counting up 00 ... 1f something the
HCA would write during initialization of ICM?
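
For reference, the pattern in the quoted dump can be checked mechanically. The sketch below assumes the dump is a flat array of little-endian 32-bit words; that interpretation, and the helper names parse_dump(), le32_words() and is_paired_counter(), are mine and not anything taken from mthca.

```python
import struct

# The six lines quoted at the top of this mail, without the "> " prefix.
DUMP = """\
000: 17 00 00 00 17 00 00 00 18 00 00 00 18 00 00 00
010: 19 00 00 00 19 00 00 00 1a 00 00 00 1a 00 00 00
020: 1b 00 00 00 1b 00 00 00 1c 00 00 00 1c 00 00 00
030: 1d 00 00 00 1d 00 00 00 1e 00 00 00 1e 00 00 00
040: 1f 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
050: 01 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00
"""


def parse_dump(text):
    """Turn 'offset: hex bytes' lines into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate mail quoting
        if not line:
            continue
        offset, _, hex_bytes = line.partition(":")
        if int(offset, 16) != len(data):
            raise ValueError("gap or overlap at offset " + offset)
        data += bytes.fromhex(hex_bytes)
    return bytes(data)


def le32_words(data):
    """Decode the buffer as little-endian 32-bit words (an assumption)."""
    return [word for (word,) in struct.iter_unpack("<I", data)]


def is_paired_counter(words, modulus=0x20):
    """True if each value appears twice in a row and counts up mod modulus."""
    if len(words) % 2:
        return False
    pairs = list(zip(words[0::2], words[1::2]))
    if any(a != b for a, b in pairs):
        return False
    values = [a for a, _ in pairs]
    return all((cur + 1) % modulus == nxt
               for cur, nxt in zip(values, values[1:]))


if __name__ == "__main__":
    words = le32_words(parse_dump(DUMP))
    print([hex(w) for w in words])
    print("paired counter mod 0x20:", is_paired_counter(words))
```

Read this way, the data is a counter in the low byte of each 32-bit word, with every value stored twice and wrapping from 1f back to 00.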

 - R.


