[ofa-general] Re: InfiniBand card (mthca) in Linux

Lukas Hejtmanek xhejtman at ics.muni.cz
Thu Jul 5 12:31:36 PDT 2007


Hello,

On Thu, Jul 05, 2007 at 09:10:11AM -0700, Roland Dreier wrote:
> I don't personally have much use for it, but of course I would be
> happy to merge changes that make this work better.
> 
> However, I would really prefer if we could have this discussion on
> general at lists.openfabrics.org instead of in private email; it's better
> for you too, because if I am too busy to answer then you may get an
> answer from someone else.  Anyway...

OK, I appended the address to the Cc.

> Are you getting these freezes when using Xen domU, or do you also see
> them with a normal kernel?  You said the card works "mostly OK" with
> dom0 -- what is not OK?

Well, in Dom0 the sequence:
modprobe ib_mthca
rmmod ib_mthca
modprobe ib_mthca

kills the machine. Strangely, it produces an oops in XFS (the file system),
so to me it looks like memory corruption in the kernel. I have basically the
same problem in DomU, where the same error is triggered by the first
modprobe ib_mthca.

> How did you fix the device reset problem?

Xen does not let a DomU modify a device's base address registers (BARs), so
after the device reset the BARs are not restored. I have therefore modified
the Xen PCI backend to allow direct modification of the BARs when the device
operates in permissive mode.
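
To illustrate why the BARs matter, here is a minimal user-space sketch of the
save/reset/restore sequence. It is not the Xen backend change and not the
mthca code: the in-memory config space (struct fake_dev), the helper names,
and the modelling of a reset as "the BARs are cleared" are all assumptions
made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE    256   /* size of the conventional PCI config space */
#define BAR0_OFFSET 0x10  /* BAR0 sits at offset 0x10 in a type 0 header */
#define NUM_BARS    6     /* a type 0 header has six 32-bit BAR slots */

/* Hypothetical stand-in for a device's config space (illustration only). */
struct fake_dev {
    uint8_t cfg[CFG_SIZE];
};

/* Read a 32-bit value from the modelled config space. */
uint32_t cfg_read32(const struct fake_dev *d, unsigned off)
{
    uint32_t v;
    assert(off + sizeof v <= CFG_SIZE);
    memcpy(&v, &d->cfg[off], sizeof v);
    return v;
}

/* Write a 32-bit value; in a DomU without permissive mode, a write to a
 * BAR would not reach the device, which is the problem described above. */
void cfg_write32(struct fake_dev *d, unsigned off, uint32_t v)
{
    assert(off + sizeof v <= CFG_SIZE);
    memcpy(&d->cfg[off], &v, sizeof v);
}

/* Remember all BAR slots before the reset. */
void save_bars(const struct fake_dev *d, uint32_t saved[NUM_BARS])
{
    for (unsigned i = 0; i < NUM_BARS; i++)
        saved[i] = cfg_read32(d, BAR0_OFFSET + 4 * i);
}

/* Simplified model of a reset: the BARs lose their programmed values. */
void fake_reset(struct fake_dev *d)
{
    memset(&d->cfg[BAR0_OFFSET], 0, 4 * NUM_BARS);
}

/* Write the saved values back; this is the step that needs BAR writes
 * to be allowed. */
void restore_bars(struct fake_dev *d, const uint32_t saved[NUM_BARS])
{
    for (unsigned i = 0; i < NUM_BARS; i++)
        cfg_write32(d, BAR0_OFFSET + 4 * i, saved[i]);
}
```

If the writes in restore_bars() are dropped, the device keeps empty BARs
after the reset and its memory regions are no longer mapped where the driver
expects them.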

However, direct access to the PCI config space did not solve all the problems.
modprobe ib_mthca runs init_one up to and including init_hca. In setup_hca it
kills at least the DomU, very often Dom0 as well, and sometimes the whole
machine, so that a physical power cycle is needed.
Whenever it performs setup_hca, I see an oops in XFS in the DomU.

Dmesg says that the driver could not write MTT.

Any thoughts?

-- 
Lukáš Hejtmánek


