[openib-general] Causes of interrupt problems?

Roland Dreier roland at topspin.com
Fri Mar 18 20:23:12 PST 2005


 > What would cause the following?

 > ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
 > ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:04:00.0)
 > ib_mthca 0000:04:00.0: NOP command failed to generate interrupt, aborting.
 > ib_mthca 0000:04:00.0: BIOS or ACPI interrupt routing problem?

 > I've seen this on two Opteron systems, one Tyan board, one Rioworks
 > HDAMA. Is there some bios setting I should look for? Things are working
 > fine on another Rioworks HDAMA board.

The HCA appears as a PCI device with a huge BAR behind a PCI bridge,
and that seems to confuse some BIOS/ACPI implementations.

Looking at that error message I realize it might be nice to be able to
see what IRQ the driver is trying.  If you change the line in
mthca_main.c that prints the error to something like

		mthca_err(dev, "NOP command failed to generate interrupt (IRQ %d), aborting.\n",
			  (dev->mthca_flags & MTHCA_FLAG_MSI_X) ?
			  dev->eq_table.eq[MTHCA_EQ_CMD].msi_x_vector :
			  dev->pdev->irq);

then the failure message reports which IRQ the HCA driver is waiting
on.  Next, put another device, such as an Ethernet card, in the same
PCI slot and (assuming that device works) compare the IRQ it uses with
the one that mthca reported.  If they differ, you most likely have a
BIOS/ACPI interrupt routing problem.  Unfortunately I'm not much good
at fixing that sort of thing; the only thing I know to try is a newer
BIOS version.

Other things to check: do the two HDAMA boards have the same BIOS
revision?  Is the HCA in the same slot in both boards?

 - R.


