[openib-general] Re: Failure in reset HCA with backport-svn4507-to-2.6.9

Grant Grundler iod00d at hp.com
Fri Jan 6 15:16:05 PST 2006


On Fri, Jan 06, 2006 at 12:26:07PM -0800, Ranjit Pandit wrote:
> I'm running with 3.3.3 which is pretty recent.

yeah - I've not had/exposed any problems with 3.3.3 either.

> It looks like the problem is system dependent and reproducible on Dell 2650's.
> Has anybody lately tested on a Dell 2650?

The Dell 2650 is advertised as having the ServerWorks "Grand Champion LE" (GC-LE) chipset.
Maybe this quirk is relevant?
(I doubt it but it's possible.)

See drivers/pci/quirks.c:
...
static void __init quirk_svw_msi(struct pci_dev *dev)
{
	pci_msi_quirk = 1;
	printk(KERN_WARNING "PCI: MSI quirk detected. pci_msi_quirk set.\n");
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_GCNB_LE, quirk_svw_msi );
#endif /* CONFIG_X86_IO_APIC */
...

> Btw, if I comment out mthca_reset() in mthca_main.c, then the drivers
> load and ports go active on the 2650.
> 
> I suggest somebody review the reset path in mthca... In the past we
> have had problems resetting Tavor on some platforms and chose not to
> reset at driver load time.

I'm all for resetting the card and re-initializing as long as it doesn't
perturb the rest of the cluster.
I've had initialization problems with the tg3 driver in the past.
They turned out to be bugs in the driver init path, which made assumptions
about the state of the card as handed off by firmware. Rolling the BIOS
driver (EFI in this case) was risky.

If it's really chipset related, it's likely a timing/ordering problem:
either mthca isn't enforcing ordering when it should, or it isn't waiting
long enough for the card to recover. Adding printk's is usually
sufficient to figure out whether it's a timing problem.
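
To illustrate the "not waiting long enough" case, here is a minimal
user-space sketch of a bounded poll after reset. It is not the mthca
reset code; fake_dev, fake_read_id() and wait_for_device() are
hypothetical names, the ID value is arbitrary, and the all-ones read
for a device that is still in reset is an assumption of the simulation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a card that is unreachable for a while after reset. */
struct fake_dev {
	int polls_until_ready;	/* reads that still fail after the reset */
	uint32_t id;		/* arbitrary value returned once the card is back */
};

/* Simulated ID read: returns all ones while the card is still resetting. */
static uint32_t fake_read_id(struct fake_dev *dev)
{
	if (dev->polls_until_ready > 0) {
		dev->polls_until_ready--;
		return 0xffffffffu;
	}
	return dev->id;
}

/*
 * Poll until the card answers or max_polls reads have been made.
 * Returns the number of reads it took, or -1 on timeout.
 * A real driver would sleep between polls; that is left out here.
 */
static int wait_for_device(struct fake_dev *dev, int max_polls)
{
	int i;

	assert(max_polls >= 0);
	for (i = 1; i <= max_polls; i++) {
		if (fake_read_id(dev) != 0xffffffffu)
			return i;
	}
	return -1;
}
```

If the poll budget is too small for a slow platform, wait_for_device()
returns -1 where a faster platform would succeed, which is the kind of
system-dependent failure a fixed post-reset delay can produce.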

grant


