[Users] Firmware upgrade appears to have broken driver

Orion Poplawski orion at cora.nwra.com
Mon Apr 29 14:35:02 PDT 2013


Unfortunately it seems I paid too much attention to this message:

ib_mthca 0000:01:00.0: HCA FW version 4.7.400 is old (4.8.200 is current).
ib_mthca 0000:01:00.0: If you have problems, try updating your HCA FW.

So I updated the firmware on my MHGA28-1TC A1 with:

mstflint -d 01:00.0 -i fw-25208-4_8_200-MHGA28-1TC_A1-A3.bin -nofs burn

Now the driver fails to load with:

ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
ib_mthca: Initializing 0000:01:00.0
   alloc irq_desc for 18 on node -1
   alloc kstat_irqs on node -1
ib_mthca 0000:01:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
ib_mthca 0000:01:00.0: setting latency timer to 64
ib_mthca 0000:01:00.0: SYS_EN command returned -22, aborting.
ib_mthca 0000:01:00.0: PCI INT A disabled
ib_mthca: probe of 0000:01:00.0 failed with error -22
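
For reference, the -22 in the log above follows the usual Linux kernel convention of returning a negated errno value. A minimal sketch for decoding it is below; the helper name decode_kernel_errno is made up for illustration and is not part of the driver or of mstflint. It only names the error code and says nothing about why SYS_EN failed.

```python
import errno
import os


def decode_kernel_errno(code: int) -> str:
    """Return 'NAME (message)' for a negated errno such as -22.

    Hypothetical helper for illustration; it only looks up the
    symbolic name and message for the number seen in the log.
    """
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # The value reported by "SYS_EN command returned -22".
    print(decode_kernel_errno(-22))
```

On Linux, errno 22 is EINVAL, so the driver is reporting that the SYS_EN command was rejected as invalid.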

The firmware appears to check out fine:

# mstflint -d 01:00.0 v

Failsafe image:

Invariant       /0x00000028-0x0000095f (0x000938)/ (BOOT2) - OK

Primary   Image /0x00020000-0x00020107 (0x000108)/ (Pointer Sector)- OK
                 /0x00060028-0x000608af (0x000888)/ (BOOT2) - OK
                 /0x000608b0-0x0006504b (0x00479c)/ (BOOT2) - OK
                 /0x0006504c-0x00065f5f (0x000f14)/ (Configuration) - OK
                 /0x00065f60-0x00065f93 (0x000034)/ (GUID) - OK
                 /0x00065f94-0x0006eed7 (0x008f44)/ (DDR) - OK
                 /0x0006eed8-0x0007d7af (0x00e8d8)/ (DDR) - OK
                 /0x0007d7b0-0x0008104f (0x0038a0)/ (DDR) - OK
                 /0x00081050-0x00082b3f (0x001af0)/ (DDR) - OK
                 /0x00082b40-0x0009bc17 (0x0190d8)/ (DDR) - OK
                 /0x0009bc18-0x000b0387 (0x014770)/ (DDR) - OK
                 /0x000b0388-0x000b039b (0x000014)/ (Configuration) - OK
                 /0x000b039c-0x000b03df (0x000044)/ (Jump addresses) - OK
                 /0x000b03e0-0x000b05db (0x0001fc)/ (FW Configuration) - OK

Secondary Image /0x00040000-0x00040107 (0x000108)/ (Pointer Sector)- OK
                 /0x000c0028-0x000c08af (0x000888)/ (BOOT2) - OK
                 /0x000c08b0-0x000c504b (0x00479c)/ (BOOT2) - OK
                 /0x000c504c-0x000c5f5f (0x000f14)/ (Configuration) - OK
                 /0x000c5f60-0x000c5f93 (0x000034)/ (GUID) - OK
                 /0x000c5f94-0x000ceed7 (0x008f44)/ (DDR) - OK
                 /0x000ceed8-0x000dd7af (0x00e8d8)/ (DDR) - OK
                 /0x000dd7b0-0x000e104f (0x0038a0)/ (DDR) - OK
                 /0x000e1050-0x000e2b3f (0x001af0)/ (DDR) - OK
                 /0x000e2b40-0x000fbc17 (0x0190d8)/ (DDR) - OK
                 /0x000fbc18-0x00110387 (0x014770)/ (DDR) - OK
                 /0x00110388-0x0011039b (0x000014)/ (Configuration) - OK
                 /0x0011039c-0x001103df (0x000044)/ (Jump addresses) - OK
                 /0x001103e0-0x001105db (0x0001fc)/ (FW Configuration) - OK

FW image verification succeeded. Image is bootable.
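
A long listing like the one above can also be checked mechanically rather than by eye. The sketch below assumes the verify output has been saved as text and that every section line ends in "(Section name) - STATUS", as in the listing. The function name failed_sections and the "FAILED" status in the sample are hypothetical; I have not confirmed what mstflint prints for a bad section.

```python
import re
from typing import List

# Matches the trailing "(Section name) - STATUS" part of a section line,
# e.g. "(BOOT2) - OK" or "(Pointer Sector)- OK".
SECTION_RE = re.compile(r"\((?P<kind>[^()]+)\)\s*-\s*(?P<status>\S+)\s*$")


def failed_sections(verify_output: str) -> List[str]:
    """Return the section lines whose status is anything other than OK."""
    failures = []
    for line in verify_output.splitlines():
        match = SECTION_RE.search(line)
        if match and match.group("status") != "OK":
            failures.append(line.strip())
    return failures


if __name__ == "__main__":
    # Two lines copied from the listing, plus one made-up failing line.
    sample = "\n".join([
        "Invariant       /0x00000028-0x0000095f (0x000938)/ (BOOT2) - OK",
        "Primary   Image /0x00020000-0x00020107 (0x000108)/ (Pointer Sector)- OK",
        "                 /0x00065f60-0x00065f93 (0x000034)/ (GUID) - FAILED",
    ])
    for line in failed_sections(sample):
        print(line)
```

Run against the listing above, this reports no failed sections, which agrees with the "Image is bootable" summary line.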

Any ideas what might be causing this?

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com


