[openib-general] EEH: MMIO Failure on Power5

Thaddeus Ternes tternes at gmail.com
Thu Sep 22 14:00:48 PDT 2005


Yeah, I did a reboot. I verified with lsmod that the modules weren't loaded, and then
ran modprobe ib_mthca. The same errors I was seeing during startup were
printed to the console:

p5l1:~# lsmod
Module Size Used by
p5l1:~# modprobe ib_mthca
[599947.213712] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
[599947.213732] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (0001:c1:00.0)
[599948.488315] EEH: MMIO failure (2) on device: pci15b3,5a44 /pci@800000020000003/pci@2/pci@1/pci15b3,5a44@0
[599948.488343] Call Trace:
[599948.488351] [c00000000f02b050] [c00000000002fc80] .eeh_dn_check_failure+0x2bc/0x314 (unreliable)
[599948.488380] [c00000000f02b130] [c00000000002fdd4] .eeh_check_failure+0xfc/0x190
[599948.488425] [c00000000f02b1c0] [d0000000005f37cc] .mthca_cmd_poll+0x120/0x258 [ib_mthca]
[599948.488469] [c00000000f02b290] [d0000000005f3cc8] .mthca_cmd_box+0x90/0xa8 [ib_mthca]
[599948.488516] [c00000000f02b330] [d0000000005f5444] .mthca_INIT_HCA+0x240/0x288 [ib_mthca]
[599948.488561] [c00000000f02b3e0] [d0000000005f2790] .mthca_init_one+0xd2c/0x180c [ib_mthca]
[599948.488600] [c00000000f02b870] [c0000000001d4a2c] .pci_device_probe+0xac/0xdc
[599948.488622] [c00000000f02b900] [c000000000239ec0] .driver_probe_device+0x80/0x15c
[599948.488647] [c00000000f02b990] [c00000000023a130] .__driver_attach+0xa8/0xc4
[599948.488669] [c00000000f02ba20] [c0000000002390d4] .bus_for_each_dev+0x78/0xcc
[599948.488699] [c00000000f02bad0] [c00000000023a174] .driver_attach+0x28/0x40
[599948.488718] [c00000000f02bb50] [c000000000239848] .bus_add_driver+0xc8/0x1dc
[599948.488751] [c00000000f02bc00] [c00000000023a7b0] .driver_register+0x44/0x5c
[599948.488771] [c00000000f02bc90] [c0000000001d46e4] .pci_register_driver+0x84/0xd8
[599948.488808] [c00000000f02bd10] [d000000000607594] .mthca_init+0x1c/0x48 [ib_mthca]
[599948.488857] [c00000000f02bd90] [c00000000006cc88] .sys_init_module+0x2f0/0x4cc
[599948.488885] [c00000000f02be30] [c00000000000d300] syscall_exit+0x0/0x18
[599948.488914] EEH: MMIO failure (2), notifiying device 0001:c1:00.0Mellanox Technologies MT23108 InfiniHost
[599948.488986] ib_mthca 0001:c1:00.0: HCA FW version 3.2.0 is old (3.3.3 is current).
[599948.489002] ib_mthca 0001:c1:00.0: If you have problems, try updating your HCA FW.
[599948.490093] ib_mthca 0001:c1:00.0: SW2HW_MPT returned status 0x01
[599948.490107] ib_mthca 0001:c1:00.0: Failed to create driver PD, aborting.
[599948.492268] ib_mthca: probe of 0001:c1:00.0 failed with error -22

This is on an OpenPower 720...

Thaddeus


On 9/22/05, Pradeep Satyanarayana <pradeep at us.ibm.com> wrote:
>
> Adding ib_mthca to /etc/hotplug/blacklist worked for us (i.e. it is the
> workaround we adopted). Just to double-check: you did reboot after adding it to
> the blacklist, and then loaded ib_mthca after the reboot, right?
>
> BTW, what kind of Power5 machine are you using?
>
> Pradeep
> pradeep at us.ibm.com
>
> Thaddeus Ternes <tternes at gmail.com> wrote on 09/22/2005 01:42 PM:
> To: Roland Dreier <rolandd at cisco.com>
> cc: Pradeep Satyanarayana/Beaverton/IBM at IBMUS, openib-general at openib.org
> Subject: Re: [openib-general] EEH: MMIO Failure on Power5
>
> Yeah, same result as before.
>
> On 9/22/05, Roland Dreier <rolandd at cisco.com> wrote:
> > Thaddeus> These are OpenPower 720 machines. I've been away from
> > Thaddeus> the office for a few days, so I'll do some more poking
> > Thaddeus> around to see if I can come up with anything else.
> > Thaddeus> Maybe I've missed something in the logs or dmesg...
> >
> > Have you tried the workaround of adding 'ib_mthca' to /etc/hotplug/blacklist
> > and then loading the module after the system is fully booted?
> >
> > - R.
> >
>
>
>
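
For reference, the workaround suggested in this thread (keep hotplug from loading ib_mthca during boot, then load the module by hand once the system is up) comes down to a one-line entry in the hotplug blacklist. This is a sketch that assumes the usual /etc/hotplug/blacklist syntax of one module name per line, with # starting a comment. Check the header of the file on your distribution before relying on it.

```
# /etc/hotplug/blacklist (excerpt; assumed syntax: one module name per line)
# Stop hotplug from auto-loading the Mellanox HCA driver at boot.
ib_mthca
```

After a reboot, lsmod should not list ib_mthca, and modprobe ib_mthca then loads it manually, as in the session at the top of this message. In this report, the deferred load still hit the same EEH MMIO failure on the OpenPower 720.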

