Yeah, I rebooted. I verified with lsmod that the modules weren't
loaded, and then ran modprobe ib_mthca. The same errors that I was
seeing during startup were printed to the console:<br>
<br>
<font size="1"><span style="font-family: courier new,monospace;">p5l1:~# lsmod</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">Module Size Used by</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">p5l1:~# modprobe ib_mthca</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599947.213712] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599947.213732] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (0001:c1:00.0)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488315] EEH: MMIO failure (2) on device: pci15b3,5a44 /pci@800000020000003/pci@2/pci@1/pci15b3,5a44@0</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488343] Call Trace:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488351] [c00000000f02b050] [c00000000002fc80] .eeh_dn_check_failure+0x2bc/0x314 (unreliable)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488380] [c00000000f02b130] [c00000000002fdd4] .eeh_check_failure+0xfc/0x190</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488425] [c00000000f02b1c0] [d0000000005f37cc] .mthca_cmd_poll+0x120/0x258 [ib_mthca]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488469] [c00000000f02b290] [d0000000005f3cc8] .mthca_cmd_box+0x90/0xa8 [ib_mthca]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488516] [c00000000f02b330] [d0000000005f5444] .mthca_INIT_HCA+0x240/0x288 [ib_mthca]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488561] [c00000000f02b3e0] [d0000000005f2790] .mthca_init_one+0xd2c/0x180c [ib_mthca]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488600] [c00000000f02b870] [c0000000001d4a2c] .pci_device_probe+0xac/0xdc</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488622] [c00000000f02b900] [c000000000239ec0] .driver_probe_device+0x80/0x15c</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488647] [c00000000f02b990] [c00000000023a130] .__driver_attach+0xa8/0xc4</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488669] [c00000000f02ba20] [c0000000002390d4] .bus_for_each_dev+0x78/0xcc</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488699] [c00000000f02bad0] [c00000000023a174] .driver_attach+0x28/0x40</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488718] [c00000000f02bb50] [c000000000239848] .bus_add_driver+0xc8/0x1dc</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488751] [c00000000f02bc00] [c00000000023a7b0] .driver_register+0x44/0x5c</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488771] [c00000000f02bc90] [c0000000001d46e4] .pci_register_driver+0x84/0xd8</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488808] [c00000000f02bd10] [d000000000607594] .mthca_init+0x1c/0x48 [ib_mthca]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488857] [c00000000f02bd90] [c00000000006cc88] .sys_init_module+0x2f0/0x4cc</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488885] [c00000000f02be30] [c00000000000d300] syscall_exit+0x0/0x18</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488914] EEH: MMIO failure (2), notifiying device 0001:c1:00.0 Mellanox Technologies MT23108 InfiniHost</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.488986] ib_mthca 0001:c1:00.0: HCA FW version 3.2.0 is old (3.3.3 is current).</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.489002] ib_mthca 0001:c1:00.0: If you have problems, try updating your HCA FW.</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.490093] ib_mthca 0001:c1:00.0: SW2HW_MPT returned status 0x01</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.490107] ib_mthca 0001:c1:00.0: Failed to create driver PD, aborting.</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">[599948.492268] ib_mthca: probe of 0001:c1:00.0 failed with error -22</span></font><br>
<br>
This is on an OpenPower 720...<br>
<br>
Thaddeus<br>
<br>
<br>
<div><span class="gmail_quote">On 9/22/05, <b class="gmail_sendername">Pradeep Satyanarayana</b> <<a href="mailto:pradeep@us.ibm.com">pradeep@us.ibm.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<p>Adding ib_mthca to /etc/hotplug/blacklist worked for us (i.e. it is
the workaround we adopted). Just to double check: you did reboot after
adding it to the blacklist, and then loaded ib_mthca after the reboot, right?<br>
<br>
BTW, what kind of Power5 machine are you using?<span class="q"><br>
<br>
Pradeep<br>
<a href="mailto:pradeep@us.ibm.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">pradeep@us.ibm.com</a><br></span><span class="q">
Thaddeus Ternes <<a href="mailto:tternes@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
tternes@gmail.com</a>><br>
<br>
<br>
</span><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody><tr valign="top"><td width="40%">
<ul>
<ul>
<ul>
<ul><span class="q"><b><font size="2">Thaddeus Ternes <<a href="mailto:tternes@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">tternes@gmail.com</a>></font></b><font size="2"> </font>
</span><p><font size="2">09/22/2005 01:42 PM</font>
<table border="1">
<tbody><tr valign="top"><td bgcolor="#ffffff" width="168"><div align="center"><font size="2">Please respond to<br>
Thaddeus Ternes</font></div></td></tr>
</tbody></table>
</p></ul>
</ul>
</ul>
</ul>
</td><td width="60%">
<span class="q"></span><span class="sg"></span><span class="q">
</span><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody><tr valign="top"><td valign="middle" width="1%"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="58"><br>
<div align="right"><font size="2">To</font></div></td><td width="100%"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="1"><br>
<font size="2">Roland Dreier <<a href="mailto:rolandd@cisco.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">rolandd@cisco.com</a>></font></td></tr>
<tr valign="top"><td valign="middle" width="1%"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="58"><br>
<div align="right"><font size="2">cc</font></div></td><td width="100%"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="1"><br>
<font size="2">Pradeep Satyanarayana/Beaverton/IBM@IBMUS, <a href="mailto:openib-general@openib.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">openib-general@openib.org</a></font></td></tr><tr valign="top">
<td valign="middle" width="1%"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="58"><br>
<div align="right"><font size="2">Subject</font></div></td><td width="100%"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="1"><br>
<font size="2">Re: [openib-general] EEH: MMIO Failure on Power5</font></td></tr>
</tbody></table>
<table border="0" cellpadding="0" cellspacing="0">
<tbody><tr valign="top"><td width="58"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="1"></td><td width="336"><img src="cid:30__=07BBFA17DFE10CA78f9e8a93df938@us.ibm.com" alt="" border="0" height="1" width="1">
</td></tr>
</tbody></table>
</td></tr>
</tbody></table></p><div><span class="e" id="q_1067f9ed1643ce5d_12">
<br>
<tt>Yeah, same result as before.<br>
<br>
On 9/22/05, Roland Dreier <<a href="mailto:rolandd@cisco.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">rolandd@cisco.com</a>> wrote:<br>
> Thaddeus> These are OpenPower 720 machines. I've been away from<br>
> Thaddeus> the office for a few days, so I'll do some more poking<br>
> Thaddeus> around to see if I can come up with anything else.<br>
> Thaddeus> Maybe I've missed something in the logs or dmesg...<br>
><br>
> Have you tried the workaround of adding 'ib_mthca' to /etc/hotplug/blacklist<br>
> and then loading the module after the system is fully booted?<br>
><br>
> - R.<br>
><br>
</tt><br>
</span></div><p></p><br clear="all"></blockquote></div><br>