<div dir="ltr">I am looking into how to help you.<br>Can you specify which FW version you are using?<br>Also, please make sure you have the latest BIOS for the AMD system.<br><br>Tziporet<br><br><br><div class="gmail_quote">
On Tue, Feb 3, 2009 at 11:44 AM, Ole Widar Saastad <span dir="ltr">&lt;<a href="mailto:o.w.saastad@usit.uio.no">o.w.saastad@usit.uio.no</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
I have problems using the OFED 1.4 software on the Sun x4600 nodes.<br>
I need help getting this to work. We plan to run GPFS over IB on these<br>
nodes in addition to MPI.<br>
<br>
Sun x4600 nodes with 8 quad-core CPUs:<br>
Quad-Core AMD Opteron(tm) Processor 8380<br>
<br>
OS is Rocks release 4.<br>
centos-release-4-4.2/x86_64/<br>
<br>
Linux compute-0-0.local 2.6.9-67.0.15.ELlargesmp #1 SMP Thu May 8<br>
11:03:57 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux<br>
<br>
<br>
Needless to say, our 300+ nodes (Sun x2200 with quad-core CPUs) run fine with<br>
OFED 1.4 (and 1.3); they have almost the same kernel:<br>
Linux compute-4-0.local 2.6.9-67.0.15.ELsmp #1 SMP Thu May 8 10:50:20<br>
EDT 2008 x86_64 x86_64 x86_64 GNU/Linux<br>
The only difference is ELsmp instead of ELlargesmp.<br>
<br>
More information:<br>
<br>
dmesg prints the following error messages:<br>
<br>
Losing some ticks... checking if CPU frequency changed.<br>
modulecmd[17499]: segfault at 0000007fc0b01688 rip 000000000060aa38 rsp 0000007fbfffcfd8 error 6<br>
mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)<br>
mlx4_core: Initializing 0000:02:00.0<br>
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 19 (level, low) -> IRQ 193<br>
PCI: Setting latency timer of device 0000:02:00.0 to 64<br>
mlx4_core 0000:02:00.0: Requested number of MACs is too much for port 1, reducing to 1.<br>
MSI INIT SUCCESS<br>
mlx4_core 0000:02:00.0: command 0x13 failed: fw status = 0x1<br>
mlx4_core 0000:02:00.0: SW2HW_EQ failed (-5)<br>
mlx4_core 0000:02:00.0: Failed to initialize event queue table, aborting.<br>
mlx4_core: probe of 0000:02:00.0 failed with error -5<br>
<br>
The following software is installed:<br>
<br>
Select Option [1-5]:3<br>
kernel-ib<br>
libibverbs<br>
libibverbs-devel<br>
libibverbs-utils<br>
libmthca<br>
libmlx4<br>
libcxgb3<br>
libnes<br>
libipathverbs<br>
libibcommon<br>
libibcommon-devel<br>
libibumad<br>
libibumad-devel<br>
ofed-docs<br>
ofed-scripts<br>
ibvexdmtools<br>
qlgc_vnic_daemon<br>
<br>
<br>
Just to confirm the card is present, lspci returns:<br>
02:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)<br>
<br>
<br>
--<br>
Ole W. Saastad, dr. scient.<br>
Scientific Computing Group, USIT, University of Oslo<br>
<a href="http://hpc.uio.no" target="_blank">http://hpc.uio.no</a><br>
<br>
</blockquote></div><br></div>