[ofa-general] Problems using OFED 1.4 on largesmp nodes
Tziporet Koren
tziporet at dev.mellanox.co.il
Tue Feb 3 06:14:00 PST 2009
I am looking into how to help you.
Can you specify which FW version you are using?
Also, please make sure you have the latest BIOS for the AMD system.
Tziporet
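For readers triaging the same failure, the firmware-level error in the dmesg output quoted below can be pulled out mechanically. The following is a minimal sketch, not part of the original exchange: the function name `find_fw_command_failures` and the `SAMPLE` constant are hypothetical, and the regular expression assumes the exact message wording shown in the quoted log.

```python
import re

# Matches lines of the form seen in the quoted log, e.g.
#   mlx4_core 0000:02:00.0: command 0x13 failed: fw status = 0x1
# The wording is taken from that log; other driver versions may differ.
CMD_FAIL = re.compile(
    r"mlx4_core (?P<dev>[0-9a-f:.]+): command (?P<cmd>0x[0-9a-f]+) failed: "
    r"fw status = (?P<status>0x[0-9a-f]+)"
)


def find_fw_command_failures(dmesg_text):
    """Return (PCI device, command opcode, fw status) tuples found in dmesg text."""
    return [
        (m.group("dev"), int(m.group("cmd"), 16), int(m.group("status"), 16))
        for m in CMD_FAIL.finditer(dmesg_text)
    ]


# Three lines copied from the quoted dmesg output, used as sample input.
SAMPLE = """\
mlx4_core: Initializing 0000:02:00.0
mlx4_core 0000:02:00.0: command 0x13 failed: fw status = 0x1
mlx4_core 0000:02:00.0: SW2HW_EQ failed (-5)
"""

if __name__ == "__main__":
    for dev, cmd, status in find_fw_command_failures(SAMPLE):
        print(f"{dev}: command {cmd:#x} failed with fw status {status:#x}")
```

Running the same function over the full `dmesg` output of a working ELsmp node and a failing ELlargesmp node would show whether the command failure appears only on the latter.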
On Tue, Feb 3, 2009 at 11:44 AM, Ole Widar Saastad
<o.w.saastad at usit.uio.no> wrote:
>
> I have problems using the OFED 1.4 software on the Sun x4600 nodes.
> Need help to get this to work. We plan to run GPFS over IB on these
> nodes in addition to MPI.
>
> Sun 4600 nodes with 8 quad core cpus,
> Quad-Core AMD Opteron(tm) Processor 8380
>
> OS is Rocks release 4.
> centos-release-4-4.2/x86_64/
>
> Linux compute-0-0.local 2.6.9-67.0.15.ELlargesmp #1 SMP Thu May 8
> 11:03:57 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
>
>
> Needless to say, our 300+ nodes (Sun x2200 with quad core) run fine with
> OFED 1.4 (and 1.3); they have almost the same kernel:
> Linux compute-4-0.local 2.6.9-67.0.15.ELsmp #1 SMP Thu May 8 10:50:20
> EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> Same except ELsmp and not ELlargesmp.
>
> More information:
>
> dmesg prints out the following error message :
>
> Losing some ticks... checking if CPU frequency changed.
> modulecmd[17499]: segfault at 0000007fc0b01688 rip 000000000060aa38 rsp
> 0000007fbfffcfd8 error 6
> mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
> mlx4_core: Initializing 0000:02:00.0
> ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 19 (level, low) -> IRQ 193
> PCI: Setting latency timer of device 0000:02:00.0 to 64
> mlx4_core 0000:02:00.0: Requested number of MACs is too much for port 1,
> reducing to 1.
> MSI INIT SUCCESS
> mlx4_core 0000:02:00.0: command 0x13 failed: fw status = 0x1
> mlx4_core 0000:02:00.0: SW2HW_EQ failed (-5)
> mlx4_core 0000:02:00.0: Failed to initialize event queue table, aborting.
> mlx4_core: probe of 0000:02:00.0 failed with error -5
>
> The following software is installed:
>
> Select Option [1-5]:3
> kernel-ib
> libibverbs
> libibverbs-devel
> libibverbs-utils
> libmthca
> libmlx4
> libcxgb3
> libnes
> libipathverbs
> libibcommon
> libibcommon-devel
> libibumad
> libibumad-devel
> ofed-docs
> ofed-scripts
> ibvexdmtools
> qlgc_vnic_daemon
>
>
> Just to be sure the card is present :
> lspci returns :
> 02:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)
>
>
> --
> Ole W. Saastad, dr. scient.
> Scientific Computing Group, USIT, University of Oslo
> http://hpc.uio.no
>