[ofa-general] Problems using OFED 1.4 on largesmp nodes

Ole Widar Saastad o.w.saastad at usit.uio.no
Tue Feb 3 01:44:02 PST 2009


I have problems using the OFED 1.4 software on our Sun x4600 nodes and
need help getting it to work. We plan to run GPFS over IB on these
nodes in addition to MPI.

Sun x4600 nodes with 8 quad-core CPUs,
Quad-Core AMD Opteron(tm) Processor 8380

OS is Rocks release 4.
centos-release-4-4.2/x86_64/

Linux compute-0-0.local 2.6.9-67.0.15.ELlargesmp #1 SMP Thu May 8
11:03:57 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


Our 300+ other nodes (Sun x2200 with quad-core CPUs) run fine with
OFED 1.4 (and 1.3). They have almost the same kernel:
Linux compute-4-0.local 2.6.9-67.0.15.ELsmp #1 SMP Thu May 8 10:50:20
EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
The only difference is ELsmp instead of ELlargesmp.
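
As a quick check that the two kernels really differ only in flavor, here
is a minimal sketch that splits the two release strings quoted above into
base version and flavor. The helper name split_flavor is only for
illustration, and the pattern assumes the RHEL 4 style naming
"<version>.EL<flavor>".

```python
import re

def split_flavor(release):
    """Split a RHEL 4 style kernel release into (base version, flavor)."""
    # Assumption: the flavor is whatever follows the literal ".EL" suffix.
    m = re.match(r"^(.*\.EL)(\w*)$", release)
    if m is None:
        raise ValueError("unexpected release string: %r" % release)
    return m.group(1), m.group(2)

x4600 = split_flavor("2.6.9-67.0.15.ELlargesmp")  # failing nodes
x2200 = split_flavor("2.6.9-67.0.15.ELsmp")       # working nodes

print("x4600:", x4600)
print("x2200:", x2200)
print("same base version:", x4600[0] == x2200[0])
```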

More information:

dmesg prints the following error messages:

Losing some ticks... checking if CPU frequency changed.
modulecmd[17499]: segfault at 0000007fc0b01688 rip 000000000060aa38 rsp 0000007fbfffcfd8 error 6
mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
mlx4_core: Initializing 0000:02:00.0
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:02:00.0 to 64
mlx4_core 0000:02:00.0: Requested number of MACs is too much for port 1, reducing to 1.
MSI INIT SUCCESS
mlx4_core 0000:02:00.0: command 0x13 failed: fw status = 0x1
mlx4_core 0000:02:00.0: SW2HW_EQ failed (-5)
mlx4_core 0000:02:00.0: Failed to initialize event queue table, aborting.
mlx4_core: probe of 0000:02:00.0 failed with error -5
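
To make the failing step easier to spot, here is a small sketch that
picks the mlx4_core failure lines out of dmesg text and extracts the
firmware command and status. The sample text is copied from the log
above; the function names mlx4_failures and failed_fw_command are
hypothetical helpers, not part of OFED.

```python
import re

# Excerpt copied from the dmesg output above.
DMESG = """\
mlx4_core: Initializing 0000:02:00.0
MSI INIT SUCCESS
mlx4_core 0000:02:00.0: command 0x13 failed: fw status = 0x1
mlx4_core 0000:02:00.0: SW2HW_EQ failed (-5)
mlx4_core 0000:02:00.0: Failed to initialize event queue table, aborting.
mlx4_core: probe of 0000:02:00.0 failed with error -5
"""

def mlx4_failures(text):
    """Return the mlx4_core lines that mention a failure."""
    return [line for line in text.splitlines()
            if line.startswith("mlx4_core")
            and re.search(r"fail", line, re.IGNORECASE)]

def failed_fw_command(text):
    """Return (command, fw status) from the first failed firmware command, or None."""
    m = re.search(r"command (0x[0-9a-f]+) failed: fw status = (0x[0-9a-f]+)", text)
    return m.groups() if m else None

for line in mlx4_failures(DMESG):
    print(line)
print("firmware command / status:", failed_fw_command(DMESG))
```

On a live node the same functions could be fed the real dmesg output
instead of the pasted excerpt.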

The following software is installed:

Select Option [1-5]:3
kernel-ib
libibverbs
libibverbs-devel
libibverbs-utils
libmthca
libmlx4
libcxgb3
libnes
libipathverbs
libibcommon
libibcommon-devel
libibumad
libibumad-devel
ofed-docs
ofed-scripts
ibvexdmtools
qlgc_vnic_daemon


Just to confirm that the card is present,
lspci returns:
02:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)


-- 
Ole W. Saastad, dr. scient.
Scientific Computing Group, USIT, University of Oslo
http://hpc.uio.no



