[ofa-general] Problems with mlx4

Andrey Slepuhin andrey.slepuhin at t-platforms.ru
Wed Jun 13 07:56:57 PDT 2007


Dear folks,

I just set up a test cluster using ConnectX cards, but I cannot get the link 
up. I downloaded the kernel from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git

After inserting the modules I see that the card was initialized:

Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0
Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 16 (level, low) -> IRQ 16
Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device 0000:07:00.0 to 64

But the link remains in "DOWN" state:

testnode1:~ # /opt/ofed/bin/ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0000:07a1
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            20 Gb/sec (4X DDR)

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0000:07a2
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            20 Gb/sec (4X DDR)
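
For comparing ports across several nodes, the output above can be reduced to a short summary. The following is only a minimal sketch in Python: the helper names parse_ibstatus and ports_not_active are made up for this illustration, it assumes the ibstatus text layout shown above, and it assumes that a healthy port reports a state ending in "ACTIVE".

```python
import re

# Sample text copied from the ibstatus output above.
SAMPLE = """\
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0000:07a1
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            20 Gb/sec (4X DDR)

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0000:07a2
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            20 Gb/sec (4X DDR)
"""

# Matches the per-port header line, capturing device name and port number.
HEADER = re.compile(r"^Infiniband device '([^']+)' port (\d+) status:$")


def parse_ibstatus(text):
    """Parse ibstatus-style text into a list of per-port dictionaries."""
    ports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {"device": match.group(1), "port": int(match.group(2))}
            ports.append(current)
        elif current is not None and ":" in line:
            # Split at the first colon only, so GID values keep their colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def ports_not_active(ports):
    """Return (device, port, state, phys state) for ports not in ACTIVE state.

    Assumption: a port that is up reports a state string ending in "ACTIVE".
    """
    return [
        (p["device"], p["port"], p.get("state"), p.get("phys state"))
        for p in ports
        if not p.get("state", "").endswith("ACTIVE")
    ]


if __name__ == "__main__":
    for device, port, state, phys in ports_not_active(parse_ibstatus(SAMPLE)):
        print(f"{device} port {port}: state={state}, phys state={phys}")
```

On a live node the same functions could be fed the captured output of /opt/ofed/bin/ibstatus instead of the SAMPLE string.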

I tried different ports and cables but without success. Do you have any 
idea what's going wrong?
The node configuration is:
Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM
All the nodes are connected to a Flextronics (Mellanox) 24-port DDR switch.
I'm running SLES10 with the kernel from Roland's tree:
testnode1:~ # uname -a
Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64 x86_64 x86_64 GNU/Linux

Any help will be much appreciated.

Thanks in advance,
Andrey


