[openib-general] ipoib/mthca broken on ia64?

Grant Grundler iod00d at hp.com
Wed Mar 23 15:46:23 PST 2005


On Wed, Mar 23, 2005 at 12:33:05PM -0800, Roland Dreier wrote:
>     Grant> *nod*. I try the above first...then cycle power and see if
>     Grant> it comes back to life. The switch has been on since
>     Grant> December or so.
>
> If you have a serial console or ethernet configured for the switch,
> you can check if it still looks happy as well.  It wouldn't really
> surprise me that much if the switch crashed sometime in the past
> couple of months.

Yup. The switch was hosed and cycling power got it back to life again:
grundler@ionize:~$ cat /sys/class/infiniband/mthca0/ports/*/state
4: ACTIVE
1: DOWN

grundler@iowa:~$ cat /sys/class/infiniband/mthca0/ports/*/state
4: ACTIVE
4: ACTIVE

Of course, ping works too.
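
For checking several hosts, the same sysfs check can be scripted. This is
only a rough sketch: it assumes the state files keep the "<code>: <NAME>"
layout shown above, and the helper names (parse_state, port_states) are
made up for illustration.

```python
from pathlib import Path


def parse_state(text):
    """Split a sysfs state line such as '4: ACTIVE' into (4, 'ACTIVE')."""
    code, _, name = text.strip().partition(":")
    return int(code), name.strip()


def port_states(root="/sys/class/infiniband"):
    """Return {(device, port): (code, name)} for every port found under root."""
    states = {}
    for path in sorted(Path(root).glob("*/ports/*/state")):
        # Path layout: <root>/<device>/ports/<port>/state
        device, port = path.parts[-4], path.parts[-2]
        states[(device, port)] = parse_state(path.read_text())
    return states


if __name__ == "__main__":
    for (device, port), (code, name) in port_states().items():
        flag = "" if name == "ACTIVE" else "  <-- not active"
        print(f"{device} port {port}: {code}: {name}{flag}")
```

On a host without any HCA the glob simply matches nothing and the script
prints nothing.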

And my apologies - I likely instigated the failure.
I connected the serial console out of the TS90 to the serial port of
an rx2600 (ia64). At the time I thought: "Just in case LAN config fails".
I didn't realize the rx2600 had a getty running.
The TS90 wasn't able to teach getty to use "help". :^(
And the getty didn't let the TS90 login either. :^(
I suppose that's an easy-to-set-up stress test for the switch user
interface and serial port drivers.

BTW, it's interesting to note that the TS90 switch is an "embedded linux"
device running a 2.4.19 kernel. Console output is appended.

"PPC 440GP" makes me wonder if this switch exposed the problem
of DMA to non-cacheline-aligned buffers on non-coherent platforms.
Could just be a coincidence I guess. :^)
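
For anyone unfamiliar with that problem: on a non-coherent platform the
driver has to flush or invalidate CPU cache lines around each DMA, so a
buffer that shares a cache line with unrelated data can get that data
discarded or have stale data written back over the DMA result. A minimal
sketch of the alignment check follows. The 32-byte line size is my
assumption, not something taken from the switch, and the function name is
made up.

```python
CACHE_LINE = 32  # ASSUMED line size in bytes; check the CPU manual


def shared_cache_lines(addr, length, line=CACHE_LINE):
    """Return start addresses of cache lines the buffer only partly covers.

    An empty list means the buffer starts and ends on line boundaries,
    so cache maintenance on it cannot touch neighbouring data.
    """
    if length <= 0:
        return []
    end = addr + length                  # one past the last byte
    first = addr - addr % line           # line holding the first byte
    last = (end - 1) - (end - 1) % line  # line holding the last byte
    partial = []
    if addr != first:                    # buffer starts mid-line
        partial.append(first)
    if end % line != 0 and last not in partial:  # buffer ends mid-line
        partial.append(last)
    return partial


if __name__ == "__main__":
    print(shared_cache_lines(0x1000, 64))  # aligned start and end
    print(shared_cache_lines(0x1010, 64))  # straddles lines at both ends
```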

And I found the user interface on the switch completely non-intuitive.
E.g. the UI supports auto-completion, but if a command only supports
one option (e.g. show ib), then auto-completion should supply it.
I never did figure out how to view port status from the command line.
(I don't really need to normally and certainly not when sitting
in front of the switch)
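
To be concrete about the completion behaviour I expected, here is a rough
sketch. The command tree is hypothetical and only loosely modelled on
"show ib"; it is not the TS90's real command set, and this is not how the
switch CLI is implemented.

```python
# Hypothetical command tree: each keyword maps to its allowed sub-keywords.
COMMANDS = {
    "show": {"ib": {}},
    "set": {"clock": {}, "name": {}},
    "reload": {},
}


def complete(words, tree=COMMANDS):
    """Expand unique prefixes, then append keywords that are the only choice."""
    node = tree
    out = []
    for i, word in enumerate(words):
        if word in node:
            matches = [word]             # exact keyword wins over prefixes
        else:
            matches = [k for k in node if k.startswith(word)]
        if len(matches) != 1:
            return out + words[i:]       # ambiguous or unknown: leave as typed
        out.append(matches[0])
        node = node[matches[0]]
    # While exactly one keyword can follow, supply it automatically.
    while len(node) == 1:
        (only,) = node
        out.append(only)
        node = node[only]
    return out


if __name__ == "__main__":
    print(" ".join(complete(["sh"])))
```

With this tree, typing "sh" and hitting the completion key would give
"show ib" in one step, while "s" alone stays as typed because it is
ambiguous between "show" and "set".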

Ok...back to our regular program...

thanks,
grant


=> reset

PPCBoot 1.1.6 (Release 1.1.3hp releng #25 02/19/2004 12:05:03) (Feb 19 2004 - 1)

Board: Topspin 90 Controller Card
FPGA loading...
FPGA Revision Register=0xf0100005 Rev=0x6
POST: FPGA Rev:0x6 : PASSED
TS90 cntlr0: 1 fan, cntlr1: 2 fans
Setting fan cntlr0 to MANUAL mode
POST: ctlr0 wait for fans to speed-up...
POST: ctlr0 wait for fans to slow-down...
POST: fan cntlr0 PASSED
Setting fan cntlr1 to MANUAL mode
POST: ctlr1 wait for fans to speed-up...
POST: ctlr1 wait for fans to slow-down...
POST: fan cntlr1 PASSED
Setting fan cntlr0 to AUTO mode
Setting fan cntlr1 to AUTO mode
leds 0xf0100004 = 0xb0
Hit any key to stop autoboot:  0
Releasing Anafa 1 from reset...
Releasing Anafa 2 from reset...
Releasing Anafa 3 from reset...
ENET Speed is 100 Mbps - FULL duplex connection

Boot Regular Image from Disk Partition 0: boot sig=0xcc9e8160
Loading:      ++++++++++++++++++++++++++++++++++++++++[ 1272K]
              ++++++++++++++++++++++++++++++++++++++++[ 2552K]
              ++++++++++++++++++++++++++++++++++++++++[ 3832K]
              ++++++++++++++++++++++++++++++++++++++++[ 5112K]
              ++++++++++++++++++++++++++++++++++++++++[ 6392K]
              ++++++++++++++++++++++++++++++++++++++++[ 7672K]
              ++++++++++++++++++                      [ Done ]
## Booting image at 00200000 ...
   Verifying Checksum ... OK
   Uncompressing Multi-File Image ... OK
   Loading Ramdisk to 07a7a000, end 07eb69d4 ... OK
Linux version 2.4.19-rc3 (releng at borg(Release 1.1.3hp releng #25 02/19/2004 12:4
Topspin 90 base board
I2C TTY driver v1.1
iic_ibmocp_init: IBM on-chip iic adapter module
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem) readonly.
EXT2-fs warning: checktime reached, running e2fsck is recommended
hostname.
Mounting local filesystems...
Partition check:
 fla: fla1 fla2
 flb: flb1
EXT3-fs warning: checktime reached, running e2fsck is recommended
EXT3-fs: recovery complete.
Starting syslogd klogd.
Starting portmap.
Starting ntpd.
Starting sshd.
Starting inetd.
Starting crond.
Starting software Release-1.1.3hp/build025.
Start Controller module.
IBSM_PATH is now ../ppc440_lt
Check for needed config files and software.
Getting chassis-id, slot-id, and card-type information
Load Mellanox drivers.
MTHOME is now /topspin/images/Release-1.1.3hp/build025/exe/scripts/./../ppc440_x
Add Mellanox DLLs to run-time ld.so search path.
Start MDDK.
mosal:Loading mosal [  OK  ]
mosal:Creating /dev/mosal [  OK  ]
mdd:Loading mdd mdd device registered successfully with major 253
[  OK  ]
mdd:Creating /dev/mdd_dev [  OK  ]
mdd:    Version:

        Device name mt43132_pci0:
        =========================================================
        Device ID  = MT43132
        Bus type   = PCI        Base Address = 0xfffe0000
        FW Version = 5.1.0
        FW Build   = 0x0249


        Device name mt43132_pci1:
        =========================================================
        Device ID  = MT43132
        Bus type   = PCI        Base Address = 0xfbff0000
        FW Version = 5.1.0
        FW Build   = 0x0249


        Device name mt43132_pci2:
        =========================================================
        Device ID  = MT43132
        Bus type   = PCI        Base Address = 0xf7ff0000
        FW Version = 5.1.0
        FW Build   = 0x0249

Load lldr.o
Creating convenience symlinks to Mellanox utils.
starting to check firmware/microcode - 21:57:41
checking FPGA firmware - 21:57:41
Checking FPGA Status from PPCBoot ...
FPGA Sanity Checking ...
Sanity Check passed
Update Checking ...
FPGA rev = 6
File rev = 6
No update needed. Done.
FPGA firmware done - 21:57:42
checking Anafa microcode - 21:57:42
  AnafaInit: filename=/topspin/images/boot/exe/arch/ucode-43132-5.1.0 tmpname=/0
  AnafaInit: Anafa0 fw 5.1.0 matches
  AnafaInit: filename=/topspin/images/boot/exe/arch/ucode-43132-5.1.0 tmpname=/0
  AnafaInit: Anafa1 fw 5.1.0 matches
  AnafaInit: filename=/topspin/images/boot/exe/arch/ucode-43132-5.1.0 tmpname=/0
  AnafaInit: Anafa2 fw 5.1.0 matches
Anafa firmware 5.1.0 matches firmware file ucode-43132-5.1.0
Anafa microcode done - 21:57:43
done checking firmware/microcode - 21:57:43
card_startup.x : chassis-type=TS360, chassis-id=d9dfffffe2aaf, slot-id=1, card-x
card_startup.x : successfully started-up card in I2C mode
chassis-type is TS360, chassis-id is d9dfffffe2aaf, slot 1 is controllerIb12porx
Load ts_kernel_services.o.
Load ts_kernel_poll.o.
Load ts_ib_device_n[KERNEL_IB][_tsIbTcarqDeviceInit][tcarq_main.c:126]Created m)
o_vapi.o.
Load ts_ib_tcarq.o.
[KERNEL_IB][_tsIbTcarqDeviceInit][tcarq_main.c:126]Created mt43132_pci1 send qu)
[KERNEL_IB][_tsIbTcarqDeviceInit][tcarq_main.c:126]Created mt43132_pci2 send qu)
Load ts_ib_mad_tcarq.o.
[KERNEL_IB][tsIbMadReceiveSetup][mad_tcarq.c:268]Created mt43132_pci0 QP 0
[KERNEL_IB][tsIbMadReceiveSetup][mad_tcarq.c:287]Created mt43132_pci0 QP 1
[KERNEL_IB][tsIbMadReceiveSetup][mad_tcarq.c:268]Created mt43132_pci1 QP 0
[KERNEL_IB][tsIbMadReceiveSetup][mad_tcarq.c:287]Created mt43132_pci1 QP 1
[KERNEL_IB][tsIbMadReceiveSetup][mad_tcarq.c:268]Created mt43132_pci2 QP 0
[KERNEL_IB][tsIbMadReceiveSetup][mad_tcarq.c:287]Created mt43132_pci2 QP 1
Load ts_ib_client_query.o.
Load ts_ib_sa_client.o.
Load ts_ipoib.o.
Load ts_ib_useraccess.o.
Creating character special files ts_ua[0-6].
Configure ts0 interface.
Start serdes_cfg.x in background.
Configuring switch SERDES:
  Anafa 1
    Port 1: internal ... done
    Port 2: internal ... done
    Port 3: internal ... done
    Port 4: internal ... done
    Port 5: front ... done
    Port 6: front ... done
    Port 7: front ... done
    Port 8: front ... done
  Anafa 2
    Port 1: internal ... done
    Port 2: internal ... done
    Port 3: internal ... done
    Port 4: backplane ... done
    Port 5: front ... done
    Port 6: front ... done
    Port 7: front ... done
    Port 8: front ... done
  Anafa 3
    Port 1: internal ... done
    Port 2: internal ... done
    Port 3: internal ... done
    Port 4: backplane ... done
    Port 5: front ... done
    Port 6: front ... done
    Port 7: front ... done
    Port 8: front ... done
Start ib_port_agent.x -1 in background.
Start ts_sma.x -1 in background.
[KERNEL_IB][tsIbNodeDescSet_R5c433891][device_mellanox.c:378]*node_desc has dift
[KERNEL_IB][tsIbNodeDescSet_R5c433891][device_mellanox.c:378]*node_desc has dift
[KERNEL_IB][tsIbNodeDescSet_R5c433891][device_mellanox.c:378]*node_desc has dift
Start notifier.x in background.
Start watchd_mgr.x -1 -controllerIb12port4x in background.
Start ip_mgr.x in background.
Start fib_mgr.x in background
Start ib_mgr.x in background.
srpm_mgr.x -chassis-id 0xd9dfffffe2aaf
Start chassis_mgr.x in background
Changing password for root
Password changed.
Start port_mgr.x in background.
[INFO] : card 1 is inserted - type=controllerIb12port4x
[INFO] : card 1 is up (in-service) - type=controllerIb12port4x
Start snmp_agent.x in background





Pause to let processes finish initializing.
Starting CLI.
startup-config file missing.  Start with factory default configuration.
Login:


================================================================================
                               Backplane Seeprom
================================================================================
base-mac-addr        chassis-id
--------------------------------------------------------------------------------
0:d:9d:fe:a:af       0xd9dfffffe2aaf


================================================================================
                               Backplane Seeprom
================================================================================
product            pca                pca                fru
serial-number      serial-number      number             number
--------------------------------------------------------------------------------
USC041700011       CS041700002        95-00021-01-B3     AB291-62001

HP-IB# show card-inventory


================================================================================
                      Card Resource/Inventory Information
================================================================================
                  slot-id : 1
              used-memory : 42040 (kbytes)
              free-memory : 85784 (kbytes)
          used-disk-space : 11576 (kbytes)
          free-disk-space : 90803 (kbytes)
        last-image-source : Release-1.1.3hp/build025
     primary-image-source : Release-1.1.3hp/build025
                    image : Release-1.1.3hp/build025
                cpu-descr : PPC 440GP Rev. C - Rev 4.129 (pvr 4012 0481)
        fpga-firmware-rev : 6
          ib-firmware-rev : 5.1.0


