[openib-general] Mellanox device in INIT state

Grant Grundler iod00d at hp.com
Tue Sep 13 22:00:03 PDT 2005


On Tue, Sep 13, 2005 at 04:32:13PM -0700, Shirley Ma wrote:
> But during the test (rm and ins ib_mthca modules), I hit another problem. 
> The stack was SVN 3380.
> 
> Sep 13 15:21:29 elm3b37 kernel: unregister_netdevice: waiting for ib0 to 
> become free. Usage count = 1

I've also got a problem on ia64, though it's clearly related to sdp_init().
It's possible yours is caused by the same issue. Loading ib_sdp makes modprobe
segfault in sdp_init(), and I suspect that leaves a reference held on the
ib_mthca module.
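
One way to check the suspected leftover reference is to read the use counts from /proc/modules after the failed modprobe. The following is only a minimal sketch, assuming the usual /proc/modules layout (name, size, use count, users, state, address). The function name module_refcounts and the sample text in the usage note are made up for illustration and are not part of the reload_ib script.

```python
def module_refcounts(proc_modules_text, prefix="ib_"):
    """Return {name: (use_count, users, state)} for modules matching prefix.

    Assumes each line looks like:
        name size use_count users state address
    use_count is None when the kernel prints "-" instead of a number.
    """
    result = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(prefix):
            continue
        name, _size, count, users = fields[:4]
        # The users field is a comma-terminated list, or "-" when empty.
        user_list = [u for u in users.split(",") if u and u != "-"]
        use_count = int(count) if count.isdigit() else None
        # The state field may be absent in some layouts (assumption).
        state = fields[4] if len(fields) > 4 else None
        result[name] = (use_count, user_list, state)
    return result


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for name, (count, users, state) in sorted(module_refcounts(text).items()):
        print(name, count, users, state)
```

A module with a nonzero use count and no listed users, or one stuck in a state other than "Live", would be a candidate for whatever is holding ib_mthca in place.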

I haven't had a chance to look into this yet. I'll look tomorrow
if it's not obvious to someone else.

Oh, this was with SVN r3391.

thanks,
grant


gsyprf3:~# reload_ib 
+ IPoIB=51
+ ifconfig ib0 down
ib0: ERROR while getting interface flags: No such device
+ ifconfig ib1 down
ib1: ERROR while getting interface flags: No such device
+ rmmod ib_ipoib ib_uverbs ib_sdp ib_cm ib_sa ib_mthca ib_mad ib_core
ERROR: Module ib_ipoib does not exist in /proc/modules
ERROR: Module ib_uverbs does not exist in /proc/modules
ERROR: Module ib_sdp does not exist in /proc/modules
ERROR: Module ib_cm does not exist in /proc/modules
ERROR: Module ib_sa does not exist in /proc/modules
ERROR: Module ib_mthca does not exist in /proc/modules
ERROR: Module ib_mad does not exist in /proc/modules
ERROR: Module ib_core does not exist in /proc/modules
+ modprobe ib_mthca msi_x=1
ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
ib_mthca: Initializing  ((¥)
GSI 60 (level, low) -> CPU 0 (0x0000) vector 69
ACPI: PCI Interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 69
 (¥: Missing DCS, aborting.
ACPI: PCI interrupt for device 0000:81:00.0 disabled
GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 unregistered
+ modprobe ib_ipoib
+ modprobe ib_sdp
kmem_cache_create: Early error in slab request_sock_
kernel BUG at mm/slab.c:1220!
modprobe[1947]: bugcheck! 0 [1]
Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_mthca ib_mad ib_core qla2300 qla2xxx firmware_class scsi_transport_fc e1000 tg3 e100 dm_mod

Pid: 1947, CPU 1, comm:             modprobe
psr : 00001010085a6010 ifs : 8000000000000a1a ip  : [<a00000010010e700>]    Not tainted
ip is at kmem_cache_create+0x1c0/0x1000
unat: 0000000000000000 pfs : 0000000000000a1a rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000000159959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010010e700 b6  : a0000001000d7c40 b7  : a00000010000b130
f6  : 1003e00000000000000a0 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000004e2 f9  : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1  : a000000100d23ce0 r2  : 0000000000000000 r3  : 0000000000000000
r8  : 0000000000000021 r9  : 00000000000000fd r10 : a000000100b3ad40
r11 : 0000000000000000 r12 : e000004042c57e20 r13 : e000004042c50000
r14 : 0000000000004000 r15 : a000000100967f68 r16 : a000000100967f70
r17 : e00000003e3f7e18 r18 : 0000000000000000 r19 : 0000000000000000
r20 : a000000100b43a48 r21 : a000000100b43a48 r22 : 0000000000000000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000004
r26 : a0000001008f4d50 r27 : 0000000000000001 r28 : e000004042c50d54
r29 : a0000001008f4d54 r30 : 0000000000000000 r31 : 0000000000000000

Call Trace:
 [<a000000100012840>] show_stack+0x80/0xa0
                                sp=e000004042c579c0 bsp=e000004042c51128
 [<a000000100013160>] show_regs+0x900/0x940
                                sp=e000004042c57b90 bsp=[garbled]
/usr/local/bin/reload_ib: line 9:  1947 Segmentation fault      modprobe ib_sdp
+ modprobe ib_uverbs
FATAL: Module ib_uverbs not found.
+ ifc[...].51 netmask 255.[...]
 [<a0000001000398b0>] die+0x150/0x200
                                sp=e000004042c57ba0 bsp=e000004042c51088
 [<a0000001000399b0>] die_if_kernel+0x50/0x80
                                sp=e000004042c57ba0 bsp=e000004042c51058
 [<a00000010003b530>] ia64_bad_break+0x530/0x900
                                sp=e000004042c57ba0 bsp=e000004042c51030
 [<a00000010000b8a0>] ia64_leave_kernel+0x0/0x280
                                sp=e000004042c57c50 bsp=e000004042c51030
 [<a00000010010e700>] kmem_cache_create+0x1c0/0x1000
                                sp=e000004042c57e20 bsp=e000004042c50f58
 [<a00000010063e360>] proto_register+0x140/0x2a0
                                sp=e000004042c57e30 bsp=e000004042c50f10
 [<a000000200138030>] sdp_init+0x30/0x830 [ib_sdp]
                                sp=e000004042c57e30 bsp=e000004042c50ee8
 [<a0000001000e7ca0>] sys_init_module+0x2e0/0x680
                                sp=e000004042c57e30 bsp=e000004042c50e60
 [<a00000010000b700>] ia64_ret_from_syscall+0x0/0x20
                                sp=e000004042c57e30 bsp=e000004042c50e60
 [<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
                                sp=e000004042c58000 bsp=e000004042c50e60
 


