[openib-general] ttcp.aio - kernel NULL pointer dereference

Michael S. Tsirkin mst at mellanox.co.il
Mon Apr 18 07:50:46 PDT 2005


Hello, Libor!
Every once in a while, when I run ttcp, I get a kernel NULL pointer
dereference from SDP.

I compiled the ttcp.aio test with:

gcc -I../../../linux-kernel/infiniband/ulp/sdp ttcp.aio.c -O2 -o ttcp.aio.x -laio

I ran ttcp on the server as:

./ttcp.aio.x -r -l 100 -a 10

and on the client as:

./ttcp.aio.x -t -l 100 -n 100 -a 10 11.4.8.155
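
Since the failure is intermittent, the client command above may need many runs before anything goes wrong. The following is only a sketch of a rerun loop, not part of ttcp itself; the helper name `run_repeatedly` and the choice to treat any "Event error" text or non-zero exit status as a failure are assumptions made for illustration.

```python
import subprocess


def run_repeatedly(cmd, runs):
    """Run cmd `runs` times and collect the runs that look like failures.

    A run counts as a failure if it exits non-zero or prints a line
    containing "Event error" (assumption: that is the only marker we
    care about here).
    """
    failures = []
    for i in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        if result.returncode != 0 or "Event error" in output:
            failures.append((i, result.returncode, output.strip()))
    return failures
```

With the server already listening, the client side could be driven as `run_repeatedly(["./ttcp.aio.x", "-t", "-l", "100", "-n", "100", "-a", "10", "11.4.8.155"], runs=20)`, and the returned list shows which iterations reported an error.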

I repeated this test several times; some runs printed
ttcp-t: Event error <-32> <5275648>
messages and others did not.
It was the server that finally crashed.
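
If the first value in the "Event error" message follows the usual negative-errno convention (an assumption; the message format is not documented here), it can be decoded with a small helper. The function name `describe_errno` is made up for this sketch.

```python
import errno
import os


def describe_errno(value):
    """Map an errno (given as a positive or negative integer) to
    its symbolic name and the platform's message string."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# On Linux, 32 is EPIPE and 12 is ENOMEM.
print(describe_errno(-32))
print(describe_errno(-12))
```

Under that assumption, <-32> would be EPIPE, and the same helper applies to the <-12> (ENOMEM) in the "VMA lock" lines of the log below.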

My kernel is 2.6.11 + latest openib svn (rev 2171).

The log file leading to the crash is below:

Apr 18 17:34:11 swlab155 kernel:  ERR: : IOCB <0> cancel <0> flag <0040> size <1:0:1>
Apr 18 17:34:22 swlab155 kernel:  ERR: : IOCB <0> cancel <0> flag <0040> size <100:0:100>
Apr 18 17:34:41 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:34:41 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:34:49 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:34:49 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:34:59 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:34:59 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:34:59 swlab155 kernel: WARN: : Unexpected conn state. conn <9> state <ff01:fd01>
Apr 18 17:35:22 swlab155 kernel:  ERR: : IOCB <0> cancel <0> flag <0040> size <100:0:100>
Apr 18 17:35:34 swlab155 last message repeated 5 times
Apr 18 17:35:44 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:35:44 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:35:52 swlab155 kernel:  ERR: : VMA lock <528000:100> error <-12> <1:8:8>
Apr 18 17:35:52 swlab155 kernel:  ERR: : VMA lock <52c000:100> error <-12> <1:8:8>
Apr 18 17:35:52 swlab155 kernel: WARN: : Cancel read with no IOCB. <2:0:00000005>
Apr 18 17:35:52 swlab155 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP: 
Apr 18 17:35:52 swlab155 kernel: <ffffffff80389f5e>{_spin_lock_irqsave+9}
Apr 18 17:35:52 swlab155 kernel: PGD 15cb56067 PUD 15cbb4067 PMD 0 
Apr 18 17:35:52 swlab155 kernel: Oops: 0002 [1] SMP 
Apr 18 17:35:52 swlab155 kernel: CPU 0 
Apr 18 17:35:52 swlab155 kernel: Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_umad ib_mthca ib_mad ib_core
Apr 18 17:35:52 swlab155 kernel: Pid: 6, comm: events/0 Not tainted 2.6.11-openib
Apr 18 17:35:52 swlab155 kernel: RIP: 0010:[_spin_lock_irqsave+9/27] <ffffffff80389f5e>{_spin_lock_irqsave+9}
Apr 18 17:35:52 swlab155 kernel: RIP: 0010:[<ffffffff80389f5e>] <ffffffff80389f5e>{_spin_lock_irqsave+9}
Apr 18 17:35:52 swlab155 kernel: RSP: 0000:ffff8100dfe9fe08  EFLAGS: 00010092
Apr 18 17:35:52 swlab155 kernel: RAX: 0000000000000064 RBX: 0000000000000000 RCX: ffff81015c596528
Apr 18 17:35:52 swlab155 kernel: RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000038
Apr 18 17:35:52 swlab155 kernel: RBP: ffff81014dd23080 R08: ffff8100dfe9e000 R09: 0000000000000000
Apr 18 17:35:52 swlab155 kernel: R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000068
Apr 18 17:35:52 swlab155 kernel: R13: 0000000000000064 R14: 0000000000000038 R15: 0000000000000000
Apr 18 17:35:52 swlab155 kernel: FS:  0000000000000000(0000) GS:ffffffff80522c80(0000) knlGS:0000000000000000
Apr 18 17:35:52 swlab155 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 18 17:35:52 swlab155 kernel: CR2: 0000000000000038 CR3: 000000015c626000 CR4: 00000000000006e0
Apr 18 17:35:52 swlab155 kernel: Process events/0 (pid: 6, threadinfo ffff8100dfe9e000, task ffff8100dff02750)
Apr 18 17:35:52 swlab155 kernel: Stack: 0000000000000292 ffffffff8018663b 0000000000000286 ffff81014dec1680 
Apr 18 17:35:52 swlab155 kernel:        ffff81014dec1718 ffff8100dffa2000 ffff81014dec1680 0000000000000292 
Apr 18 17:35:52 swlab155 kernel:        ffffffff8804a8c2 ffffffff8804a956 
Apr 18 17:35:52 swlab155 kernel: Call Trace:<ffffffff8018663b>{aio_complete+129} <ffffffff8804a8c2>{:ib_sdp:do_iocb_complete+0} 
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8804a956>{:ib_sdp:do_iocb_complete+148} <ffffffff80140b1f>{worker_thread+476} 
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8012d10b>{default_wake_function+0} <ffffffff8012d10b>{default_wake_function+0} 
Apr 18 17:35:52 swlab155 kernel:        <ffffffff80140943>{worker_thread+0} <ffffffff80144a12>{kthread+206} 
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8010dc43>{child_rip+8} <ffffffff80144944>{kthread+0} 
Apr 18 17:35:52 swlab155 kernel:        <ffffffff8010dc3b>{child_rip+0} 
Apr 18 17:35:52 swlab155 kernel: 
Apr 18 17:35:52 swlab155 kernel: Code: f0 fe 0f 0f 88 8b 01 00 00 48 8b 04 24 48 83 c4 08 c3 fa f0 
Apr 18 17:35:52 swlab155 kernel: RIP <ffffffff80389f5e>{_spin_lock_irqsave+9} RSP <ffff8100dfe9fe08>
Apr 18 17:35:52 swlab155 kernel: CR2: 0000000000000038
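
One way to read the register dump above: the faulting address in CR2 equals RDI, which on x86_64 carries the first function argument, so _spin_lock_irqsave appears to have been handed the address 0x38. That is consistent with a lock field at offset 0x38 of a structure reached through a NULL pointer in aio_complete, though which structure and field that is would need to be confirmed against the 2.6.11 source. The sketch below just automates the check on two lines copied from the log; the helper names and the 4 KiB page size are assumptions.

```python
import re

# Two register lines copied from the oops above (syslog prefix removed).
OOPS_LINES = [
    "RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000038",
    "CR2: 0000000000000038 CR3: 000000015c626000 CR4: 00000000000006e0",
]

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_registers(lines):
    """Collect 'NAME: 16-hex-digit value' pairs into a dict of ints."""
    regs = {}
    pattern = r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b"
    for line in lines:
        for name, value in re.findall(pattern, line):
            regs[name] = int(value, 16)
    return regs


def looks_like_null_plus_offset(regs):
    """True if the fault address lies in the first page and matches RDI,
    i.e. the callee's first argument was a small offset from NULL."""
    fault = regs["CR2"]
    return fault < PAGE_SIZE and regs.get("RDI") == fault


regs = parse_registers(OOPS_LINES)
print(hex(regs["CR2"]), looks_like_null_plus_offset(regs))
```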

-- 
MST - Michael S. Tsirkin


