[openib-general] CMA oops

Michael S. Tsirkin mst at mellanox.co.il
Mon Aug 28 03:57:55 PDT 2006


I've observed the following oops with CMA:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff88081115>] :rdma_cm:cma_detach_from_dev+0x1a/0x58
PGD 135abd067 PUD 133ed3067 PMD 0
Oops: 0002 [1] SMP
CPU 1
Modules linked in: ib_sdp rdma_cm ib_addr i2c_dev i2c_core ib_ipoib ib_mthca ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core
Pid: 6389, comm: sdp Not tainted 2.6.18-rc2-devel #7
RIP: 0010:[<ffffffff88081115>]  [<ffffffff88081115>] :rdma_cm:cma_detach_from_dev+0x1a/0x58
RSP: 0018:ffff8101351cbdf8  EFLAGS: 00010246
RAX: ffff810134fd3ef0 RBX: ffff810137202200 RCX: ffff8101372022f0
RDX: 0000000000000000 RSI: ffff81013acdf510 RDI: ffff810137202200
RBP: ffff810137202200 R08: ffff8101351ca000 R09: ffff810137d2cc80
R10: 0000000000000068 R11: ffff810138604540 R12: ffff810138604e40
R13: 0000000000000293 R14: ffff810135098800 R15: ffffffff8808910e
FS:  0000000000000000(0000) GS:ffff81013b876cc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000013520f000 CR4: 00000000000006e0
Process sdp (pid: 6389, threadinfo ffff8101351ca000, task ffff81013acdf510)
Stack:  ffff810137202200 ffffffff88081717 ffff810138604e88 ffff810135098800
 ffff810137202200 ffffffff880884bb ffff8101340fd800 ffff810135098800
 ffffffff88092500 ffffffff80495579 ffff8101340fd800 ffff810135098bc8
Call Trace:
 [<ffffffff88081717>] :rdma_cm:rdma_destroy_id+0x5f/0x107

Apparently, the list->prev pointer in the CMA id_priv structure is NULL,
which causes a crash in list_del.
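
To illustrate why a NULL prev pointer would fault in list_del, here is a
minimal userspace sketch of a doubly linked list in the style of the
kernel's struct list_head. Unlinking a node stores through both
neighbours, so a NULL prev turns into a write to address 0, which would
be consistent with the write fault (Oops: 0002) and CR2 of 0 in the trace
above. The names list_node, list_init, list_add_front and
list_del_checked are hypothetical stand-ins, not the kernel API, and the
NULL guard is an addition for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link node directly after head. */
static void list_add_front(struct list_node *node, struct list_node *head)
{
    assert(node != NULL && head != NULL);
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/*
 * Unlinking stores through both neighbours: next->prev and prev->next.
 * With prev == NULL the second store would land at address 0, so this
 * hypothetical helper refuses to unlink instead of faulting.
 */
static bool list_del_checked(struct list_node *node)
{
    if (node->next == NULL || node->prev == NULL)
        return false;               /* never linked, or already unlinked */
    node->next->prev = node->prev;
    node->prev->next = node->next;
    node->next = NULL;              /* mark as unlinked */
    node->prev = NULL;
    return true;
}
```

An unguarded version without the NULL test would perform the same two
stores and crash on the second one when prev is NULL.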

I note that rdma_destroy_id does its test outside the mutex lock.
Could that be the problem?
Unfortunately, the problem is not easily reproducible.
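
If the unlocked test does turn out to be the cause (this is not confirmed
here), one common pattern is to do the test and the detach inside the
same critical section. The following is a userspace sketch with a pthread
mutex, not the actual rdma_cm code: struct device, struct conn_id,
dev_lock, attach and detach_if_attached are all hypothetical names chosen
for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the actual rdma_cm structures. */
struct device {
    int refs;               /* number of ids attached to this device */
};

struct conn_id {
    struct device *dev;     /* non-NULL while attached */
};

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Attach id to dev under the lock. */
static void attach(struct conn_id *id, struct device *dev)
{
    assert(id != NULL && dev != NULL);
    pthread_mutex_lock(&dev_lock);
    dev->refs++;
    id->dev = dev;
    pthread_mutex_unlock(&dev_lock);
}

/*
 * Test and detach inside one critical section, so a concurrent
 * detach cannot slip in between the test and the update.
 * Returns true if this call performed the detach.
 */
static bool detach_if_attached(struct conn_id *id)
{
    bool detached = false;

    pthread_mutex_lock(&dev_lock);
    if (id->dev != NULL) {
        id->dev->refs--;
        id->dev = NULL;
        detached = true;
    }
    pthread_mutex_unlock(&dev_lock);
    return detached;
}
```

With this shape, a second call to detach_if_attached on the same id is a
no-op rather than a second unlink of an already detached entry.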

-- 
MST



