[ofa-general] BUG() in iw_cm:iw_cm:cm_work_handler

Roland Dreier rdreier at cisco.com
Fri Mar 28 22:44:54 PDT 2008


I've seen the following crash a few times using iWARP on cxgb3.  The
crash is at the BUG() here:

	switch (cm_id_priv->state) {
	/* ... */
	default:
		BUG();
	}

I added a print to dump the state, and it was 2 (IW_CM_STATE_CONN_RECV)
both times I hit the crash after that.
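
For anyone reading the number in a debug print like the one above, here
is a minimal user-space sketch that maps it to a state name.  The enum is
a local mirror that assumes the ordering of enum iw_cm_state in
drivers/infiniband/core/iwcm.h; only the value 2 (IW_CM_STATE_CONN_RECV)
is confirmed by the print.  state_name() is a hypothetical helper, not a
kernel function.

```c
#include <stddef.h>

/* Local mirror of the iw_cm states.  The ordering is an assumption,
 * except for value 2, which the debug print showed to be CONN_RECV. */
enum iw_cm_state_mirror {
	ST_IDLE,
	ST_LISTEN,
	ST_CONN_RECV,
	ST_CONN_SENT,
	ST_ESTABLISHED,
	ST_CLOSING,
	ST_DESTROYING
};

/* Hypothetical helper: translate a numeric state into a readable name. */
static const char *state_name(int state)
{
	static const char *const names[] = {
		[ST_IDLE]        = "IW_CM_STATE_IDLE",
		[ST_LISTEN]      = "IW_CM_STATE_LISTEN",
		[ST_CONN_RECV]   = "IW_CM_STATE_CONN_RECV",
		[ST_CONN_SENT]   = "IW_CM_STATE_CONN_SENT",
		[ST_ESTABLISHED] = "IW_CM_STATE_ESTABLISHED",
		[ST_CLOSING]     = "IW_CM_STATE_CLOSING",
		[ST_DESTROYING]  = "IW_CM_STATE_DESTROYING",
	};

	/* Anything outside the table is reported rather than trusted. */
	if (state < 0 || (size_t)state >= sizeof(names) / sizeof(names[0]))
		return "UNKNOWN";
	return names[state];
}
```

Calling state_name(2) yields "IW_CM_STATE_CONN_RECV", which matches the
value in RAX (0000000000000002) in the register dump below.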

The way I've been able to reproduce it is with some buggy code that
does an RDMA read into a memory region without remote write permission
(which works on IB but not iWARP).  The active side issues an RDMA
read, which fails and causes the connection to be torn down, and the
passive side crashes as below sometimes.

Anyone have any suggestions on how to track this down?

[11481.273827] ------------[ cut here ]------------
[11481.273827] kernel BUG at drivers/infiniband/core/iwcm.c:790!
[11481.273827] invalid opcode: 0000 [1] SMP
[11481.273827] CPU 3
[11481.273827] Modules linked in: rdma_ucm ib_uverbs nfs lockd nfs_acl fan ac battery ipv6 fuse dm_snapshot dm_mirror dm_mod iw_cxgb3 svcrdma xprtrdma sunrpc rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr loop ide_cd_mod cdrom ide_pci_generic piix iw_nes serio_raw ide_core cxgb3 ib_core evdev thermal pcspkr psmouse ipmi_si ipmi_msghandler ehci_hcd bnx2 zlib_inflate libcrc32c firmware_class container button uhci_hcd processor
[11481.273827] Pid: 2518, comm: iw_cm_wq Not tainted 2.6.25-rc7 #84
[11481.273827] RIP: 0010:[<ffffffff8816ae98>]  [<ffffffff8816ae98>] :iw_cm:cm_work_handler+0x36e/0x42a
[11481.273827] RSP: 0018:ffff8100798e9dc0  EFLAGS: 00010093
[11481.273827] RAX: 0000000000000002 RBX: 0000000000000246 RCX: ffffffff80622f04
[11481.273827] RDX: ffff810079520000 RSI: 0000000000000001 RDI: 0000000000000086
[11481.273828] RBP: ffff8100798e9e50 R08: ffff81007e588d08 R09: ffff81000100a030
[11481.273828] R10: 0000000000000001 R11: 00000000798e9a70 R12: ffff81007e588c00
[11481.273828] R13: 0000000000000000 R14: ffff81007e588d08 R15: ffff8100798e9de0
[11481.273828] FS:  0000000000000000(0000) GS:ffff81007fc02e00(0000) knlGS:0000000000000000
[11481.273828] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[11481.273828] CR2: 00007fed6a22e7c0 CR3: 0000000079596000 CR4: 00000000000006e0
[11481.273828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11481.273828] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11481.273828] Process iw_cm_wq (pid: 2518, threadinfo ffff8100798e8000, task ffff810079520000)
[11481.273828] Stack:  ffff810079dec518 ffff81007e588cb8 ffff81007e588cf8 ffff81007e588cf8
[11481.273828]  0000000000000005 0000000000000000 0000000000000000 0000000000000000
[11481.273828]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[11481.273828] Call Trace:
[11481.273828]  [<ffffffff8816ab2a>] ? :iw_cm:cm_work_handler+0x0/0x42a
[11481.273828]  [<ffffffff8023ec92>] run_workqueue+0xeb/0x200
[11481.273828]  [<ffffffff8023f847>] worker_thread+0xa4/0xb5
[11481.273828]  [<ffffffff80242517>] ? autoremove_wake_function+0x0/0x38
[11481.273828]  [<ffffffff8023f7a3>] ? worker_thread+0x0/0xb5
[11481.273828]  [<ffffffff802423f1>] kthread+0x49/0x78
[11481.273828]  [<ffffffff8020cf38>] child_rip+0xa/0x12
[11481.273828]  [<ffffffff8020c64f>] ? restore_args+0x0/0x30
[11481.273828]  [<ffffffff8024225c>] ? kthreadd+0x157/0x17c
[11481.273828]  [<ffffffff802423a8>] ? kthread+0x0/0x78
[11481.273828]  [<ffffffff8020cf2e>] ? child_rip+0x0/0x12
[11481.273828]
[11481.273828]
[11481.273828] Code: 4c 89 f7 41 c7 44 24 58 00 00 00 00 e8 38 a4 2c f8 4c 89 fe 4c 89 e7 41 ff 14 24 4c 89 f7 41 89 c5 e8 6c a3 2c f8 48 89 c3 eb 07 <0f> 0b eb fe 45 31 ed 48 89 de 4c 89 f7 e8 0c a4 2c f8 eb 04 0f
[11481.273828] RIP  [<ffffffff8816ae98>] :iw_cm:cm_work_handler+0x36e/0x42a
[11481.273828]  RSP <ffff8100798e9dc0>
[11481.282720] ---[ end trace 2667de3bacfc5e84 ]---
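
As a sanity check on the trace: the byte marked <0f> in the Code: line,
followed by 0b, is the x86 ud2 instruction, which is what BUG() emits and
what raises the "invalid opcode" above.  A minimal user-space sketch of
that check follows; faulting_insn_is_ud2() is a hypothetical helper that
assumes the usual oops layout of hex bytes with the faulting byte in
angle brackets.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if the byte the oops marks with <..>
 * and the byte right after it form the x86 ud2 opcode (0f 0b), else 0. */
static int faulting_insn_is_ud2(const char *code_line)
{
	const char *p = strchr(code_line, '<');
	char *end;
	unsigned long first, second;

	if (!p)
		return 0;

	/* Byte inside the <> marker. */
	first = strtoul(p + 1, &end, 16);
	if (end == p + 1 || *end != '>')
		return 0;

	/* Byte immediately following the marker. */
	second = strtoul(end + 1, &end, 16);

	return first == 0x0f && second == 0x0b;
}
```

Passing the bytes around the marker from the trace, "eb 07 <0f> 0b eb fe",
returns 1, so the faulting instruction is consistent with the BUG() in
the default case rather than some other fault in cm_work_handler.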



