[openib-general] [rdma_ucm] enabling the rdma_ucm and restarting the driver several times causes kernel oops

Dotan Barak dotanb at dev.mellanox.co.il
Thu Dec 28 00:35:38 PST 2006


Hi Sean.

When I enabled rdma_ucm (on the trunk driver) and restarted the
driver several times (using openibd restart), I got a kernel oops.
Here is more info on this issue:

*************************************************************
Host Architecture : x86_64
Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Kernel Version    : 2.6.19-smp
GCC Version       : gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
Memory size       : 4041240 kB
Driver Version    : gen2_devel-20061226-1730
HCA ID(s)         : mthca0
HCA model(s)      : 25218
FW version(s)     : 5.1.940
Board(s)          : MT_0150000001
*************************************************************

Here is the backtrace from /var/log/messages:
Dec 27 15:36:25 sw086 kernel: Unable to handle kernel NULL pointer
dereference at 0000000000000001 RIP:
Dec 27 15:36:25 sw086 kernel:  [<0000000000000001>]
Dec 27 15:36:25 sw086 kernel: PGD 11f4c3067 PUD 11fed7067 PMD 0
Dec 27 15:36:25 sw086 kernel: Oops: 0000 [1] SMP
Dec 27 15:36:25 sw086 kernel: CPU 1
Dec 27 15:36:25 sw086 kernel: Modules linked in: rdma_ucm ib_sdp rdma_cm
iw_cm ib_addr ib_ipoib ib_mthca ib_umad ib_ucm ib_uverbs ib_cm ib_sa
ib_mad ib_core nfsd exportfs ipv6 parport_pc lp parport autofs4 nfs
lockd nfs_acl sunrpc dm_mirror dm_mod button battery asus_acpi ac
uhci_hcd ehci_hcd i2c_i801 i2c_core tg3 sg ext3 jbd sd_mod
Dec 27 15:36:25 sw086 kernel: Pid: 11363, comm: udev Not tainted
2.6.19-smp #1
Dec 27 15:36:25 sw086 kernel: RIP: 0010:[<0000000000000001>]
[<0000000000000001>]
Dec 27 15:36:25 sw086 kernel: RSP: 0018:ffff81012017dec0  EFLAGS: 00010282
Dec 27 15:36:25 sw086 kernel: RAX: 0000000000000002 RBX:
ffff810116af9f50 RCX: 0000000000000000
Dec 27 15:36:25 sw086 kernel: RDX: ffffffff80364eea RSI:
00000000ffffffff RDI: ffff81011fdf0a01
Dec 27 15:36:25 sw086 kernel: RBP: ffff81011bf49740 R08:
00000000fffffffb R09: 0000000000000000
Dec 27 15:36:25 sw086 kernel: R10: 0000000000000000 R11:
0000000000000000 R12: ffff81011fdf0a10
Dec 27 15:36:25 sw086 kernel: R13: ffff81012017df50 R14:
ffffffff80507f10 R15: ffffffff8826ada0
Dec 27 15:36:25 sw086 kernel: FS:  00002b1118f8cde0(0000)
GS:ffff810123477c40(0000) knlGS:0000000000000000
Dec 27 15:36:25 sw086 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
Dec 27 15:36:25 sw086 kernel: CR2: 0000000000000001 CR3:
0000000120893000 CR4: 00000000000006e0
Dec 27 15:36:25 sw086 kernel: Process udev (pid: 11363, threadinfo
ffff81012017c000, task ffff8101168e7140)
Dec 27 15:36:25 sw086 kernel: Stack:  ffffffff802b5c7a 0000000000000101
0000000000001000 000000000064a970
Dec 27 15:36:25 sw086 kernel:  0000000000001000 0000000000000000
ffff81011b983800 000000000064a970
Dec 27 15:36:25 sw086 kernel:  ffff81012017df50 0000000000615a80
ffffffff80275727 ffff81011b983800
Dec 27 15:36:25 sw086 kernel: Call Trace:
Dec 27 15:36:25 sw086 kernel:  [<ffffffff802b5c7a>]
sysfs_read_file+0xaf/0x142
Dec 27 15:36:25 sw086 kernel:  [<ffffffff80275727>] vfs_read+0xd1/0x172
Dec 27 15:36:25 sw086 kernel:  [<ffffffff80275a8d>] sys_read+0x45/0x6e
Dec 27 15:36:25 sw086 kernel:  [<ffffffff8020951e>] system_call+0x7e/0x83
Dec 27 15:36:25 sw086 kernel:
Dec 27 15:36:25 sw086 kernel:
Dec 27 15:36:25 sw086 kernel: Code:  Bad RIP value.
Dec 27 15:36:25 sw086 kernel: RIP  [<0000000000000001>]
Dec 27 15:36:25 sw086 kernel:  RSP <ffff81012017dec0>
Dec 27 15:36:25 sw086 kernel: CR2: 0000000000000001


Thanks,
Dotan




