[openib-general] core and ipoib questions and oops
Michael S. Tsirkin
mst at mellanox.co.il
Mon Sep 26 05:49:43 PDT 2005
Two questions:
1. Roland, looking at ipoib_multicast, I see
	if (mcast->query) {
		ib_sa_cancel_query(mcast->query_id, mcast->query);
		mcast->query = NULL;
		ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n",
				IPOIB_GID_ARG(mcast->mcmember.mgid));
		wait_for_completion(&mcast->done);
	}
what prevents ipoib_mcast_join_complete from running
at the same time and changing mcast->query after we've tested it?
2. All, what happens in the core if I call ib_sa_cancel_query
while the completion is running, or has already run?
Could there be a bug that lets
a completion callback run twice in this case?
Thanks,
MST
---
The following oops happens on svn rev 3535.
#ifconfig ib0 down
Unable to handle kernel NULL pointer dereference at 0000000000000388 RIP:
<ffffffff88045204>{:ib_ipoib:ipoib_mcast_join_finish+100}
PGD 172cd4067 PUD 172d16067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_umad ib_mthca ib_mad ib_core
Pid: 2399, comm: ib_mad1 Not tainted 2.6.13
RIP: 0010:[<ffffffff88045204>] <ffffffff88045204>{:ib_ipoib:ipoib_mcast_join_finish+100}
RSP: 0018:ffff81017348dc58 EFLAGS: 00010282
RAX: 0000000074010000 RBX: 0000000000000000 RCX: 0000000000000010
RDX: ffff810177d93380 RSI: ffff810177d93380 RDI: ffff810177d93380
RBP: ffff810177d93380 R08: 0000000000000000 R09: ffff81017348dd38
R10: ffff81017348ddf8 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000380 R14: 0000000000000000 R15: ffff810173484898
FS: 0000000000000000(0000) GS:ffffffff8064b800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000388 CR3: 0000000174725000 CR4: 00000000000006e0
Process ib_mad1 (pid: 2399, threadinfo ffff81017348c000, task ffff8101773e07f0)
Stack: 0000000100000092 0000000000000000 0000000000000096 0000000000000296
0000000000000296 ffffffff8028b8b0 0000000000000096 ffff81017d1343c0
ffff810172fd50c0 ffff810172daba10
Call Trace:<ffffffff8028b8b0>{dma_pool_free+272} <ffffffff88045cbb>{:ib_ipoib:ipoib_mcast_join_complete+43}
<ffffffff880021b5>{:ib_core:ib_unpack+198} <ffffffff8803bcc6>{:ib_sa:ib_sa_mcmember_rec_callback+64}
<ffffffff8803b49e>{:ib_sa:recv_handler+117} <ffffffff88010e83>{:ib_mad:ib_mad_completion_handler+949}
<ffffffff88010ace>{:ib_mad:ib_mad_completion_handler+0}
<ffffffff80143489>{worker_thread+478} <ffffffff8012f7da>{default_wake_function+0}
<ffffffff8012cff1>{__wake_up_common+64} <ffffffff8012f7da>{default_wake_function+0}
<ffffffff801473a8>{keventd_create_kthread+0} <ffffffff801432ab>{worker_thread+0}
<ffffffff801473a8>{keventd_create_kthread+0} <ffffffff801474d9>{kthread+204}
<ffffffff8010e352>{child_rip+8} <ffffffff801473a8>{keventd_create_kthread+0}
<ffffffff8014740d>{kthread+0} <ffffffff8010e34a>{child_rip+0}
Code: 49 8b 7d 08 48 81 c7 b4 00 00 00 f3 a6 75 17 49 8b 45 70 8b
RIP <ffffffff88045204>{:ib_ipoib:ipoib_mcast_join_finish+100} RSP <ffff81017348dc58>
CR2: 0000000000000388
Seems to oops at 0xda4 here:
0000000000000d40 <ipoib_mcast_join_finish>:
ipoib_mcast_join_finish():
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:215
d40: 41 56 push %r14
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223
d42: b9 10 00 00 00 mov $0x10,%ecx
d47: fc cld
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:215
d48: 41 55 push %r13
d4a: 41 54 push %r12
d4c: 55 push %rbp
d4d: 48 89 fd mov %rdi,%rbp
d50: 53 push %rbx
d51: 48 83 ec 60 sub $0x60,%rsp
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:220
d55: 48 8b 06 mov (%rsi),%rax
Seems most likely that dev is NULL in the following:
static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
				   struct ib_sa_mcmember_rec *mcmember)
{
	struct net_device *dev = mcast->dev;
	struct ipoib_dev_priv *priv = netdev_priv(dev);
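A small sketch of the arithmetic behind that guess, assuming netdev_priv() returns dev plus the size of struct net_device rounded up to an alignment boundary. With dev == NULL, priv would then be a small nonzero address rather than 0, which fits R13 = 0x380 in the dump. The first Code: bytes, 49 8b 7d 08, decode as mov 0x8(%r13),%rdi, which would fault at 0x380 + 8 = 0x388, matching CR2. The 32-byte alignment and the helper names below are assumptions for illustration, not values taken from the kernel headers.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed alignment of the private area; hypothetical value. */
#define PRIV_ALIGN 32u

/* Locate a private area placed after an aligned header:
 * base address plus the header size rounded up to PRIV_ALIGN. */
static uintptr_t priv_addr(uintptr_t dev, size_t hdr_size)
{
	uintptr_t mask = PRIV_ALIGN - 1u;

	return dev + (((uintptr_t)hdr_size + mask) & ~mask);
}

/* Address touched when reading a field at field_off inside priv. */
static uintptr_t field_addr(uintptr_t dev, size_t hdr_size, size_t field_off)
{
	return priv_addr(dev, hdr_size) + field_off;
}
```

For example, field_addr(0, 0x380, 8) gives 0x388: a NULL dev with an aligned header size of 0x380 and a load from offset 8 of priv lands on the address reported in CR2.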
--
MST