[openib-general] ipoib oops

Michael S. Tsirkin mst at mellanox.co.il
Mon Nov 14 07:42:02 PST 2005


Hello, Roland!
I am still seeing IPoIB oopsing about once a week around
ipoib_mcast_join_complete (oops below).

While looking at it, a question occurred to me:
what protects the following code in ipoib_mcast_stop_thread


        list_for_each_entry(mcast, &priv->multicast_list, list) {
                if (mcast->query) {
                        ib_sa_cancel_query(mcast->query_id, mcast->query);
                        mcast->query = NULL;
                        ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n",
                                        IPOIB_GID_ARG(mcast->mcmember.mgid));
                        wait_for_completion(&mcast->done);
                }
        }

from walking the list while an entry is being added to or deleted from it?
I wonder whether this could be the reason we are oopsing.
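
For illustration only, here is a minimal userspace sketch of the kind of
protection being asked about. It uses pthreads and is not the IPoIB code:
struct entry, struct entry_list, list_add_entry and list_cancel_queries are
all hypothetical names, and a mutex stands in for whatever lock the driver
would use. Writers and the walker take the same lock, so the walk never sees
a half-updated link. Only non-blocking work is done under the lock; a
blocking wait such as wait_for_completion would have to happen after the
lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ipoib_mcast; not the driver's type. */
struct entry {
        struct entry *next;
        int has_query;          /* stand-in for mcast->query != NULL */
};

struct entry_list {
        pthread_mutex_t lock;   /* protects head and every ->next link */
        struct entry *head;
};

/* Writers take the lock before changing any link in the list. */
static void list_add_entry(struct entry_list *l, struct entry *e)
{
        assert(e != NULL);
        pthread_mutex_lock(&l->lock);
        e->next = l->head;
        l->head = e;
        pthread_mutex_unlock(&l->lock);
}

/*
 * Walk the list under the same lock and clear every pending query.
 * Returns the number of entries that had a query pending.
 */
static int list_cancel_queries(struct entry_list *l)
{
        int cancelled = 0;

        pthread_mutex_lock(&l->lock);
        for (struct entry *e = l->head; e != NULL; e = e->next) {
                if (e->has_query) {
                        e->has_query = 0;
                        cancelled++;
                }
        }
        pthread_mutex_unlock(&l->lock);
        return cancelled;
}
```

Whether the real walk in ipoib_mcast_stop_thread can be restructured this way
depends on how the completion wait is handled, which this sketch leaves out.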

Thanks,
MST


Unable to handle kernel NULL pointer dereference at 0000000000000488 RIP:
<ffffffff88040344>{:ib_ipoib:ipoib_mcast_join_finish+100}
PGD 178b11067 PUD 174bb7067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: ib_ipoib ib_sdp ib_cm ib_sa ib_umad ib_mthca ib_mad ib_core
Pid: 2433, comm: ib_mad1 Not tainted 2.6.14 #4
RIP: 0010:[<ffffffff88040344>] <ffffffff88040344>{:ib_ipoib:ipoib_mcast_join_finish+100}
RSP: 0018:ffff81017ada7c58  EFLAGS: 00010282
RAX: 000000007a010000 RBX: 0000000000000000 RCX: 0000000000000010
RDX: ffff810177847a80 RSI: ffff810177847a80 RDI: ffff810177847a80
RBP: ffff810177847a80 R08: 0000000000000000 R09: ffff81017ada7d38
R10: ffff81017ada7df8 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000480 R14: 0000000000000000 R15: ffff81017adca898
FS:  0000000000000000(0000) GS:ffffffff805ff800(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000488 CR3: 00000001791a4000 CR4: 00000000000006e0
Process ib_mad1 (pid: 2433, threadinfo ffff81017ada6000, task ffff81017e66d780)
Stack: ffff8100064266e0 0000000000000286 ffff810171566700 0000000000000286
       ffff81017dbb9c00 0000000000000286 ffff81017be8fe10 ffff810179f45440
       ffff81017be8fe10 ffff810178282c00
Call Trace:<ffffffff880412a8>{:ib_ipoib:ipoib_mcast_join_complete+56}
       <ffffffff880021d8>{:ib_core:ib_unpack+200} <ffffffff88037c4c>{:ib_sa:ib_sa_mcmember_rec_callback+76}
       <ffffffff88037472>{:ib_sa:recv_handler+66} <ffffffff8801014d>{:ib_mad:ib_mad_completion_handler+957}
       <ffffffff8800fd90>{:ib_mad:ib_mad_completion_handler+0}
       <ffffffff80146fac>{worker_thread+476} <ffffffff80131150>{default_wake_function+0}
       <ffffffff8012df43>{__wake_up_common+67} <ffffffff80131150>{default_wake_function+0}
       <ffffffff8014b4b0>{keventd_create_kthread+0} <ffffffff80146dd0>{worker_thread+0}
       <ffffffff8014b4b0>{keventd_create_kthread+0} <ffffffff8014b609>{kthread+217}
       <ffffffff8010e7a6>{child_rip+8} <ffffffff8014b4b0>{keventd_create_kthread+0}
       <ffffffff8014b530>{kthread+0} <ffffffff8010e79e>{child_rip+0}


Code: 49 8b 7d 08 48 81 c7 cc 01 00 00 f3 a6 75 17 49 8b 45 70 8b
RIP <ffffffff88040344>{:ib_ipoib:ipoib_mcast_join_finish+100} RSP <ffff81017ada7c58>
CR2: 0000000000000488 

-- 
MST


