[openib-general] [PATCH] mad: fix oops in cancel_mads

Michael S. Tsirkin mst at mellanox.co.il
Thu Mar 30 05:52:54 PST 2006


We have seen the following oops in cancel_mads when restarting
opensm multiple times.

[17236730.836000] Call Trace:
[17236730.840000]  [<c010549b>] show_stack+0x9b/0xb0
[17236730.848000]  [<c01055ec>] show_registers+0x11c/0x190
[17236730.852000]  [<c01057cd>] die+0xed/0x160
[17236730.860000]  [<c031b966>] do_page_fault+0x3f6/0x5d0
[17236730.864000]  [<c010511f>] error_code+0x4f/0x60
[17236730.872000]  [<f8ac4e38>] cancel_mads+0x128/0x150 [ib_mad]
[17236730.876000]  [<f8ac2811>] unregister_mad_agent+0x11/0x130 [ib_mad]
[17236730.884000]  [<f8ac2a12>] ib_unregister_mad_agent+0x12/0x20 [ib_mad]
[17236730.892000]  [<f8b10f23>] ib_umad_close+0xf3/0x130 [ib_umad]
[17236730.900000]  [<c0162937>] __fput+0x187/0x1c0
[17236730.904000]  [<c01627a9>] fput+0x19/0x20
[17236730.912000]  [<c0160f7a>] filp_close+0x3a/0x60
[17236730.916000]  [<c0121ca8>] put_files_struct+0x68/0xa0
[17236730.940000]  [<c0103cf7>] do_signal+0x47/0x100
[17236730.948000]  [<c0103ded>] do_notify_resume+0x3d/0x40
[17236730.952000]  [<c0103f9e>] work_notifysig+0x13/0x25
[17236730.960000] Code: fa 03 74 0d 8d 42 d0 3c 20 19 c0 83 e0 04 83 c0 24 5d c3
89 f6 55 89 e5 57 56 53 51 89 c7 8b 40 18 89 d3 89 45 f0 8b 02 8b 70 10 <8b
[17236730.980000]  <1>Fixing recursive fault but reboot is needed!

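We traced this back to local_completions unlocking mad_agent_priv->lock
while still keeping a pointer to an entry that is still linked on
local_list. While the lock is dropped, cancel_mads can find that entry
on the list and free it. The later call to
list_del(&local->completion_list) in local_completions then operates on
the freed entry and corrupts the list.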
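
The locking pattern at issue can be sketched in user space. This is
only an illustration, not the ib_mad code: struct agent, struct work,
process_all and cancel_all are hypothetical names, and a pthread mutex
stands in for the spinlock. The point it shows is that the completion
path unlinks the entry while it still holds the lock, so a concurrent
cancel path that walks the list can never see (or free) an entry that
is being processed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct node *h) { return h->next == h; }

static void list_add_tail(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical stand-ins for the agent and its queued local work. */
struct agent { pthread_mutex_t lock; struct node local_list; };
struct work { struct node link; int payload; };

#define work_of(n) ((struct work *)((char *)(n) - offsetof(struct work, link)))

static void agent_init(struct agent *a)
{
	pthread_mutex_init(&a->lock, NULL);
	list_init(&a->local_list);
}

static void agent_queue(struct agent *a, struct work *w)
{
	pthread_mutex_lock(&a->lock);
	list_add_tail(&w->link, &a->local_list);
	pthread_mutex_unlock(&a->lock);
}

/* Completion path: unlink under the lock, then process unlocked. */
static int process_all(struct agent *a)
{
	int sum = 0;

	pthread_mutex_lock(&a->lock);
	while (!list_is_empty(&a->local_list)) {
		struct work *w = work_of(a->local_list.next);

		list_unlink(&w->link);	/* entry is now private to us */
		pthread_mutex_unlock(&a->lock);

		sum += w->payload;	/* work done without the lock */
		free(w);

		pthread_mutex_lock(&a->lock);
	}
	pthread_mutex_unlock(&a->lock);
	return sum;
}

/* Cancel path: frees only what is still linked on the list. */
static int cancel_all(struct agent *a)
{
	int freed = 0;

	pthread_mutex_lock(&a->lock);
	while (!list_is_empty(&a->local_list)) {
		struct work *w = work_of(a->local_list.next);

		list_unlink(&w->link);
		free(w);
		freed++;
	}
	pthread_mutex_unlock(&a->lock);
	return freed;
}
```

If process_all instead left the entry linked while the mutex was
released, cancel_all running in another thread could free it first,
which is the same shape of failure as the oops above.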

---

Remove the entry from local_list after looking it up but before
releasing mad_agent_priv->lock, to prevent cancel_mads from
finding and freeing it.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm at mellanox.co.il>

Index: src/drivers/infiniband/core/mad.c
===================================================================
@@ -2311,6 +2311,7 @@ static void local_completions(void *data
 		local = list_entry(mad_agent_priv->local_list.next,
 				   struct ib_mad_local_private,
 				   completion_list);
+		list_del(&local->completion_list);
 		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 		if (local->mad_priv) {
 			recv_mad_agent = local->recv_mad_agent;
@@ -2362,7 +2363,6 @@ local_send_completion:
 						   &mad_send_wc);
 
 		spin_lock_irqsave(&mad_agent_priv->lock, flags);
-		list_del(&local->completion_list);
 		atomic_dec(&mad_agent_priv->refcount);
 		if (!recv)
 			kmem_cache_free(ib_mad_cache, local->mad_priv);

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


