[ofa-general] hang at module removal with local sa patches applied

Sean Hefty mshefty at ichips.intel.com
Mon Jun 18 16:40:35 PDT 2007


> [14897.168277] local_sa      D 0000000000000001     0  8361      2 (L-TLB)
> [14897.168280]  ffff81007d0d3c10 0000000000000046 0000000000000000 800000ce00000000
> [14897.168283]  84000b0000000000 000000000000000a ffff81007e8f3420 ffff81007ff1f4a0
> [14897.168287]  00000d8431895ed4 0000000000000d33 ffff81007e8f35d0 800000ce00000000
> [14897.168290] Call Trace:
> [14897.168294]  [<ffffffff80582e4a>] __mutex_lock_slowpath+0x69/0xaa
> [14897.168303]  [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
> [14897.168306]  [<ffffffff80582c87>] mutex_lock+0xe/0x10
> [14897.168311]  [<ffffffff880636b6>] :ib_sa:port_work_handler+0x1c/0x34
> [14897.168314]  [<ffffffff80241669>] run_workqueue+0x85/0x10f
> [14897.168317]  [<ffffffff80241851>] flush_cpu_workqueue+0x28/0x7b
> [14897.168320]  [<ffffffff80241ad0>] flush_workqueue+0x43/0x5d
> [14897.168326]  [<ffffffff88063250>] :ib_sa:cleanup_port+0x25/0x7b
> [14897.168331]  [<ffffffff88063307>] :ib_sa:process_updates+0x61/0x336
> [14897.168335]  [<ffffffff8058212b>] thread_return+0x0/0xea
> [14897.168341]  [<ffffffff88063656>] :ib_sa:add_update+0x7a/0x83
> [14897.168347]  [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
> [14897.168352]  [<ffffffff88063695>] :ib_sa:refresh_port_db+0x36/0x3b
> [14897.168358]  [<ffffffff880636be>] :ib_sa:port_work_handler+0x24/0x34
> [14897.168361]  [<ffffffff80241669>] run_workqueue+0x85/0x10f
> [14897.168363]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
> [14897.168366]  [<ffffffff80242070>] worker_thread+0xdc/0xe7
> [14897.168368]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
> [14897.168371]  [<ffffffff80244f8b>] kthread+0x49/0x76
> [14897.168374]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
> [14897.168377]  [<ffffffff80244f42>] kthread+0x0/0x76
> [14897.168379]  [<ffffffff8020aa9e>] child_rip+0x0/0x12

Reading through the code, I see two potential issues:

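* flush_workqueue can be called from the workqueue thread itself. In the trace
above, port_work_handler calls refresh_port_db, which reaches cleanup_port and
flush_workqueue while still running on the workqueue.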

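* We hold a mutex when calling flush_workqueue, and a queued work item 
will try to acquire that same mutex. Because the flush runs that item 
(port_work_handler) inline, it blocks in mutex_lock and the flush never 
completes.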
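The two issues combine into a self-deadlock. Below is a minimal userspace sketch of that lock/flush ordering. It is not the ib_sa code. The names port_lock, queue_work, cleanup_port_buggy, cleanup_port_reordered and run_demo are hypothetical, and the workqueue is modelled as a plain ring buffer.

In the model, flush_workqueue runs pending items inline on the caller's stack, as flush_cpu_workqueue -> run_workqueue does in the trace. The work item uses pthread_mutex_trylock so the demo reports the problem instead of hanging. The "reordered" variant shows one conventional way to break the cycle: drop the lock before flushing. That is an assumption, not the fix adopted here, and it does not address the first issue.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model; names are illustrative, not ib_sa's. */
#define MAX_WORK 8

typedef void (*work_fn)(void);

static work_fn pending[MAX_WORK];	/* ring buffer of queued work items */
static unsigned int head, tail;
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static int deadlocks_detected;

static void queue_work(work_fn fn)
{
	assert(tail - head < MAX_WORK);	/* queue must not overflow */
	pending[tail++ % MAX_WORK] = fn;
}

/* Models a flush issued from the worker thread itself: pending items
 * run inline on the caller's stack (flush_cpu_workqueue -> run_workqueue
 * in the trace). */
static void flush_workqueue(void)
{
	while (head != tail)
		pending[head++ % MAX_WORK]();
}

/* Queued item that needs port_lock, standing in for port_work_handler. */
static void port_work_handler(void)
{
	/* trylock instead of lock so the demo reports rather than hangs:
	 * a default mutex already held (even by this thread) gives EBUSY. */
	if (pthread_mutex_trylock(&port_lock) != 0) {
		deadlocks_detected++;
		return;
	}
	/* ... refresh cached port data ... */
	pthread_mutex_unlock(&port_lock);
}

/* Flush while holding the lock the queued item needs. */
static void cleanup_port_buggy(void)
{
	pthread_mutex_lock(&port_lock);
	flush_workqueue();		/* item cannot take port_lock */
	pthread_mutex_unlock(&port_lock);
}

/* Assumed reordering: update state under the lock, drop it, then flush. */
static void cleanup_port_reordered(void)
{
	pthread_mutex_lock(&port_lock);
	/* ... mark the port as going away ... */
	pthread_mutex_unlock(&port_lock);
	flush_workqueue();
}

/* Queue one item, run a cleanup variant, and return how many times the
 * item found port_lock already held (i.e. would have deadlocked). */
static int run_demo(void (*cleanup)(void))
{
	deadlocks_detected = 0;
	queue_work(port_work_handler);
	cleanup();
	return deadlocks_detected;
}
```

With this model, run_demo(cleanup_port_buggy) reports one would-be deadlock and run_demo(cleanup_port_reordered) reports none. In the real driver the reordered cleanup would still run on the workqueue thread, so the first issue remains.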

I'll need to spend some time studying the thread synchronization to fix 
this.

- Sean


