[ofa-general] hang at module removal with local sa patches applied
Sean Hefty
mshefty at ichips.intel.com
Mon Jun 18 16:40:35 PDT 2007
> [14897.168277] local_sa D 0000000000000001 0 8361 2 (L-TLB)
> [14897.168280] ffff81007d0d3c10 0000000000000046 0000000000000000 800000ce00000000
> [14897.168283] 84000b0000000000 000000000000000a ffff81007e8f3420 ffff81007ff1f4a0
> [14897.168287] 00000d8431895ed4 0000000000000d33 ffff81007e8f35d0 800000ce00000000
> [14897.168290] Call Trace:
> [14897.168294] [<ffffffff80582e4a>] __mutex_lock_slowpath+0x69/0xaa
> [14897.168303] [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
> [14897.168306] [<ffffffff80582c87>] mutex_lock+0xe/0x10
> [14897.168311] [<ffffffff880636b6>] :ib_sa:port_work_handler+0x1c/0x34
> [14897.168314] [<ffffffff80241669>] run_workqueue+0x85/0x10f
> [14897.168317] [<ffffffff80241851>] flush_cpu_workqueue+0x28/0x7b
> [14897.168320] [<ffffffff80241ad0>] flush_workqueue+0x43/0x5d
> [14897.168326] [<ffffffff88063250>] :ib_sa:cleanup_port+0x25/0x7b
> [14897.168331] [<ffffffff88063307>] :ib_sa:process_updates+0x61/0x336
> [14897.168335] [<ffffffff8058212b>] thread_return+0x0/0xea
> [14897.168341] [<ffffffff88063656>] :ib_sa:add_update+0x7a/0x83
> [14897.168347] [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
> [14897.168352] [<ffffffff88063695>] :ib_sa:refresh_port_db+0x36/0x3b
> [14897.168358] [<ffffffff880636be>] :ib_sa:port_work_handler+0x24/0x34
> [14897.168361] [<ffffffff80241669>] run_workqueue+0x85/0x10f
> [14897.168363] [<ffffffff80241f94>] worker_thread+0x0/0xe7
> [14897.168366] [<ffffffff80242070>] worker_thread+0xdc/0xe7
> [14897.168368] [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
> [14897.168371] [<ffffffff80244f8b>] kthread+0x49/0x76
> [14897.168374] [<ffffffff8020aaa8>] child_rip+0xa/0x12
> [14897.168377] [<ffffffff80244f42>] kthread+0x0/0x76
> [14897.168379] [<ffffffff8020aa9e>] child_rip+0x0/0x12
Reading through the code, I see two potential issues:
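* It's possible for flush_workqueue to be called from the workqueue thread
itself. The trace shows this: port_work_handler reaches cleanup_port,
which calls flush_workqueue.
* We hold a mutex when calling flush_workqueue, and a queued work item
(port_work_handler in the trace) will try to acquire that same mutex.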
I'll need to spend some time studying the thread synchronization to fix
this.
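
To make the second issue concrete, here is a minimal user-space sketch, not the ib_sa code. It assumes POSIX threads in place of the kernel mutex and workqueue APIs, and every name in it (queue_work_item, flush_inline, work_handler, and so on) is hypothetical. It models a flush issued from the worker thread, where pending items run inline in the caller's context, as the trace suggests. The handler uses trylock so the sketch can report the would-be hang instead of actually hanging. Dropping the lock before flushing is only one possible direction, not a worked-out fix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the ib_sa code. */
#define MAX_PENDING 4
typedef void (*work_fn)(void);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static work_fn pending[MAX_PENDING];
static size_t npending;
static int blocked_items;   /* handlers that would have slept forever */

static void queue_work_item(work_fn fn)
{
    assert(npending < MAX_PENDING);
    pending[npending++] = fn;
}

/* Models a flush called from the worker thread itself: the pending
 * items run inline, in the caller's context. */
static void flush_inline(void)
{
    for (size_t i = 0; i < npending; i++)
        pending[i]();
    npending = 0;
}

/* Models a work handler that needs the lock. A real handler would
 * call a blocking lock and never return; trylock lets us count the
 * hazard instead of hanging. */
static void work_handler(void)
{
    int rc = pthread_mutex_trylock(&lock);

    if (rc == EBUSY) {
        blocked_items++;
        return;
    }
    if (rc == 0)
        pthread_mutex_unlock(&lock);
}

/* Hazard: flush while holding the lock the queued handler wants. */
int flush_with_lock_held(void)
{
    blocked_items = 0;
    queue_work_item(work_handler);
    pthread_mutex_lock(&lock);
    flush_inline();
    pthread_mutex_unlock(&lock);
    return blocked_items;
}

/* One possible direction: release the lock before flushing. */
int flush_after_unlock(void)
{
    blocked_items = 0;
    queue_work_item(work_handler);
    pthread_mutex_lock(&lock);
    /* ... update shared state under the lock ... */
    pthread_mutex_unlock(&lock);
    flush_inline();
    return blocked_items;
}
```

In this model, flush_with_lock_held() counts one handler that could not take the lock, while flush_after_unlock() counts none. Whether the lock can safely be dropped at that point in the real code depends on what it protects.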
- Sean