[ewg] Possible process deadlock in RMPP flow

Eli Cohen eli at dev.mellanox.co.il
Wed Sep 23 08:04:54 PDT 2009


Hi Sean,
One of our customers is experiencing problems when running ibnetdiscover.
The problem occurs intermittently.
Here is the call stack that he gets:

ibnetdiscover D ffffffff80149b8d     0 26968  26544 (L-TLB)
 ffff8102c900bd88 0000000000000046 ffff81037e8e0000 ffff81037e8e02e8
 ffff8102c900bd78 000000000000000a ffff8102c5b50820 ffff81038a929820
 0000011837bf6105 0000000000000ede ffff8102c5b50a08 0000000100000000
Call Trace:
 [<ffffffff80064207>] wait_for_completion+0x79/0xa2
 [<ffffffff8008b4cc>] default_wake_function+0x0/0xe
 [<ffffffff882271d9>] :ib_mad:ib_cancel_rmpp_recvs+0x87/0xde
 [<ffffffff88224485>] :ib_mad:ib_unregister_mad_agent+0x30d/0x424
 [<ffffffff883983e9>] :ib_umad:ib_umad_close+0x9d/0xd6
 [<ffffffff80012e22>] __fput+0xae/0x198
 [<ffffffff80023de6>] filp_close+0x5c/0x64
 [<ffffffff800393df>] put_files_struct+0x63/0xae
 [<ffffffff80015b26>] do_exit+0x31c/0x911
 [<ffffffff8004971a>] cpuset_exit+0x0/0x6c
 [<ffffffff8005e116>] system_call+0x7e/0x83

From the dump, it seems that the process is waiting on the call to
flush_workqueue() in ib_cancel_rmpp_recvs(). The package they use is
OFED 1.4.2.
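
To make the suspected failure mode concrete, here is a minimal userspace
sketch of the general pattern: a teardown path that drops its own reference
and then waits for a completion that is only signalled once every other
holder has dropped theirs. This is not the ib_mad code. All demo_* names are
hypothetical, and the timeout exists only so the example terminates; the
kernel's wait_for_completion() has no timeout and sleeps uninterruptibly,
which is consistent with the D state shown in the trace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for a kernel completion; all demo_* names are hypothetical. */
struct demo_completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

/* Object whose teardown must wait until every reference is dropped. */
struct demo_obj {
    int                    refcount;   /* protected by comp.lock */
    struct demo_completion comp;
};

void demo_obj_init(struct demo_obj *o, int refs)
{
    o->refcount = refs;
    o->comp.done = false;
    pthread_mutex_init(&o->comp.lock, NULL);
    pthread_cond_init(&o->comp.cond, NULL);
}

/* Drop one reference; the last put signals the completion. */
void demo_obj_put(struct demo_obj *o)
{
    pthread_mutex_lock(&o->comp.lock);
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        o->comp.done = true;
        pthread_cond_broadcast(&o->comp.cond);
    }
    pthread_mutex_unlock(&o->comp.lock);
}

/* Wait up to timeout_ms; returns true if signalled, false on timeout. */
bool demo_wait(struct demo_completion *c, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    while (!c->done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
    bool done = c->done;
    pthread_mutex_unlock(&c->lock);
    return done;
}

/* Teardown: drop our own reference, then wait for the remaining holders. */
bool demo_teardown(struct demo_obj *o, long timeout_ms)
{
    demo_obj_put(o);
    return demo_wait(&o->comp, timeout_ms);
}
```

With a single reference, demo_teardown() returns true right away. If the
object is initialised with two references and the second one is never put,
demo_teardown() returns false after the timeout. In that situation an
untimed wait would never return.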

Do you have any ideas or suggestions on how to sort this out?

Thanks.


