[ofw] deadlock starting in __free_mads

Tzachi Dar tzachid at mellanox.co.il
Sun Jan 2 02:16:00 PST 2011


Hi All,

I have hit a deadlock: the function __free_mads takes h_al->mad_lock and then calls ib_put_mad,
which calls al_remove_mad, which tries to take the same lock again. The spinlock is not recursive, so the second acquire never returns.

Please see the call stack.


Child-SP          RetAddr           Call Site
fffff880`021c9fd0 fffff800`01687a3f nt!KxWaitForSpinLockAndAcquire+0x20
fffff880`021ca000 fffff880`07cc775e nt!KeAcquireSpinLockAtDpcLevel+0x6f
fffff880`021ca050 fffff880`07d0ef5c ibbus!cl_spinlock_acquire+0x5e [b:\users\tzachid\projinf5\trunk\inc\kernel\complib\cl_spinlock_osd.h @ 96]
fffff880`021ca090 fffff880`07d30c5e ibbus!al_remove_mad+0x2c [b:\users\tzachid\projinf5\trunk\core\al\al.c @ 245]
fffff880`021ca0d0 fffff880`07d0e5ed ibbus!ib_put_mad+0x23e [b:\users\tzachid\projinf5\trunk\core\al\kernel\al_mad_pool.c @ 923]
fffff880`021ca110 fffff880`07d0e814 ibbus!__free_mads+0x8d [b:\users\tzachid\projinf5\trunk\core\al\al.c @ 143]
fffff880`021ca160 fffff880`07d63131 ibbus!free_al+0x54 [b:\users\tzachid\projinf5\trunk\core\al\al.c @ 168]
fffff880`021ca1a0 fffff880`07d61f67 ibbus!async_destroy_cb+0x7f1 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 842]
fffff880`021ca210 fffff880`07d62820 ibbus!sync_destroy_obj+0x6a7 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 704]
fffff880`021ca280 fffff880`07d61976 ibbus!destroy_obj+0x820 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 774]
fffff880`021ca2f0 fffff880`07cdbf3e ibbus!sync_destroy_obj+0xb6 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 633]
fffff880`021ca360 fffff880`07ca93ef ibbus!al_cleanup+0x3fe [b:\users\tzachid\projinf5\trunk\core\al\al_init.c @ 146]
fffff880`021ca3c0 fffff880`07cce977 ibbus!fdo_release_resources+0x7af [b:\users\tzachid\projinf5\trunk\core\bus\kernel\bus_pnp.c @ 715]
fffff880`021ca440 fffff880`07cce7ad ibbus!cl_do_remove+0x127 [b:\users\tzachid\projinf5\trunk\core\complib\kernel\cl_pnp_po.c @ 680]
fffff880`021ca480 fffff880`07cc9e20 ibbus!__remove+0x15d [b:\users\tzachid\projinf5\trunk\core\complib\kernel\cl_pnp_po.c @ 648]
fffff880`021ca4d0 fffff880`00c7093c ibbus!cl_pnp+0x1410 [b:\users\tzachid\projinf5\trunk\core\complib\kernel\cl_pnp_po.c @ 243]
fffff880`021ca5c0 fffff880`00c692ce Wdf01000!FxPkgFdo::ProcessRemoveDeviceOverload+0x74
fffff880`021ca5f0 fffff880`00c67dd6 Wdf01000!FxPkgPnp::_PnpRemoveDevice+0x126
fffff880`021ca660 fffff880`00c37245 Wdf01000!FxPkgPnp::Dispatch+0x1b2
fffff880`021ca6d0 fffff880`00c3714b Wdf01000!FxDevice::Dispatch+0xa9
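
For anyone who wants to see the pattern in isolation, here is a minimal user-space sketch of the re-entrant acquire in the trace above. It is not the real AL code: toy_spinlock_t and all toy_* functions are hypothetical stand-ins, and the lock uses a try-acquire so the demo reports the failure instead of spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive spinlock such as h_al->mad_lock. */
typedef struct {
    atomic_bool held;
} toy_spinlock_t;

#define TOY_LOCK_INIT { false }

/* Returns true if the lock was taken, false if it was already held.
 * A real spinlock would keep spinning instead of returning false. */
static bool toy_try_acquire(toy_spinlock_t *l)
{
    return !atomic_exchange(&l->held, true);
}

static void toy_release(toy_spinlock_t *l)
{
    assert(atomic_load(&l->held)); /* releasing an unheld lock is a bug */
    atomic_store(&l->held, false);
}

/* Models al_remove_mad: takes the lock on its own. */
static bool toy_remove_mad(toy_spinlock_t *l)
{
    if (!toy_try_acquire(l))
        return false; /* this is where the real code deadlocks */
    /* ... unlink the MAD from the list ... */
    toy_release(l);
    return true;
}

/* Models __free_mads: holds the lock while calling into the remove path. */
static bool toy_free_mads(toy_spinlock_t *l)
{
    bool ok;

    if (!toy_try_acquire(l))
        return false;
    ok = toy_remove_mad(l); /* second acquire on the same lock */
    toy_release(l);
    return ok;
}
```

Called on its own, toy_remove_mad succeeds; called through toy_free_mads it fails, because the outer function already holds the lock.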

It seems to me that the least invasive fix is to add a new variant of ib_put_mad, called ib_put_mad_locked, which calls al_remove_mad_locked (also a new function) and does not take the lock again. __free_mads, which already holds h_al->mad_lock, would then call ib_put_mad_locked instead of ib_put_mad.
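
A rough sketch of what I mean, as a user-space model only. The demo_* types and functions are hypothetical and do not match the real AL structures or signatures; the lock is a plain flag whose assert fires where a real spinlock would deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a MAD element on the AL instance's list. */
typedef struct demo_mad {
    struct demo_mad *next;
} demo_mad_t;

typedef struct {
    bool        lock_held; /* stands in for h_al->mad_lock (non-recursive) */
    demo_mad_t *head;      /* stands in for the list of outstanding MADs */
    size_t      returned;  /* count of MADs handed back to the pool */
} demo_al_t;

static void demo_lock(demo_al_t *al)
{
    assert(!al->lock_held); /* a real spinlock would deadlock here */
    al->lock_held = true;
}

static void demo_unlock(demo_al_t *al)
{
    assert(al->lock_held);
    al->lock_held = false;
}

/* Caller must already hold the lock. */
static void demo_remove_mad_locked(demo_al_t *al, demo_mad_t *mad)
{
    demo_mad_t **pp;

    assert(al->lock_held);
    for (pp = &al->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == mad) {
            *pp = mad->next;
            mad->next = NULL;
            return;
        }
    }
}

/* Existing behaviour: takes the lock itself. */
static void demo_remove_mad(demo_al_t *al, demo_mad_t *mad)
{
    demo_lock(al);
    demo_remove_mad_locked(al, mad);
    demo_unlock(al);
}

/* Existing entry point for callers that do not hold the lock. */
static void demo_put_mad(demo_al_t *al, demo_mad_t *mad)
{
    demo_remove_mad(al, mad);
    al->returned++;
}

/* New variant for callers that already hold the lock. */
static void demo_put_mad_locked(demo_al_t *al, demo_mad_t *mad)
{
    demo_remove_mad_locked(al, mad);
    al->returned++;
}

/* Models __free_mads: holds the lock once and uses the _locked variant. */
static void demo_free_mads(demo_al_t *al)
{
    demo_lock(al);
    while (al->head != NULL)
        demo_put_mad_locked(al, al->head);
    demo_unlock(al);
}
```

The unlocked functions stay as thin wrappers around the _locked ones, so existing callers of ib_put_mad would not need to change.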

Does anyone have objections or a better way to fix the issue?

Thanks
Tzachi