[openib-general] [libmthca] deadlock while trying to destroy QP
Roland Dreier
rdreier at cisco.com
Tue Feb 6 11:58:38 PST 2007
> #0 0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
> #1 0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554
> #2 0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246
> #3 0x000000000040117b in client_sig_handler ()
> #4 <signal handler called>
> #5 0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
> #6 0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467
> #7 0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824
I guess my first reaction is "don't do that." Trying to do something
as complex as destroying a QP from a signal handler seems very fragile
to me, and I wouldn't consider ibv_destroy_qp() safe to call from a
signal handler.
Can you just have your signal handler set a flag instead, and check
the flag from the normal flow of your program?
> Does destroy_qp need to be dependent on the CQ?
Yes, it needs to lock the CQ to get rid of stale completions for the
QP being destroyed.
- R.