[openib-general] [libmthca] deadlock while trying to destroy QP

Roland Dreier rdreier at cisco.com
Tue Feb 6 11:58:38 PST 2007


 > #0  0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
 > #1  0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554
 > #2  0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246
 > #3  0x000000000040117b in client_sig_handler ()
 > #4  <signal handler called>
 > #5  0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
 > #6  0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467
 > #7  0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824

I guess my first reaction is "don't do that."  Trying to do something
as complex as destroying a QP from a signal handler seems very fragile
to me, and I wouldn't consider ibv_destroy_qp() safe to call from a
signal handler.

Can you just have your signal handler set a flag instead, and check
the flag from the normal flow of your program?
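
A minimal sketch of that flag-based approach follows. Everything in it
is illustrative: struct app, poll_once() and teardown() are hypothetical
stand-ins for the application's own state, its ibv_poll_cq() loop and
its ibv_destroy_qp() cleanup, and no verbs calls are actually made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop.  volatile sig_atomic_t
 * is the object type C guarantees can be safely set from a handler. */
static volatile sig_atomic_t stop_requested = 0;

/* The handler only records that a signal arrived: no locks, no verbs. */
static void on_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install on_signal() for signo; returns 0 on success, -1 on error. */
static int install_stop_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical application state, for illustration only. */
struct app {
    int polls;      /* number of poll_once() calls made */
    int torn_down;  /* set to 1 once teardown() has run */
};

/* Stand-in for one ibv_poll_cq() call plus completion handling. */
static void poll_once(struct app *app)
{
    app->polls++;
}

/* Stand-in for ibv_destroy_qp() and the rest of the cleanup. */
static void teardown(struct app *app)
{
    app->torn_down = 1;
}

/* Normal program flow: poll until the flag is set, then clean up here,
 * outside signal context, where taking the CQ lock is safe. */
static void run_until_stopped(struct app *app)
{
    assert(app != NULL);
    while (!stop_requested)
        poll_once(app);
    teardown(app);
}
```

A caller would run install_stop_handler(SIGINT) once at startup and then
call run_until_stopped(). Because the flag is only checked between
polls, the cleanup can never run while a poll is holding the CQ lock.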

 > Does destroy_qp need to depend on the CQ?

Yes, it needs to lock the CQ to get rid of stale completions for the
QP being destroyed.
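
Read together with the backtrace, this appears to be the cause of the
hang: the signal arrived while mthca_poll_cq() was in
pthread_spin_lock() on the CQ, and mthca_cq_clean() then spins on the
lock of that same CQ from the handler. The interrupted code cannot
resume until the handler returns, so the lock is never released. The
sketch below models this with a C11 atomic_flag instead of the real
libmthca lock; cq_lock, handler() and demo_interrupted_poll() are
hypothetical names, and the handler reports the conflict rather than
spinning so that the program terminates.

```c
#include <signal.h>
#include <stdatomic.h>

/* Toy stand-in for the CQ spinlock (not the real libmthca lock). */
static atomic_flag cq_lock = ATOMIC_FLAG_INIT;

/* Set by the handler when it finds the lock already taken. */
static volatile sig_atomic_t handler_found_lock_held = 0;

/* Stand-in for a handler that needs the CQ lock, as destroying a QP does. */
static void handler(int signo)
{
    (void)signo;
    if (atomic_flag_test_and_set(&cq_lock)) {
        /* The interrupted code owns the lock.  A real spinlock would
         * loop here forever, because the owner cannot run again until
         * this handler returns. */
        handler_found_lock_held = 1;
    } else {
        /* Lock was free: release it again straight away. */
        atomic_flag_clear(&cq_lock);
    }
}

/* Models a poll that is interrupted while holding the lock.
 * Returns 1 if the handler saw the lock held, 0 otherwise. */
static int demo_interrupted_poll(void)
{
    signal(SIGINT, handler);

    while (atomic_flag_test_and_set(&cq_lock))
        ;                         /* acquire, as the poll path would */
    raise(SIGINT);                /* signal lands while the lock is held */
    atomic_flag_clear(&cq_lock);  /* release, as the poll path would */

    return handler_found_lock_held;
}
```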

 - R.



