[ofa-general] Application blocked in mthca_poll_cq

Roland Dreier rdreier at cisco.com
Mon Nov 5 08:49:26 PST 2007


 > Every now and then I notice that my application blocks inside
 > mthca_poll_cq. When I attach gdb to the process I find it's blocking on a
 > call to pthread_spin_lock/pthread_spin_unlock. I am not sure if this is
 > a bug or something wrong with what I am doing. I am calling ibv_poll_cq
 > with the number of entries as 1. Any help on this would be much
 > appreciated. I am not able to replicate it in a separate test program.
 > There is no other call to ibv_poll_cq.

What version of libmthca are you using?  libmthca 1.0.2 and earlier
had a bug that could cause this in rare circumstances (if you destroy
two QPs simultaneously from different threads and the two QPs are such
that the receive CQ of one QP is the send CQ of the other and vice
versa).  To be honest I doubt you're hitting this.

The only operations in libmthca that hit the CQ spinlock are:
 - polling the CQ
 - resizing a CQ
 - modifying a QP to RESET
 - destroying a QP
All of that code seems to take and release the CQ spinlock properly.

I assume your application is multithreaded?  When it gets stuck it
would be useful to know which other thread is holding the CQ lock that
poll_cq is blocked on; I don't know of a really good way to figure
that out though.
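One possible approach is to rebuild libmthca, or whatever code owns the lock, with an instrumented lock for debugging. Each time the lock is taken, the instrumented version records the holder's thread ID. The sketch below is a hypothetical debugging aid and not something libmthca provides. debug_spinlock and its helper functions are made-up names, and the sketch relies on the Linux-specific gettid system call. When the process hangs, print the stuck lock's owner_tid field in gdb and match it against the LWP numbers that `info threads` lists. Then `thread apply all bt` shows what that thread is doing. There is a short window between taking the lock and recording the owner, so treat the value as a strong hint rather than proof.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical instrumented spinlock: a pthread spinlock plus the kernel
 * thread ID (the "LWP" number gdb shows) of whoever currently holds it. */
struct debug_spinlock {
    pthread_spinlock_t lock;
    _Atomic long owner_tid; /* 0 when the lock is free */
};

/* Linux-specific: kernel TID of the calling thread. */
static long current_tid(void)
{
    return (long)syscall(SYS_gettid);
}

static void debug_spin_init(struct debug_spinlock *l)
{
    int rc = pthread_spin_init(&l->lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);
    (void)rc;
    atomic_init(&l->owner_tid, 0);
}

static void debug_spin_lock(struct debug_spinlock *l)
{
    pthread_spin_lock(&l->lock);
    atomic_store(&l->owner_tid, current_tid()); /* record the new holder */
}

static void debug_spin_unlock(struct debug_spinlock *l)
{
    atomic_store(&l->owner_tid, 0); /* clear before actually releasing */
    pthread_spin_unlock(&l->lock);
}

/* Read from gdb (or a watchdog) while another thread is stuck spinning. */
static long debug_spin_owner(struct debug_spinlock *l)
{
    return atomic_load(&l->owner_tid);
}
```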

Is it possible that you have a use-after-free where you destroy a CQ
and then call poll with a pointer to the freed CQ?

 - R.


