[openib-general] Re: usermode hang in mthca_cq_clean

Roland Dreier rolandd at cisco.com
Wed Nov 9 12:12:03 PST 2005


    Sean> I'm seeing an issue trying to recover from an error in
    Sean> userspace.  Basically, I allocate a PD, a CQ, and a QP, then
    Sean> destroy the QP because of an unrelated error.  The destroy
    Sean> call takes several seconds to complete, and appears to be
    Sean> hung in mthca_cq_clean: line 551.  Stepping through the
    Sean> while loop there, I'm not falling into the if or else if
    Sean> cases.  The call does eventually complete.

I think I see the problem.  Does this patch fix it for you?
(Basically, you're running a benchmark of how fast your CPU can get
through that loop 4 billion times ;)

 - R.

--- libmthca/src/cq.c	(revision 3989)
+++ libmthca/src/cq.c	(working copy)
@@ -524,7 +524,7 @@ void mthca_arbel_cq_event(struct ibv_cq 
 void mthca_cq_clean(struct mthca_cq *cq, uint32_t qpn, struct mthca_srq *srq)
 {
 	struct mthca_cqe *cqe;
-	int prod_index;
+	uint32_t prod_index;
 	int nfreed = 0;
 
 	pthread_spin_lock(&cq->lock);
@@ -546,7 +546,7 @@ void mthca_cq_clean(struct mthca_cq *cq,
 	 * Now sweep backwards through the CQ, removing CQ entries
 	 * that match our QP by copying older entries on top of them.
 	 */
-	while (--prod_index > cq->cons_index) {
+	while ((int) --prod_index - (int) cq->cons_index >= 0) {
 		cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
 		if (cqe->my_qpn == htonl(qpn)) {
 			if (srq)
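
For readers following along, here is a minimal standalone sketch of why the original loop condition can spin roughly 4 billion times and why a signed-difference test stops at once. It assumes cons_index is a 32-bit unsigned field (the diff above does not show its declaration) and that int is 32 bits wide. The helper names buggy_keep_going, fixed_keep_going and sweep_count are hypothetical and not part of libmthca. The fixed test is also spelled slightly differently from the patch: it subtracts in unsigned arithmetic and then casts the result, rather than casting each operand first.

```c
#include <stdint.h>

/*
 * Original form of the test.  When an int is compared with a uint32_t,
 * the int is converted to unsigned first (written out explicitly here),
 * so a prod_index of -1 becomes 0xffffffff and compares greater than 0.
 */
int buggy_keep_going(int prod_index, uint32_t cons_index)
{
	return (uint32_t) prod_index > cons_index;
}

/*
 * Wraparound-safe form: take the difference in unsigned arithmetic and
 * look at its sign.  Assumes the two indices are never more than 2^31
 * apart.
 */
int fixed_keep_going(uint32_t prod_index, uint32_t cons_index)
{
	return (int32_t) (prod_index - cons_index) >= 0;
}

/*
 * Count how many entries a backwards sweep would visit with the
 * wraparound-safe test (hypothetical helper, for illustration only).
 */
uint32_t sweep_count(uint32_t prod_index, uint32_t cons_index)
{
	uint32_t n = 0;

	while ((int32_t) (--prod_index - cons_index) >= 0)
		++n;

	return n;
}
```

With an empty CQ and both indices at 0, the first decrement leaves prod_index at -1: buggy_keep_going(-1, 0) is true, so the old loop keeps going until prod_index has counted all the way down, while sweep_count(0, 0) visits no entries at all.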


