[PATCH] Re: [openib-general] Re: IPoIB Failure CQ overrun

Michael S. Tsirkin mst at mellanox.co.il
Mon Dec 20 07:29:24 PST 2004


Hello,  Roland!

Quoting r. Roland Dreier (roland at topspin.com) "Re: [openib-general] Re: IPoIB Failure CQ overrun":
>     Robert> 0000:04:00.0: CQ overrun on CQN 00000082
> 
> This appears to be an issue with the latest FW (I see it with Tavor FW
> 3.3.1 but not 3.2.0).  I am working with Mellanox on finding out
> whether it's a FW bug or a problem with mthca.
> 
> For now you can work around it by changing
> 
> 	IPOIB_NUM_WC 		  = 4,
> 
> to
> 
> 	IPOIB_NUM_WC 		  = 1,
> 
> in ipoib.h.
> 
>  - Roland


In investigating this issue I discovered what I believe is a race
condition in mthca:

mthca_poll_one decrements the qp->cur counter.
This will make it possible, once mthca_poll_one returns,
to post new wqes on the qp. The qp lock is then dropped, and finally
the cq consumer index is incremented.

However, if you try to post send wqes after qp->cur was decremented
but before the consumer index is updated, the post will succeed and
the cq will overrun.

The simplest solution is to update the cq consumer
index before the qp is unlocked.  A patch is attached.
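
To make the ordering concrete, here is a rough single-threaded model of
the interleaving described above. It is only a sketch: struct toy and
the toy_* helpers are hypothetical names invented for illustration, not
mthca code, and it assumes a CQ and a send queue of one entry each, with
every posted send completing immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are hypothetical names, not mthca structures. */
struct toy {
	int cq_size;	/* CQ capacity in entries */
	int cq_used;	/* entries the hardware considers in use */
	int qp_max;	/* send queue depth */
	int qp_cur;	/* outstanding WQEs, standing in for qp->cur */
	int freed;	/* CQEs polled but not yet reported to hardware */
	bool overrun;	/* set when a completion finds the CQ full */
};

/* Post one send WQE; the model completes it immediately. */
static bool toy_post_send(struct toy *t)
{
	if (t->qp_cur >= t->qp_max)
		return false;		/* send queue full, post rejected */
	t->qp_cur++;
	if (t->cq_used >= t->cq_size)
		t->overrun = true;	/* hardware has no free CQE */
	else
		t->cq_used++;
	return true;
}

/* Consume one CQE in software: frees a send queue slot right away. */
static void toy_poll_one(struct toy *t)
{
	assert(t->qp_cur > 0);
	t->qp_cur--;
	t->freed++;
}

/* Tell the hardware how many CQEs were consumed. */
static void toy_inc_cons_index(struct toy *t)
{
	t->cq_used -= t->freed;
	t->freed = 0;
}

/* Returns true if the chosen ordering overruns the CQ. */
static bool toy_run(bool update_before_unlock)
{
	struct toy t = { .cq_size = 1, .qp_max = 1 };

	toy_post_send(&t);		/* fill the send queue and the CQ */
	toy_poll_one(&t);		/* poller holds the qp lock here */

	if (update_before_unlock) {
		toy_inc_cons_index(&t);	/* patched order */
		toy_post_send(&t);	/* poster runs after the unlock */
	} else {
		toy_post_send(&t);	/* poster slips in after the unlock */
		toy_inc_cons_index(&t);	/* original order: too late */
	}
	return t.overrun;
}
```

In this model toy_run(false) reports an overrun and toy_run(true) does
not. The real driver has more state than this (several qps can share one
cq, for instance), so the model only illustrates the window and says
nothing about whether other paths can overrun.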

I also would like to suggest implementing CQ doorbell coalescing
in mthca, to reduce the number of CQ doorbells.

Unfortunately this patch does not seem to solve the overrun problem,
so there may be another problem as well. That will need more looking into.

Thanks,
MST
-------------- next part --------------
Index: hw/mthca/mthca_cq.c
===================================================================
--- hw/mthca/mthca_cq.c	(revision 1362)
+++ hw/mthca/mthca_cq.c	(working copy)
@@ -412,6 +412,11 @@
 
 	if (!*cur_qp || be32_to_cpu(cqe->my_qpn) != (*cur_qp)->qpn) {
 		if (*cur_qp) {
+			if (*freed) {
+				wmb();
+				inc_cons_index(dev, cq, *freed);
+				*freed = 0;
+			}
 			spin_unlock(&(*cur_qp)->lock);
 			if (atomic_dec_and_test(&(*cur_qp)->refcount))
 				wake_up(&(*cur_qp)->wait);
@@ -529,16 +534,17 @@
 			break;
 	}
 
+	if (freed) {
+		wmb();
+		inc_cons_index(dev, cq, freed);
+	}
+
 	if (qp) {
 		spin_unlock(&qp->lock);
 		if (atomic_dec_and_test(&qp->refcount))
 			wake_up(&qp->wait);
 	}
 
-	if (freed) {
-		wmb();
-		inc_cons_index(dev, cq, freed);
-	}
 
 	spin_unlock_irqrestore(&cq->lock, flags);
 

