[ofa-general] IPoIB CQ overrun

Roland Dreier rdreier at cisco.com
Mon Dec 3 21:03:20 PST 2007


 > I dumped out the CQNs as they were created and generally the first 
 > non-reserved CQs get made by ipoib_transport_dev_init() when ipoib 
 > is brought up on each port. CQN 0x80 is used by port 0, 0x81 by 
 > port 1. 

Actually, I think the first two CQs are the ones created by the MAD module:

 > Dec  2 10:19:23 r6i1n8 kernel: ib_mthca 0000:06:00.0: CQ overrun on CQN 000080
 > Dec  2 10:19:23 r6i1n8 kernel: ib_mad: Fatal error (1) on MAD QP (1)

It seems that there is a CQ error and then ib_mad gets a catastrophic
error on its QP.

Given that you are seeing CQ overruns on two completely different
types of QPs, I think it's more likely there is some problem with the
mthca driver's handling of updating the CQ consumer index than that
there are two independent bugs being triggered by your test.
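
To illustrate the kind of failure I mean, here is a toy model of a CQ
ring. It is only a sketch and not mthca code: every name in it
(toy_cq, cons_hw, cons_sw and so on) is made up for illustration, and
it assumes the hardware decides "full" purely from the last consumer
index the driver reported. If that report is lost or stale, the
hardware sees an overrun even though the driver is polling every entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; hypothetical names, not the mthca data structures. */
struct toy_cq {
    uint32_t size;     /* number of CQE slots in the ring */
    uint32_t prod;     /* entries written by the "hardware" (free-running) */
    uint32_t cons_hw;  /* consumer index last reported to the "hardware" */
    uint32_t cons_sw;  /* consumer index the driver has actually reached */
    bool     overrun;  /* set once the "hardware" believes the ring is full */
};

static void toy_cq_init(struct toy_cq *cq, uint32_t size)
{
    cq->size = size;
    cq->prod = 0;
    cq->cons_hw = 0;
    cq->cons_sw = 0;
    cq->overrun = false;
}

/* "Hardware" side: try to write one completion into the ring. */
static bool toy_cq_hw_post(struct toy_cq *cq)
{
    /* Fullness is judged from cons_hw, not from what the driver really polled. */
    if (cq->prod - cq->cons_hw >= cq->size) {
        cq->overrun = true;
        return false;
    }
    cq->prod++;
    return true;
}

/* Driver side: consume one completion, optionally reporting the new index. */
static bool toy_cq_poll(struct toy_cq *cq, bool update_hw)
{
    if (cq->cons_sw == cq->prod)
        return false;               /* nothing to poll */
    cq->cons_sw++;
    if (update_hw)
        cq->cons_hw = cq->cons_sw;  /* the step suspected of going wrong */
    return true;
}

/* Post and immediately poll n completions; returns true on overrun. */
static bool toy_cq_run(uint32_t size, uint32_t n, bool update_hw)
{
    struct toy_cq cq;
    uint32_t i;

    toy_cq_init(&cq, size);
    for (i = 0; i < n; i++) {
        if (!toy_cq_hw_post(&cq))
            break;
        toy_cq_poll(&cq, update_hw);
    }
    return cq.overrun;
}
```

In this model, toy_cq_run(4, 100, true) never overruns, while
toy_cq_run(4, 100, false) overruns on the fifth completion no matter
what type of QP is feeding the CQ. That is why a single consumer index
bug would show up on both the IPoIB and the MAD CQs.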

What kind of hardware was this on again?  It's x86-64, right?  But is
there anything out of the ordinary about these systems?

 - R.


