[openib-general] Re: IPoIB Failure CQ overrun

Woodruff, Robert J robert.j.woodruff at intel.com
Thu Dec 16 16:05:43 PST 2004


I am now seeing a new failure.

I bring up 2 nodes and initially can ping between the nodes.
Then I try to run netpipe, and after the message size gets a little
past 4K, it hangs. I see the same behavior running MPI over TCP.
This used to work.

I look in the dmesg log and see the following:


ib0: send complete, wrid 0
ib0: called: id 1, op 0, status: 0
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: send complete, wrid 1
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=389 address=00000100bff20b00 qpn=0x000404
ib_mthca 0000:04:00.0: CQ overrun on CQN 00000082
ib0: called: id 2, op 0, status: 0
ib0: send complete, wrid 2
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404

After it gets into this state, the interface is dead. 
/sbin/ip neigh show dev ib0
192.168.0.1 nud failed
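
To make the pattern in the dmesg excerpt easier to see, here is a rough
sketch that tallies "sending packet" and "send complete" lines before and
after the "CQ overrun" message. The function name summarize and the regular
expressions are made up for illustration and are based only on the log lines
quoted above; this is not part of IPoIB or mthca. Because the excerpt starts
mid-stream, the counts are only indicative.

```python
import re

# Patterns inferred from the log excerpt above (assumption: the message
# formats do not vary beyond what is shown there).
SEND_RE = re.compile(r"\w+: sending packet, length=\d+")
DONE_RE = re.compile(r"\w+: send complete, wrid \d+")
OVERRUN_RE = re.compile(r"CQ overrun on CQN ([0-9a-fA-F]+)")


def summarize(lines):
    """Count sends and send completions, split at the first CQ overrun."""
    stats = {
        "overrun_cqn": None,
        "sends_before": 0,
        "completions_before": 0,
        "sends_after": 0,
        "completions_after": 0,
    }
    for line in lines:
        overrun = OVERRUN_RE.search(line)
        if overrun and stats["overrun_cqn"] is None:
            stats["overrun_cqn"] = overrun.group(1)
            continue
        # Everything seen once the overrun is recorded counts as "after".
        phase = "before" if stats["overrun_cqn"] is None else "after"
        if SEND_RE.search(line):
            stats["sends_" + phase] += 1
        elif DONE_RE.search(line):
            stats["completions_" + phase] += 1
    return stats


# The excerpt from this message, used as sample input.
SAMPLE = """\
ib0: send complete, wrid 0
ib0: called: id 1, op 0, status: 0
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: send complete, wrid 1
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=389 address=00000100bff20b00 qpn=0x000404
ib_mthca 0000:04:00.0: CQ overrun on CQN 00000082
ib0: called: id 2, op 0, status: 0
ib0: send complete, wrid 2
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
"""

if __name__ == "__main__":
    for key, value in summarize(SAMPLE.splitlines()).items():
        print(key, value)
```

On a full log, something like "dmesg | python3 script.py" would need the
script changed to read sys.stdin instead of SAMPLE.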

Any ideas?

woody




