[openib-general] Re: IPoIB Failure CQ overrun
Woodruff, Robert J
robert.j.woodruff at intel.com
Thu Dec 16 16:05:43 PST 2004
I am now seeing a new failure.
I bring up 2 nodes and initially can ping between the nodes.
Then I try to run netpipe, and once the message size gets a little
past 4K, it hangs. I see the same behavior running MPI over TCP.
This used to work.
Looking in the dmesg log, I see the following:
ib0: send complete, wrid 0
ib0: called: id 1, op 0, status: 0
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: send complete, wrid 1
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=389 address=00000100bff20b00 qpn=0x000404
ib_mthca 0000:04:00.0: CQ overrun on CQN 00000082
ib0: called: id 2, op 0, status: 0
ib0: send complete, wrid 2
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
ib0: sending packet, length=2048 address=00000100bff20b00 qpn=0x000404
After it gets into this state, the interface is dead.
/sbin/ip neigh show dev ib0
192.168.0.1 nud failed
Any ideas?
woody