[PATCH] Re: [openib-general] Re: IPoIB Failure CQ overrun
Roland Dreier
roland at topspin.com
Mon Dec 20 08:06:24 PST 2004
Michael> CQ consumer index doorbell FW is reasonably well tested
Michael> with VAPI (and with directed tests). It is also
Michael> relatively straight-forward code so I would suspect a
Michael> driver problem first of all.
Fair enough, but the behavior changed from FW version 3.2 to 3.3.1,
which is interesting as well.
Michael> Unfortunately, once the overrun happens I cannot bring
Michael> the interface down or unload the IP-over-IB module (both
Michael> commands hang), so I have to reboot. This is slowing me
Michael> down considerably. Do you have an idea why that is, and
Michael> how to fix this problem?
Probably IPoIB is stuck in the loop
	/* Wait for all sends and receives to complete */
	while (priv->tx_head != priv->tx_tail || recvs_pending(dev))
		yield();
in ipoib_ib_dev_stop(), since some of the completions it's waiting for
were lost because of the CQ overrun, so the loop condition never becomes
false. I'll add a timeout here so that we give up and assume everything
is done.
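
To illustrate the idea, here is a rough userspace sketch of a drain loop
with a deadline. It is not the actual patch: struct fake_priv,
now_seconds() and wait_for_completions() are made-up names, the
recvs_pending field stands in for recvs_pending(dev), and wall-clock time
from timespec_get() stands in for whatever the driver would really use
(in the kernel, a jiffies-based deadline would be the usual analogue).

```c
#include <stdbool.h>
#include <sched.h>
#include <time.h>

/* Hypothetical stand-in for the driver's private state; only the
 * pieces that the wait loop looks at are modelled here. */
struct fake_priv {
	unsigned tx_head;
	unsigned tx_tail;
	unsigned recvs_pending;	/* stands in for recvs_pending(dev) */
};

/* Current time in seconds (userspace substitute for a kernel tick count). */
static double now_seconds(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Wait for all sends and receives to complete, but give up once
 * timeout_s seconds have passed.  Returns true if everything drained
 * by itself, false if we timed out and forced the state to "done". */
bool wait_for_completions(struct fake_priv *priv, double timeout_s)
{
	double deadline = now_seconds() + timeout_s;

	while (priv->tx_head != priv->tx_tail || priv->recvs_pending) {
		if (now_seconds() > deadline) {
			/* Assume the missing completions were lost. */
			priv->tx_tail = priv->tx_head;
			priv->recvs_pending = 0;
			return false;
		}
		sched_yield();
	}
	return true;
}
```

With this shape, bringing the interface down or unloading the module
would take at most the timeout instead of hanging forever, at the cost of
treating any still-outstanding work requests as finished.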
- R.