[PATCH] Re: [openib-general] Re: IPoIB Failure CQ overrun

Roland Dreier roland at topspin.com
Mon Dec 20 08:06:24 PST 2004


    Michael> CQ consumer index doorbell FW is reasonably well tested
    Michael> with VAPI (and with directed tests). It is also
    Michael> relatively straight-forward code so I would suspect a
    Michael> driver problem first of all.

Fair enough, but the behavior changed from FW version 3.2 to 3.3.1,
which is interesting as well.

    Michael> Unfortunately once the overrun happens I can not bring
    Michael> the interface down nor unload the ip over ib module (both
    Michael> commands hang) so I have to reboot. This is slowing me
    Michael> down considerably.  Do you have an idea why is that, and
    Michael> how to fix this problem?

Probably IPoIB is stuck in the loop

	/* Wait for all sends and receives to complete */
	while (priv->tx_head != priv->tx_tail || recvs_pending(dev))
		yield();

in ipoib_ib_dev_stop(), since some of the completions it's waiting for
are lost because of the CQ overrun.  I'll add a timeout here where we
give up and assume everything is done.
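
To illustrate the idea, here is a rough userspace sketch of a drain
loop with a timeout.  This is not the actual patch: struct fake_priv,
now_sec() and wait_for_completions() are made-up names for
illustration, and the counters only stand in for the driver's real
state.  In the driver the deadline would presumably be checked against
jiffies rather than clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the driver's private state (not the real struct). */
struct fake_priv {
	unsigned tx_head;
	unsigned tx_tail;
	unsigned recvs_pending;
};

/* Monotonic time in seconds. */
static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/*
 * Wait for all sends and receives to complete, but give up once
 * timeout_sec has elapsed.  Returns true if everything drained on its
 * own, false if we timed out and declared the work done ourselves.
 */
static bool wait_for_completions(struct fake_priv *priv, double timeout_sec)
{
	double deadline;

	assert(priv != NULL && timeout_sec >= 0.0);
	deadline = now_sec() + timeout_sec;

	while (priv->tx_head != priv->tx_tail || priv->recvs_pending) {
		if (now_sec() > deadline) {
			/* Completions were presumably lost: assume everything is done. */
			priv->tx_tail = priv->tx_head;
			priv->recvs_pending = 0;
			return false;
		}
		sched_yield();
	}

	return true;
}
```

The return value lets the caller log a warning when the timeout path
is taken, so lost completions do not go unnoticed.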

 - R.



