[ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD
eli at dev.mellanox.co.il
Wed Apr 30 09:06:26 PDT 2008
On Tue, 2008-04-29 at 14:49 -0700, Roland Dreier wrote:
> By the way, this isn't just theoretical -- I'm not smart enough to
> realize this except that I just saw:
> ib1: TX ring full, stopping kernel net queue
> NETDEV WATCHDOG: ib1: transmit timed out
> ib1: transmit timeout: latency 1240 msecs
> ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255
> and of course it never recovers.
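The counters in the quoted log show how many sends were still in flight when the queue stalled. The sketch below assumes tx_head and tx_tail are free-running unsigned counters; their magnitude in the log, far larger than any send ring, supports this. tx_in_flight() is a hypothetical helper, not an ipoib function.

```c
#include <stdint.h>

/* Hypothetical helper: sends posted but not yet reaped, assuming tx_head and
 * tx_tail are free-running unsigned counters. Unsigned subtraction stays
 * correct even after tx_head wraps past UINT32_MAX. */
static uint32_t tx_in_flight(uint32_t tx_head, uint32_t tx_tail)
{
    return tx_head - tx_tail;
}
```

With the values from the log, 5291313 - 5291255 leaves 58 sends outstanding.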
I started working on a fix for this: arm the send CQ when the QP reaches
63 outstanding requests, and drain the CQ in the completion handler
while holding priv->tx_lock.
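A minimal userspace model of that approach, for illustration only. It assumes a 64-entry send ring, free-running tx_head/tx_tail counters, and (as the quoted log suggests) that nothing reaps send completions once the net queue is stopped. struct tx_ring, post_send(), hw_complete(), send_cq_handler() and simulate() are hypothetical names; the mutex stands in for priv->tx_lock, and plain flags stand in for netif_stop_queue()/netif_wake_queue() and ib_req_notify_cq(). It is not the ib_ipoib code.

```c
#include <assert.h>
#include <pthread.h>

#define RING_SIZE 64 /* assumed send ring size */
#define ARM_THRESHOLD 63 /* arm the send CQ at this many outstanding sends */

struct tx_ring {
    pthread_mutex_t tx_lock;    /* stands in for priv->tx_lock */
    unsigned int tx_head;       /* free-running count of posted sends */
    unsigned int tx_tail;       /* free-running count of reaped sends */
    unsigned int hw_done;       /* sends the "hardware" has completed */
    unsigned int arm_threshold; /* 0 = never arm (models the stall) */
    int cq_armed;               /* models a pending ib_req_notify_cq() */
    int queue_stopped;          /* models netif_stop_queue()/netif_wake_queue() */
};

/* Reap every completed send; caller must hold tx_lock. */
static void drain_locked(struct tx_ring *r)
{
    r->tx_tail = r->hw_done;
    if (r->queue_stopped && r->tx_head - r->tx_tail < RING_SIZE)
        r->queue_stopped = 0; /* models netif_wake_queue() */
}

/* Post one send; returns -1 if the queue is stopped. */
static int post_send(struct tx_ring *r)
{
    unsigned int outstanding;
    int ret = 0;

    pthread_mutex_lock(&r->tx_lock);
    if (r->queue_stopped) {
        ret = -1;
        goto out;
    }
    r->tx_head++;
    outstanding = r->tx_head - r->tx_tail;
    if (outstanding == RING_SIZE)
        r->queue_stopped = 1; /* models netif_stop_queue() */
    if (r->arm_threshold && outstanding >= r->arm_threshold && !r->cq_armed) {
        r->cq_armed = 1;  /* request a completion event */
        drain_locked(r);  /* reap completions that arrived before arming */
    }
out:
    pthread_mutex_unlock(&r->tx_lock);
    return ret;
}

/* Completion handler: drain the CQ while holding tx_lock. */
static void send_cq_handler(struct tx_ring *r)
{
    pthread_mutex_lock(&r->tx_lock);
    r->cq_armed = 0; /* one event per arming */
    drain_locked(r);
    pthread_mutex_unlock(&r->tx_lock);
}

/* The "hardware" finishes up to n sends; an armed CQ raises an event. */
static void hw_complete(struct tx_ring *r, unsigned int n)
{
    int fire;

    pthread_mutex_lock(&r->tx_lock);
    if (n > r->tx_head - r->hw_done)
        n = r->tx_head - r->hw_done; /* cannot finish unposted sends */
    r->hw_done += n;
    assert(r->hw_done - r->tx_tail <= RING_SIZE);
    fire = r->cq_armed;
    pthread_mutex_unlock(&r->tx_lock);
    if (fire)
        send_cq_handler(r); /* event delivered outside the lock */
}

/* Fill the ring, let the hardware finish everything, and report whether
 * the stopped queue was woken (1) or stayed stopped (0). */
static int simulate(unsigned int arm_threshold)
{
    struct tx_ring r = { .tx_head = 0 };
    int woke;

    pthread_mutex_init(&r.tx_lock, NULL);
    r.arm_threshold = arm_threshold;
    while (post_send(&r) == 0)
        ; /* post until the queue stops */
    hw_complete(&r, RING_SIZE);
    woke = !r.queue_stopped;
    pthread_mutex_destroy(&r.tx_lock);
    return woke;
}
```

simulate(0) reproduces the stall: the ring fills, the hardware finishes every send, and the queue is never woken. simulate(ARM_THRESHOLD) recovers, because the CQ was armed at 63 outstanding sends and the handler reaps the completions under tx_lock. The extra drain right after arming stands in for the poll a real driver needs so it does not miss completions that land before the CQ is armed.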
But I hit another strange problem that I don't understand. If I just
load and unload ib_ipoib, the system crashes with messages that suggest
memory corruption. If I comment out the destruction of the send CQ in
ipoib_transport_dev_cleanup(), the crashes disappear. Do you see this as
well?