[openib-general] Re: A Couple of IPoIB related issues
Hal Rosenstock
halr at voltaire.com
Thu Jul 14 04:31:00 PDT 2005
On Thu, 2005-07-07 at 15:27, Roland Dreier wrote:
> Hal> 1. NETDEV WATCHDOG: ib0: transmit timed out ib0: transmit
> Hal> timeout: latency 360052
>
> Hal> This occurs once a minute on heavy pings.
>
> Exactly once a minute, or is this just a rough frequency? What is a
> heavy ping? Does the IPoIB interface still work when you see this?
I should have said they occur exactly once per second (not once per
minute).
Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
...
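As a quick check on the excerpt above, here is a minimal sketch that parses the "transmit timeout" lines and reports the gap between consecutive warnings and the change in the reported latency. The regular expression, the function names, and the year passed to the parser are illustrative assumptions (syslog timestamps carry no year). The units of the latency value are not stated in this thread. If it is a jiffies count on a kernel with HZ=1000 (both assumptions), a latency that grows by 1000 per second would be consistent with the queue staying stopped continuously rather than stalling again each second.

```python
import re
from datetime import datetime

# Log excerpt copied from the message above.
LOG = """\
Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
"""

# Matches only the driver's "transmit timeout: latency N" lines.
LATENCY_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (\w+): "
    r"transmit timeout: latency (\d+)$"
)


def parse_timeouts(text, year=2005):
    """Return a list of (timestamp, interface, latency) tuples."""
    events = []
    for line in text.splitlines():
        match = LATENCY_RE.match(line)
        if match:
            # Syslog omits the year, so one is supplied by the caller.
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match.group(2), int(match.group(3))))
    return events


def deltas(events):
    """Return (seconds elapsed, latency change) for consecutive events."""
    return [
        ((b[0] - a[0]).total_seconds(), b[2] - a[2])
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for seconds, latency_change in deltas(parse_timeouts(LOG)):
        print(f"{seconds:.0f} s between warnings, latency +{latency_change}")
```

On the three warnings shown, each gap is one second and each latency increase is 1000.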
A "heavy ping" is twenty concurrent flood pings.
While these warnings are being logged, the IPoIB interface does seem to work.
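For anyone trying to reproduce the load, a rough sketch of "twenty concurrent flood pings" might look like the following. The target address, the run duration, and the function names are placeholders, not values from this thread. `ping -f` is the standard flood option and normally requires root.

```python
import subprocess
import sys
import time


def flood_ping_commands(target, count=20):
    """Build one 'ping -f <target>' command line per concurrent pinger."""
    return [["ping", "-f", target] for _ in range(count)]


def run_flood_pings(target, count=20, duration=10.0):
    """Start `count` flood pings, let them run, then stop them all."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
        for cmd in flood_ping_commands(target, count)
    ]
    try:
        time.sleep(duration)
    finally:
        # Stop every pinger even if the sleep is interrupted.
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: floodping.py <target-ip-on-ib0-subnet>")
    run_flood_pings(sys.argv[1])
```

Run it against the peer's IPoIB address while watching the kernel log on the sending host.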
> Is there more traceback that shows who is doing the allocation that
> fails? In any case it looks like you are just running low on memory.
>
> Hal> Any idea on what could be causing these or how to go about
> Hal> isolating them ?
>
> The __alloc_pages() warnings look fairly benign and can probably be
> fixed by tuning /proc/sys/vm/min_free_kbytes appropriately.
>
> The TX timeout is somewhat odd. I guess we need to figure out why the
> netdevice queue is stopped for a long time.
How can that be determined?
Also, is there code missing for handling IPoIB timeouts?
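On the min_free_kbytes suggestion quoted above, a minimal sketch for inspecting and changing the setting follows. The helper names are illustrative, and no particular value is recommended in this thread, so the new value has to come from the caller. Writing the file requires root, and the change does not persist across reboots.

```python
import sys
from pathlib import Path

# Kernel tunable named in the quoted reply.
MIN_FREE = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path=MIN_FREE):
    """Return the current setting in kilobytes."""
    return int(path.read_text().split()[0])


def write_min_free_kbytes(value, path=MIN_FREE):
    """Write a new setting in kilobytes (requires root on the real file)."""
    if value <= 0:
        raise ValueError("min_free_kbytes must be positive")
    path.write_text(f"{value}\n")


if __name__ == "__main__":
    print("current min_free_kbytes:", read_min_free_kbytes())
    if len(sys.argv) == 2:
        # The value is chosen by the caller; none is given in the thread.
        write_min_free_kbytes(int(sys.argv[1]))
        print("new min_free_kbytes:", read_min_free_kbytes())
```

The `path` parameter exists only so the helpers can be tried against an ordinary file before touching the real tunable.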
-- Hal