[openib-general] Re: A Couple of IPoIB related issues

Hal Rosenstock halr at voltaire.com
Thu Jul 14 04:31:00 PDT 2005


On Thu, 2005-07-07 at 15:27, Roland Dreier wrote:
>     Hal> 1. NETDEV WATCHDOG: ib0: transmit timed out ib0: transmit
>     Hal> timeout: latency 360052
> 
>     Hal> This occurs once a minute on heavy pings.
> 
> Exactly once a minute, or is this just a rough frequency?  What is a
> heavy ping?  Does the IPoIB interface still work when you see this?

I should have said they occur exactly once per second (rather than
once per minute).

Jun 19 19:34:01 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:01 mo1 kernel: ib0: transmit timeout: latency 3879
Jun 19 19:34:02 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:02 mo1 kernel: ib0: transmit timeout: latency 4879
Jun 19 19:34:03 mo1 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 19 19:34:03 mo1 kernel: ib0: transmit timeout: latency 5879
...

A "heavy ping" is twenty concurrent flood pings.
During these warnings, the IPoIB interface does seem to work.

> Is there more traceback that shows who is doing the allocation that
> fails?  In any case it looks like you are just running low on memory.
> 
>     Hal> Any idea on what could be causing these or how to go about
>     Hal> isolating them ?
> 
> The __alloc_pages() warnings look fairly benign and can probably be
> fixed by tuning /proc/sys/vm/min_free_kbytes appropriately.
> 
> The TX timeout is somewhat odd.  I guess we need to figure out why the
> netdevice queue is stopped for a long time.

How can that be determined?

Also, is there code missing for handling IPoIB timeouts?

-- Hal
