[ofa-general] [Bug 447] ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet

Thu Mar 15 16:56:40 PDT 2007

https://bugs.openfabrics.org/show_bug.cgi?id=447

------- Comment #3 from DarylGrunau at gmail.com  2007-03-15 16:56 -------
In spite of version-skew cleanup we are still experiencing kernel panics on our
I/O nodes.  I'll inline one of the latest stack traces for reference:

Kernel BUG at dev:1121
invalid operand: 0000 [1] SMP 
CPU 4 
Modules linked in: myri10ge(U) rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_ipoib(U)
ib_ipath(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U)
ib_mad(U) ib_core(U) bluesmoke_k8 bluesmoke_mc perfctr ipmi_devintf ipmi_si
ipmi_msghandler bnx2 ext3 jbd nfs lockd nfs_acl sunrpc
Pid: 0, comm: swapper Not tainted 2.6.9-34.ELsmp.lanl
RIP: 0010:[<ffffffff802aafc2>] <ffffffff802aafc2>{__skb_linearize+62}
RSP: 0018:000001041ffbbcf8  EFLAGS: 00010203
RAX: 0000000000000001 RBX: 000000000000001c RCX: 000001061fec6480
RDX: 00000000ffffffdc RSI: 0000000000000220 RDI: 000001061fec6400
RBP: 000001021f5b5d40 R08: 0000000000000000 R09: 000000000000003c
R10: 0000000000000000 R11: 0000000000000000 R12: 000001041ff78280
R13: 0000000000000000 R14: 000001021ebd0000 R15: 0000000000000000
FS:  0000002a958a0b00(0000) GS:ffffffff804d8480(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 00000000dfcac000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo 000001061ff96000, task 0000010220032800)
Stack: 000001021ebd0000 000001021ebd0000 00000000fffffff4 000001041ff78280 
       0000000000000000 ffffffff802ab133 000001021f5b5d40 000001041ff782c0 
       000001021f5b5d40 ffffffff802b01c8 
Call Trace:<IRQ> <ffffffff802ab133>{dev_queue_xmit+93} <ffffffff802b01c8>{neigh_resolve_output+578}
       <ffffffff802afdc4>{neigh_update+626} <ffffffff802e62a7>{arp_process+1257}
       <ffffffff8013f4ac>{__mod_timer+293} <ffffffff802ab87d>{netif_receive_skb+590}
       <ffffffff802ab939>{process_backlog+136} <ffffffff802aba43>{net_rx_action+129}
       <ffffffff8013bf38>{__do_softirq+88} <ffffffff8013bfe1>{do_softirq+49}
       <ffffffff801131a7>{do_IRQ+328} <ffffffff801107bf>{ret_from_intr+0}
        <EOI> <ffffffff8010e749>{default_idle+0} <ffffffff8010e769>{default_idle+32}
       <ffffffff8010e7dc>{cpu_idle+26}

Code: 0f 0b 41 ee 31 80 ff ff ff ff 61 04 85 d2 b8 00 00 00 00 0f 
RIP <ffffffff802aafc2>{__skb_linearize+62} RSP <000001041ffbbcf8>
 <0>Kernel panic - not syncing: Oops

An IBM engineer from the Linux Technology Center has also been looking into our
problems and writes the following conjecture:

1) one adapter (probably with large MTU) allocates a
        receive skb that has multiple memory buffers [ok]
2) an ARP request is received in that skb [ok]
3) driver delivers this buffer to upper layer, but skb is marked as
        shared [NOT OK]
4) ARP re-uses that same skb to respond to the ARP request [ok]
5) outgoing device does not support scatter/gather on output, so output
        packet goes to skb_linearize()
6) panic(), because shared buffers are not allowed in skb_linearize()
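
The six steps above can be replayed in a toy model. This is only a sketch in
Python, not kernel code: ToySkb, ToyDev, ToyBug, toy_linearize, toy_xmit and
replay are all hypothetical names, the "shared" test is modeled on the
kernel's skb_shared() check (reference count other than 1), and the output
device name "ib0" is an assumption taken from the conjecture, not from the
trace.

```python
from dataclasses import dataclass


class ToyBug(Exception):
    """Stands in for the kernel's BUG() in this toy model."""


@dataclass
class ToySkb:
    users: int = 1      # reference count; 1 means a single owner
    nr_frags: int = 0   # number of extra page fragments (0 = linear)

    def shared(self) -> bool:
        # Modeled on skb_shared(): more than one holder of this buffer.
        return self.users != 1


@dataclass
class ToyDev:
    name: str
    scatter_gather: bool  # stands in for the NETIF_F_SG feature flag


def toy_linearize(skb: ToySkb) -> None:
    # Step 6: linearizing a shared buffer is treated as a fatal bug.
    if skb.shared():
        raise ToyBug("linearize called on shared skb")
    skb.nr_frags = 0  # pretend all fragments were copied into one buffer


def toy_xmit(skb: ToySkb, dev: ToyDev) -> str:
    # Step 5: a fragmented skb must be linearized for a non-SG device.
    if skb.nr_frags and not dev.scatter_gather:
        toy_linearize(skb)
    return f"sent on {dev.name}"


def replay(shared_on_receive: bool) -> str:
    # Steps 1-3: a jumbo-frame receive yields a multi-fragment skb,
    # optionally (and incorrectly) delivered with an extra reference.
    skb = ToySkb(users=2 if shared_on_receive else 1, nr_frags=3)
    # Step 4: the ARP reply reuses the same skb; the output device is
    # assumed here not to support scatter/gather.
    return toy_xmit(skb, ToyDev(name="ib0", scatter_gather=False))
```

In this model replay(False) returns normally, while replay(True) raises
ToyBug, which mirrors the claim that only step 3 has to go wrong for the
panic to occur.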

A couple of things to note:
A) No buffer delivered to the upper layers from a driver should be shared.
        It only matters (and is only checked) in a few cases -- ARP, ICMP,
        and some IPSEC cases -- but it is incorrect because of those cases.
B) On Linux, the interface that received the ARP request is not necessarily
        the interface that sent the request. Since most interfaces that
        support scatter/gather on input also support it on output, this is
        probably a case where it received an ARP request for an Infiniband
        interface but received it on a jumbo-frame Ethernet NIC, and is
        sending the response back via the Infiniband interface. This can
        happen if both machines are connected to both networks. If that is
        the case, a simple workaround is to force ARP responses to only be
        sent on the interface on which it was received by:

        sysctl -w net.ipv4.conf.all.arp_filter=2
C) The code in the lower layers that is causing the skb to be shared would
        be a call to skb_clone(). One way of preventing this is to make all
        calls to skb_clone() that have nr_frags nonzero make a new copy
        of the buffer instead. Later kernel versions also have a version of
        skb_linearize that allows shared skb's. If this code was backported
        from a later kernel version, that may be where the bug came from in
        the first place.
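
For note B, it may help to see what arp_filter is currently set to on each
interface before changing anything. The sketch below assumes the usual
/proc/sys/net/ipv4/conf/<interface>/arp_filter layout. The function name
read_arp_filter and its conf_root parameter are hypothetical and exist only
for this illustration. The kernel's ip-sysctl documentation describes
arp_filter as a boolean (0 or 1).

```python
from pathlib import Path


def read_arp_filter(conf_root: str = "/proc/sys/net/ipv4/conf") -> dict:
    """Return {interface_name: arp_filter value} for each entry found."""
    values = {}
    for entry in sorted(Path(conf_root).iterdir()):
        knob = entry / "arp_filter"
        # Skip directories that do not expose the arp_filter knob.
        if knob.is_file():
            values[entry.name] = int(knob.read_text().strip())
    return values
```

Calling read_arp_filter() on an I/O node returns one entry per configured
interface plus the "all" and "default" pseudo-entries, so it is easy to
confirm whether the workaround took effect on both the Ethernet and the
IPoIB side.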

We will likely install an instrumented kernel to show what is happening in
skb_linearize.  Any further comments on the above are appreciated.
