[openib-general] BUG ipoib oops in iptables/netfilter

Grant Grundler iod00d at hp.com
Mon Mar 28 21:41:58 PST 2005


Hi all,
An ia64 box (rx2600) took a data page fault in the kernel when netperf
"netserver" started for the first time since booting. It is running a
2.6.11 kernel with netfiltering enabled and drivers/infiniband from the
openib.org gen2 branch, SVN v2050. I don't think this is an
arch-specific bug.

ib0 is configured as 10.0.0.51. I tried to start netperf from a
second node (10.0.0.113) with:
   /usr/local/bin/netperf -c -l 60 -H 10.0.0.51 -t TCP_STREAM -- -m 8192 -s 262144 -S 262144

10.0.0.113 had just completed a few runs of netperf against a third node
(10.0.0.81), which does NOT have any iptables rules set up, without any
problems.
A set of iptables rules is loaded on 10.0.0.51 to firewall an
external-facing NIC (gsyprf3 eth0). None of the rules reference ib0
since the firewall rc script doesn't know anything about ib0/1.
IIRC, I previously avoided this by clearing out all the iptables rules.
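
Since the call trace below runs through ipt_ulog_target and
ipt_ulog_packet, one way to narrow this down might be to check which
loaded rules jump to the ULOG target. The following is only a sketch:
the helper name ulog_rules is made up for illustration, and it assumes
the rules are available in iptables-save format on stdin.

```shell
# Hypothetical helper: print every rule read from stdin (iptables-save
# format) that jumps to the ULOG target. Prints nothing if there are none.
ulog_rules() {
    grep -E -- '-j ULOG( |$)' || true
}

# Usage on the affected box (needs root):
#   iptables-save | ulog_rules
#
# The workaround mentioned above, clearing the rules in the default
# (filter) table before rerunning netperf:
#   iptables -F
```

Note that "iptables -F" only flushes rules and leaves the chain policies
alone, so on a box with a DROP policy it is worth checking the policies
before flushing over a remote login.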

All three systems are running identical kernel/modules and debian
"testing" user space bits.

This is the second time I've seen this in the past week, but I
haven't tried to track it down yet; I would like to finish collecting
perf data first. FWIW, default IPoIB perf is pathetic: ~1.5-1.6Gb/s
with the above netperf command line. I was trying to (and will) collect
netperf numbers again with msi_x=1.

Tombstone follows.

thanks,
grant

gsyprf3:~# modprobe ib_mthca msi_x=1
ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:81:00.0)
GSI 60 (level, low) -> CPU 1 (0x0100) vector 69
ACPI: PCI interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 69
gsyprf3:~# modprobe ib_ipoib
gsyprf3:~# 
gsyprf3:~# lsmod
Module                  Size  Used by
ib_ipoib               90488  0 
ib_sa                  23980  1 ib_ipoib
ib_mthca              211335  0 
ib_mad                 71400  2 ib_sa,ib_mthca
ib_core                85288  4 ib_ipoib,ib_sa,ib_mthca,ib_mad
ipt_state               5528  13 
qla2300               127272  0 
qla2xxx               250463  1 qla2300
scsi_transport_fc      45672  1 qla2xxx
e1000                 187588  0 
tg3                   197888  0 
e100                   79630  0 
dm_mod                136584  0 
gsyprf3:~# ifconfig ib0 10.0.0.51 netmask 255.255.255.0 broadcast 10.0.0.255
...
gsyprf3:~# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.0.0.51  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

gsyprf3:~# Unable to handle kernel NULL pointer dereference (address 0000000000000001)
swapper[0]: Oops 8813272891392 [1]
Modules linked in: ib_ipoib ib_sa ib_mthca ib_mad ib_core ipt_state qla2300 qla2xxx scsi_transport_fc e1000 tg3 e100 dm_mod

Pid: 0, CPU 0, comm:              swapper
psr : 0000101008026038 ifs : 8000000000000004 ip  : [<a0000001002ea7d0>]    Not tainted
ip is at __copy_user+0x890/0x940
unat: 0000000000000000 pfs : 0000000000000a18 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 8000000a955655a5
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100676f70 b6  : a000000100003320 b7  : a0000001006770a0
f6  : 000000000000000000000 f7  : 1003e0000000000000080
f8  : 1003e0000000000002000 f9  : 1003e0000000000002000
f10 : 1000682fffffff7d00000 f11 : 1003e0000000000000000
r1  : a000000100cb1950 r2  : 0000000000000000 r3  : 0000000000000001
r8  : e0000000060ee006 r9  : 0000000000000000 r10 : e000000005c37ef8
r11 : 000000000000010c r12 : a00000010088fb50 r13 : a000000100888000
r14 : 0000000000002000 r15 : 000000000000006f r16 : e0000000060ee006
r17 : e0000000060ee079 r18 : e0000000060ee07a r19 : e0000000060ee008
r20 : 0000000000000000 r21 : e0000000060ee028 r22 : e000000005d03cc0
r23 : e0000000060ee020 r24 : 00000000000146cc r25 : e000000005d03c28
r26 : e0000000060ee018 r27 : 000000004248e501 r28 : 0000000000000000
r29 : 0000000000000000 r30 : e0000000060ee079 r31 : e000000005d03c50

Call Trace:
 [<a00000010000f3a0>] show_stack+0x80/0xa0
                                sp=a00000010088f710 bsp=a000000100889778
 [<a00000010000fc00>] show_regs+0x7e0/0x800
                                sp=a00000010088f8e0 bsp=a000000100889718
 [<a000000100033730>] die+0x150/0x1c0
                                sp=a00000010088f8f0 bsp=a0000001008896d8
 [<a000000100053b70>] ia64_do_page_fault+0x370/0x980
                                sp=a00000010088f8f0 bsp=a000000100889670
 [<a00000010000a780>] ia64_leave_kernel+0x0/0x260
                                sp=a00000010088f980 bsp=a000000100889670
 [<a0000001002ea7d0>] __copy_user+0x890/0x940
                                sp=a00000010088fb50 bsp=a000000100889650
 [<a000000100676f70>] ipt_ulog_packet+0x6d0/0x800
                                sp=a00000010088fb50 bsp=a0000001008895a8
 [<a0000001006770e0>] ipt_ulog_target+0x40/0x60
                                sp=a00000010088fb50 bsp=a000000100889568
 [<a0000001006637c0>] ipt_do_table+0x700/0x800
                                sp=a00000010088fb50 bsp=a0000001008894c0
 [<a00000010066a860>] ipt_hook+0x40/0x60
                                sp=a00000010088fb60 bsp=a000000100889488
 [<a0000001005ca3d0>] nf_iterate+0x150/0x200
                                sp=a00000010088fb60 bsp=a000000100889430
 [<a0000001005cab20>] nf_hook_slow+0xa0/0x200
                                sp=a00000010088fb60 bsp=a0000001008893b0
 [<a0000001005e1060>] ip_local_deliver+0x4e0/0x560
                                sp=a00000010088fb70 bsp=a000000100889368
 [<a0000001005e2770>] ip_rcv_finish+0x610/0x720
                                sp=a00000010088fb70 bsp=a000000100889328
 [<a0000001005cabe0>] nf_hook_slow+0x160/0x200
                                sp=a00000010088fb80 bsp=a0000001008892b0
 [<a0000001005e1b70>] ip_rcv+0xa90/0xc20
                                sp=a00000010088fb90 bsp=a000000100889258
 [<a0000001005b1290>] netif_receive_skb+0x450/0x580
                                sp=a00000010088fba0 bsp=a000000100889208
 [<a0000001005b1510>] process_backlog+0x150/0x320
                                sp=a00000010088fba0 bsp=a000000100889190
 [<a0000001005b1850>] net_rx_action+0x170/0x320
                                sp=a00000010088fba0 bsp=a000000100889138
 [<a000000100095780>] __do_softirq+0x200/0x240
                                sp=a00000010088fbb0 bsp=a000000100889098
 [<a000000100095840>] do_softirq+0x80/0xe0
                                sp=a00000010088fbb0 bsp=a000000100889038
 [<a000000100095a80>] irq_exit+0x80/0xa0
                                sp=a00000010088fbb0 bsp=a000000100889020
 [<a00000010000e3f0>] ia64_handle_irq+0x110/0x140
                                sp=a00000010088fbb0 bsp=a000000100888fe0
 [<a00000010000a780>] ia64_leave_kernel+0x0/0x260
                                sp=a00000010088fbb0 bsp=a000000100888fe0
 [<a00000010000e740>] ia64_pal_call_static+0xa0/0xc0
                                sp=a00000010088fd80 bsp=a000000100888f90
 [<a00000010000fe20>] default_idle+0x100/0x1a0
                                sp=a00000010088fd80 bsp=a000000100888f40
 [<a000000100010260>] cpu_idle+0x200/0x2c0
                                sp=a00000010088fe20 bsp=a000000100888ed0
 [<a0000001000091b0>] rest_init+0x50/0x80
                                sp=a00000010088fe20 bsp=a000000100888eb8
 [<a000000100848f00>] start_kernel+0x440/0x4c0
                                sp=a00000010088fe20 bsp=a000000100888e58
 [<a000000100008600>] _start+0x2e0/0x300
                                sp=a00000010088fe30 bsp=a000000100888de0
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 



