[openib-general] BUG ipoib oops in iptables/netfilter
Grant Grundler
iod00d at hp.com
Mon Mar 28 21:41:58 PST 2005
Hi all,
An ia64 box (rx2600) took a data page fault in the kernel when netperf
"netserver" was started on it for the first time since booting. It is
running a 2.6.11 kernel with netfilter enabled and openib.org SVN v2050
drivers/infiniband from the gen2 branch. I don't think this is
an arch-specific bug.
ib0 is configured as 10.0.0.51. I tried to start netperf from a
second node (10.0.0.113) with:
/usr/local/bin/netperf -c -l 60 -H 10.0.0.51 -t TCP_STREAM -- -m 8192 -s 262144 -S 262144
10.0.0.113 had just completed a few runs of netperf against a 3rd node
(10.0.0.81) just fine - which does NOT have any iptables rules set up.
A set of iptables rules is loaded on 10.0.0.51 to firewall an
external-facing NIC (gsyprf3 eth0). None of the rules reference ib0,
since the firewall rc script doesn't know anything about ib0/1.
IIRC, I previously avoided this by clearing out all the iptables rules.
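One thing that may be worth checking: a rule with no input-interface
match applies to packets from every interface, so a rule can hit ib0
traffic without ever naming ib0. The following is a minimal Python
sketch of that check, run against text in iptables-save format. The
sample rules are hypothetical (the real rule set on gsyprf3 is not shown
in this report), and the helper names iface_matches and rules_hitting
are made up for illustration. Negated matches ("! -i") and
output-interface matches are deliberately ignored.

```python
import shlex

# HYPOTHETICAL rules for illustration only; not the rules loaded on gsyprf3.
SAMPLE_RULES = '''\
*filter
:INPUT DROP [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
-A INPUT -j ULOG --ulog-prefix "dropped"
COMMIT
'''


def iface_matches(pattern, iface):
    """Match an iptables interface pattern; a trailing '+' is a prefix wildcard."""
    if pattern.endswith("+"):
        return iface.startswith(pattern[:-1])
    return pattern == iface


def rules_hitting(save_text, iface, target="ULOG"):
    """Return rules that jump to `target` and can match packets arriving on `iface`.

    Assumption: only -i/--in-interface is inspected; negation is not handled.
    """
    hits = []
    for line in save_text.splitlines():
        if not line.startswith("-A "):
            continue  # skip table headers, chain policies and COMMIT
        tokens = shlex.split(line)
        if "-j" not in tokens or tokens.index("-j") + 1 >= len(tokens):
            continue
        if tokens[tokens.index("-j") + 1] != target:
            continue
        pattern = None
        for flag in ("-i", "--in-interface"):
            if flag in tokens and tokens.index(flag) + 1 < len(tokens):
                pattern = tokens[tokens.index(flag) + 1]
        # No interface match at all means the rule applies to every interface.
        if pattern is None or iface_matches(pattern, iface):
            hits.append(line)
    return hits


if __name__ == "__main__":
    for rule in rules_hitting(SAMPLE_RULES, "ib0"):
        print(rule)
```

To try it on a real box, the output of iptables-save could be passed in
place of SAMPLE_RULES.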
All three systems are running identical kernel/modules and debian
"testing" user space bits.
This is the second time I've seen this (over the past week).
But I haven't tried to track it down...would like to finish collecting
perf data first. FWIW, default IPoIB perf is pathetic: ~1.5-1.6Gb/s
with the above netperf command line. I was trying to (and will) collect
netperf numbers again with msi_x=1.
Tombstone follows.
thanks,
grant
gsyprf3:~# modprobe ib_mthca msi_x=1
ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:81:00.0)
GSI 60 (level, low) -> CPU 1 (0x0100) vector 69
ACPI: PCI interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 69
gsyprf3:~# modprobe ib_ipoib
gsyprf3:~#
gsyprf3:~# lsmod
Module Size Used by
ib_ipoib 90488 0
ib_sa 23980 1 ib_ipoib
ib_mthca 211335 0
ib_mad 71400 2 ib_sa,ib_mthca
ib_core 85288 4 ib_ipoib,ib_sa,ib_mthca,ib_mad
ipt_state 5528 13
qla2300 127272 0
qla2xxx 250463 1 qla2300
scsi_transport_fc 45672 1 qla2xxx
e1000 187588 0
tg3 197888 0
e100 79630 0
dm_mod 136584 0
gsyprf3:~# ifconfig ib0 10.0.0.51 netmask 255.255.255.0 broadcast 10.0.0.255
...
gsyprf3:~# ifconfig ib0
ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.0.51 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
gsyprf3:~# Unable to handle kernel NULL pointer dereference (address 0000000000000001)
swapper[0]: Oops 8813272891392 [1]
Modules linked in: ib_ipoib ib_sa ib_mthca ib_mad ib_core ipt_state qla2300 qla2xxx scsi_transport_fc e1000 tg3 e100 dm_mod
Pid: 0, CPU 0, comm: swapper
psr : 0000101008026038 ifs : 8000000000000004 ip : [<a0000001002ea7d0>] Not tainted
ip is at __copy_user+0x890/0x940
unat: 0000000000000000 pfs : 0000000000000a18 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 8000000a955655a5
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100676f70 b6 : a000000100003320 b7 : a0000001006770a0
f6 : 000000000000000000000 f7 : 1003e0000000000000080
f8 : 1003e0000000000002000 f9 : 1003e0000000000002000
f10 : 1000682fffffff7d00000 f11 : 1003e0000000000000000
r1 : a000000100cb1950 r2 : 0000000000000000 r3 : 0000000000000001
r8 : e0000000060ee006 r9 : 0000000000000000 r10 : e000000005c37ef8
r11 : 000000000000010c r12 : a00000010088fb50 r13 : a000000100888000
r14 : 0000000000002000 r15 : 000000000000006f r16 : e0000000060ee006
r17 : e0000000060ee079 r18 : e0000000060ee07a r19 : e0000000060ee008
r20 : 0000000000000000 r21 : e0000000060ee028 r22 : e000000005d03cc0
r23 : e0000000060ee020 r24 : 00000000000146cc r25 : e000000005d03c28
r26 : e0000000060ee018 r27 : 000000004248e501 r28 : 0000000000000000
r29 : 0000000000000000 r30 : e0000000060ee079 r31 : e000000005d03c50
Call Trace:
[<a00000010000f3a0>] show_stack+0x80/0xa0
sp=a00000010088f710 bsp=a000000100889778
[<a00000010000fc00>] show_regs+0x7e0/0x800
sp=a00000010088f8e0 bsp=a000000100889718
[<a000000100033730>] die+0x150/0x1c0
sp=a00000010088f8f0 bsp=a0000001008896d8
[<a000000100053b70>] ia64_do_page_fault+0x370/0x980
sp=a00000010088f8f0 bsp=a000000100889670
[<a00000010000a780>] ia64_leave_kernel+0x0/0x260
sp=a00000010088f980 bsp=a000000100889670
[<a0000001002ea7d0>] __copy_user+0x890/0x940
sp=a00000010088fb50 bsp=a000000100889650
[<a000000100676f70>] ipt_ulog_packet+0x6d0/0x800
sp=a00000010088fb50 bsp=a0000001008895a8
[<a0000001006770e0>] ipt_ulog_target+0x40/0x60
sp=a00000010088fb50 bsp=a000000100889568
[<a0000001006637c0>] ipt_do_table+0x700/0x800
sp=a00000010088fb50 bsp=a0000001008894c0
[<a00000010066a860>] ipt_hook+0x40/0x60
sp=a00000010088fb60 bsp=a000000100889488
[<a0000001005ca3d0>] nf_iterate+0x150/0x200
sp=a00000010088fb60 bsp=a000000100889430
[<a0000001005cab20>] nf_hook_slow+0xa0/0x200
sp=a00000010088fb60 bsp=a0000001008893b0
[<a0000001005e1060>] ip_local_deliver+0x4e0/0x560
sp=a00000010088fb70 bsp=a000000100889368
[<a0000001005e2770>] ip_rcv_finish+0x610/0x720
sp=a00000010088fb70 bsp=a000000100889328
[<a0000001005cabe0>] nf_hook_slow+0x160/0x200
sp=a00000010088fb80 bsp=a0000001008892b0
[<a0000001005e1b70>] ip_rcv+0xa90/0xc20
sp=a00000010088fb90 bsp=a000000100889258
[<a0000001005b1290>] netif_receive_skb+0x450/0x580
sp=a00000010088fba0 bsp=a000000100889208
[<a0000001005b1510>] process_backlog+0x150/0x320
sp=a00000010088fba0 bsp=a000000100889190
[<a0000001005b1850>] net_rx_action+0x170/0x320
sp=a00000010088fba0 bsp=a000000100889138
[<a000000100095780>] __do_softirq+0x200/0x240
sp=a00000010088fbb0 bsp=a000000100889098
[<a000000100095840>] do_softirq+0x80/0xe0
sp=a00000010088fbb0 bsp=a000000100889038
[<a000000100095a80>] irq_exit+0x80/0xa0
sp=a00000010088fbb0 bsp=a000000100889020
[<a00000010000e3f0>] ia64_handle_irq+0x110/0x140
sp=a00000010088fbb0 bsp=a000000100888fe0
[<a00000010000a780>] ia64_leave_kernel+0x0/0x260
sp=a00000010088fbb0 bsp=a000000100888fe0
[<a00000010000e740>] ia64_pal_call_static+0xa0/0xc0
sp=a00000010088fd80 bsp=a000000100888f90
[<a00000010000fe20>] default_idle+0x100/0x1a0
sp=a00000010088fd80 bsp=a000000100888f40
[<a000000100010260>] cpu_idle+0x200/0x2c0
sp=a00000010088fe20 bsp=a000000100888ed0
[<a0000001000091b0>] rest_init+0x50/0x80
sp=a00000010088fe20 bsp=a000000100888eb8
[<a000000100848f00>] start_kernel+0x440/0x4c0
sp=a00000010088fe20 bsp=a000000100888e58
[<a000000100008600>] _start+0x2e0/0x300
sp=a00000010088fe30 bsp=a000000100888de0
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
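For anyone reading the tombstone, here is a small Python sketch that
pulls the symbol names out of the frame lines and prints them in call
order. It only reformats what is already in the trace; the frames below
are copied from it with the sp/bsp lines dropped, and the regular
expression assumes the "[<addr>] symbol+0xoff/0xsize" layout shown
above. The names parse_frames and call_chain are made up for this
example.

```python
import re

# Innermost frames copied from the call trace above (sp/bsp lines omitted).
TRACE = """\
[<a0000001002ea7d0>] __copy_user+0x890/0x940
[<a000000100676f70>] ipt_ulog_packet+0x6d0/0x800
[<a0000001006770e0>] ipt_ulog_target+0x40/0x60
[<a0000001006637c0>] ipt_do_table+0x700/0x800
[<a00000010066a860>] ipt_hook+0x40/0x60
[<a0000001005ca3d0>] nf_iterate+0x150/0x200
[<a0000001005cab20>] nf_hook_slow+0xa0/0x200
[<a0000001005e1060>] ip_local_deliver+0x4e0/0x560
"""

# One frame line: "[<address>] symbol+0xoffset/0xsize"
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(text):
    """Return (address, symbol, offset, size) tuples, innermost frame first."""
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in FRAME_RE.findall(text)
    ]


def call_chain(text):
    """Return symbol names outermost first, i.e. in call order."""
    return [sym for _, sym, _, _ in reversed(parse_frames(text))]


if __name__ == "__main__":
    print(" -> ".join(call_chain(TRACE)))
```

Read in call order, the faulting __copy_user is reached from
ip_local_deliver through the netfilter hook and ipt_ulog_target, so the
packet was being handed to the ULOG target when the fault hit.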