[ewg] iscsi initiator ipoib+lro crash on upstream kernel

Eli Cohen eli at dev.mellanox.co.il
Thu Feb 19 08:55:05 PST 2009


Hi,

I have encountered a kernel crash when running an iSCSI initiator over
IPoIB with LRO enabled (it does not happen when LRO is off). I first
saw this on SLES10 SP2 and then verified that it also happens on
2.6.28.2. Below is a dump of the crash info from 2.6.28.2:

sd 2:0:0:1: Attached scsi generic sg3 type 0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff803c50a4>] skb_seq_read+0xfb/0x1a1
PGD 227115067 PUD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/platform/host2/session2/target2:0:0/2:0:0:1/type
CPU 2 
Modules linked in: ib_uverbs ib_umad mlx4_ib nfs lockd nfs_acl mlx4_core sunrpc ib_mthca ib_ipoib ib_cm ib_sa ib_mad ib_core inet_lro ipv6 button battery a]
Pid: 0, comm: swapper Not tainted 2.6.28.2-debug #3
RIP: 0010:[<ffffffff803c50a4>]  [<ffffffff803c50a4>] skb_seq_read+0xfb/0x1a1
RSP: 0018:ffff88022f0e3b00  EFLAGS: 00010246
RAX: ffff88022dd44f38 RBX: ffff88022f0e3b30 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88022f0e3b88 RDI: 00000000000007d4
RBP: 00000000000007d4 R08: ffff880220476d30 R09: 000000000000085c
R10: 00000000000b0038 R11: ffffffffa0126115 R12: ffff88022f0e3b88
R13: ffff88022d974d38 R14: 00000000000007d4 R15: 00000000000007d4
FS:  0000000000000000(0000) GS:ffff88022f07bb50(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000004 CR3: 00000002271c2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88022f0da000, task ffff88022f0a4050)
Stack:
 ffff88022d974fa0 ffff88022f0e3b30 00000000000007d4 ffffffffa01261fe
 ffff88022d974f80 ffff880220418068 0000085c00000000 000007d400000000
 ffff880220476d30 ffff88022dd44f38 0000000000000000 ffff88022dd44e58
Call Trace:
 <IRQ> <0> [<ffffffffa01261fe>] ? iscsi_tcp_recv+0x64/0x39b [iscsi_tcp]
 [<ffffffff803f0d0f>] ? ip_queue_xmit+0x2aa/0x2fd
 [<ffffffff803f60fd>] ? tcp_read_sock+0x97/0x212
 [<ffffffffa012619a>] ? iscsi_tcp_recv+0x0/0x39b [iscsi_tcp]
 [<ffffffffa012615d>] ? iscsi_tcp_data_ready+0x48/0x85 [iscsi_tcp]
 [<ffffffff803ff119>] ? tcp_rcv_established+0x4c0/0x567
 [<ffffffff804042f8>] ? tcp_v4_do_rcv+0x2c/0x1c8
 [<ffffffff80405fb9>] ? tcp_v4_rcv+0x630/0x683
 [<ffffffff803c6552>] ? skb_release_head_state+0x60/0x8f
 [<ffffffff803ecb9f>] ? ip_local_deliver_finish+0xda/0x197
 [<ffffffff803ecaab>] ? ip_rcv_finish+0x32f/0x349
 [<ffffffffa024e42d>] ? lro_flush+0x159/0x17e [inet_lro]
 [<ffffffffa024eb2e>] ? __lro_proc_skb+0x1ca/0x1ed [inet_lro]
 [<ffffffff80221e28>] ? swiotlb_map_single_phys+0x0/0x12
 [<ffffffffa024eb69>] ? lro_receive_skb+0x18/0x3e [inet_lro]
 [<ffffffffa0299582>] ? ipoib_ib_handle_rx_wc+0x1ed/0x22b [ib_ipoib]
 [<ffffffffa0299e97>] ? ipoib_poll+0x9c/0x173 [ib_ipoib]
 [<ffffffff803ce1d0>] ? net_rx_action+0x9d/0x175
 [<ffffffff80239ffb>] ? __do_softirq+0x7a/0x13d
 [<ffffffff8020cf4c>] ? call_softirq+0x1c/0x28
 [<ffffffff8020df5d>] ? do_softirq+0x2c/0x68
 [<ffffffff8020e05b>] ? do_IRQ+0xc2/0xdf
 [<ffffffff8020c206>] ? ret_from_intr+0x0/0xa
 <EOI> <0> [<ffffffff80212464>] ? mwait_idle+0x41/0x44
 [<ffffffff8020abca>] ? cpu_idle+0x40/0x5e
Code: ff 88 48 e0 ff ff 48 c7 43 20 00 00 00 00 ff 43 08 8b 46 0c 01 43 0c 48 8b 43 18 8b 4b 08 8b 90 b4 00 00 00 48 03 90 b8 00 00 00 <0f> b7 42 04 39 c1  
RIP  [<ffffffff803c50a4>] skb_seq_read+0xfb/0x1a1
 RSP <ffff88022f0e3b00>
CR2: 0000000000000004
Kernel panic - not syncing: Fatal exception in interrupt

When I looked at this on SLES10, I was able to verify that the problem is that
st->cur_skb->next equals 0xffffffff (see below for where this comes from):

        if (st->cur_skb->next) {
                st->cur_skb = st->cur_skb->next;  <<<=== this is where I see the problem
                st->frag_idx = 0;
                goto next_skb;
        } else if (st->root_skb == st->cur_skb &&
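
To make the failure mode easier to reason about, here is a minimal user-space
sketch of the first branch of the snippet above. It is not kernel code: the
names toy_skb, toy_seq_state, toy_seq_prepare, toy_seq_advance and
toy_total_len are made up for illustration, and the else-branch is left out
because its condition is cut off above. The point it tries to show is that the
walk trusts ->next completely: any non-NULL value is taken as the next buffer,
so a stale value such as 0xffffffff would pass the test and be dereferenced on
the next pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only ->next and a length. */
struct toy_skb {
    struct toy_skb *next;   /* models skb->next */
    unsigned int len;       /* models the bytes held by this buffer */
};

/* Hypothetical stand-in for struct skb_seq_state, reduced to the fields
 * that the quoted snippet touches. */
struct toy_seq_state {
    struct toy_skb *root_skb;
    struct toy_skb *cur_skb;
    unsigned int frag_idx;
};

/* Start a walk at the root buffer. */
static void toy_seq_prepare(struct toy_seq_state *st, struct toy_skb *root)
{
    assert(root != NULL);
    st->root_skb = root;
    st->cur_skb = root;
    st->frag_idx = 0;
}

/* Mirrors only the first branch of the quoted snippet: if ->next is
 * non-NULL, step to it and reset frag_idx. There is no other validity
 * check, so the walk is only as good as the ->next pointers it is given.
 * Returns 1 if the walk moved to another buffer, 0 if it stopped. */
static int toy_seq_advance(struct toy_seq_state *st)
{
    if (st->cur_skb->next) {
        st->cur_skb = st->cur_skb->next;
        st->frag_idx = 0;
        return 1;
    }
    return 0;
}

/* Sum the lengths of every buffer reachable from root through ->next. */
static unsigned int toy_total_len(struct toy_skb *root)
{
    struct toy_seq_state st;
    unsigned int total;

    toy_seq_prepare(&st, root);
    total = st.cur_skb->len;
    while (toy_seq_advance(&st))
        total += st.cur_skb->len;
    return total;
}
```

With a well-formed, NULL-terminated chain the walk stops cleanly. If ->next of
any buffer holds something that is neither NULL nor a valid buffer address,
the same loop follows it, which is consistent with the fault being reported
inside skb_seq_read.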




