[openib-general] 2.6.11.11 NFS over IPoIB crash

Troy Benjegerdes troy at scl.ameslab.gov
Wed Jun 8 13:57:17 PDT 2005


We are running NFS over IPoIB and are getting kernel panics under heavy
NFS I/O. The panics occur on the client, a PowerMac G5; the server is a
dual Opteron running 2.6.11 with the OpenIB code from Subversion. It
looks like a bug in NFS, but we've only seen it using IPoIB. Is it worth
trying to reproduce this over Gigabit Ethernet?

Brett Bode wrote:

> Well, I finally got 2.6.11.11 running this morning, but crashed it too.
> I guess I will try updating to the latest OpenIB code in Subversion.
> Here is the kernel panic:
>
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=2 POWERMAC
> Modules linked in: ib_ipoib ib_sa eth1394 ib_mthca ib_mad ib_core ohci1394 ieee1394
> NIP: C000000000038B58 XER: 00000000 LR: C0000000000390C8 CTR: C00000000003B340
> REGS: c000000254c845f0 TRAP: 0300   Not tainted  (2.6.11.11-G5)
> MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 42028488
> DAR: c00000060045b218 DSISR: 0000000040010000
> TASK: c0000000019047d0[2004] 'IOTest.ppc.x' THREAD: c000000254c84000 CPU: 1
> GPR00: 0000000600000010 C000000254C84870 C00000000045F280 C0000000019047D0
> GPR04: C0000000004B4480 0000000000000000 000000003B9A1A60 C0000000004B4488
> GPR08: C00000000045B208 C000000254C84000 C000000000328C40 C000000000328C40
> GPR12: 0000000028028428 C00000000037E000 C00000000043B170 C000000000470340
> GPR16: 0000000000000000 0000000000000001 0000000000000000 C00000000044EE20
> GPR20: 0000000000000000 0000000000000000 0000000000000003 0000000000000000
> GPR24: C00000000FCC4000 0000000000000000 C00000000FD0B030 0000000000000001
> GPR28: C0000000004B3B40 0000000000000001 C000000000391A90 C000000254C84870
> NIP [c000000000038b58] .resched_task+0x38/0xcc
> LR [c0000000000390c8] .try_to_wake_up+0x304/0x324
> Call Trace:
> [c000000254c848f0] [c0000000000390c8] .try_to_wake_up+0x304/0x324
> [c000000254c849d0] [c00000000003b3e0] .__wake_up_common+0x84/0xe8
> [c000000254c84a90] [c00000000003b4a0] .__wake_up+0x5c/0x94
> [c000000254c84b40] [c000000000057840] .__queue_work+0x80/0xb4
> [c000000254c84be0] [c000000000057908] .queue_work+0x94/0xb0
> [c000000254c84c60] [c000000000280df8] .rpc_make_runnable+0xe0/0x154
> [c000000254c84cf0] [c000000000281324] .__rpc_do_wake_up_task+0xc8/0x24c
> [c000000254c84d80] [c000000000281608] .rpc_wake_up_task+0xb0/0xd8
> [c000000254c84e20] [c00000000027e61c] .xprt_complete_rqst+0x90/0x188
> [c000000254c84ed0] [c00000000027ec3c] .udp_data_ready+0x230/0x290
> [c000000254c84fa0] [c000000000236714] .udp_queue_rcv_skb+0x268/0x51c
> [c000000254c85040] [c000000000236fc8] .udp_rcv+0x600/0x7f0
> [c000000254c85160] [c000000000209e8c] .ip_local_deliver_finish+0xd4/0x35c
> [c000000254c851f0] [c0000000001fa928] .nf_hook_slow+0x184/0x1bc
> [c000000254c852d0] [c00000000020a360] .ip_local_deliver+0x24c/0x3bc
> [c000000254c85360] [c00000000020a6f4] .ip_rcv_finish+0x224/0x378
> [c000000254c85410] [c0000000001fa928] .nf_hook_slow+0x184/0x1bc
> [c000000254c854f0] [c00000000020ad30] .ip_rcv+0x4e8/0x610
> [c000000254c855a0] [c0000000001eac68] .netif_receive_skb+0x2ec/0x3a0
> [c000000254c85650] [c0000000001eae34] .process_backlog+0x118/0x278
> [c000000254c85730] [c0000000001eb07c] .net_rx_action+0xe8/0x24c
> [c000000254c857f0] [c000000000047ea8] .__do_softirq+0xdc/0x1b8
> [c000000254c858b0] [c00000000004800c] .do_softirq+0x88/0x90
> [c000000254c85940] [c0000000000480b4] .local_bh_enable+0xa0/0xa4
> [c000000254c859c0] [c0000000001ea274] .dev_queue_xmit+0x2f4/0x3a0
> [c000000254c85a60] [c0000000001f3308] .neigh_connected_output+0x104/0x1a0
> [c000000254c85b00] [c00000000020e824] .ip_finish_output2+0x100/0x2dc
> [c000000254c85bc0] [c00000000020fa18] .ip_fragment+0x4d8/0x800
> [c000000254c85cd0] [c00000000024f768] .ip_refrag+0xa4/0xb0
> [c000000254c85d70] [c0000000001fa2a4] .nf_iterate+0x128/0x190
> [c000000254c85e30] [c0000000001fa86c] .nf_hook_slow+0xc8/0x1bc
> [c000000254c85f10] [c00000000020ebd8] .ip_finish_output+0x1d8/0x36c
> [c000000254c85fd0] [c00000000020fa18] .ip_fragment+0x4d8/0x800
> [c000000254c860e0] [c00000000020e1e4] .dst_output+0x4c/0x80
> [c000000254c86170] [c0000000001fa928] .nf_hook_slow+0x184/0x1bc
> [c000000254c86250] [c00000000021162c] .ip_push_pending_frames+0x4ec/0x550
> [c000000254c86300] [c0000000002353b8] .udp_push_pending_frames+0x16c/0x260
> [c000000254c863c0] [c000000000235af0] .udp_sendmsg+0x644/0x758
> [c000000254c86570] [c00000000023fdd0] .inet_sendmsg+0x88/0xb8
> [c000000254c86610] [c0000000001dba14] .sock_sendmsg+0xdc/0x13c
> [c000000254c86810] [c0000000001dbab0] .kernel_sendmsg+0x3c/0x64
> [c000000254c868a0] [c0000000001dff94] .sock_no_sendpage+0x9c/0xb8
> [c000000254c86970] [c000000000235d88] .udp_sendpage+0x184/0x200
> [c000000254c86a60] [c00000000023fe9c] .inet_sendpage+0x9c/0x108
> [c000000254c86b10] [c00000000028cd24] .xdr_sendpages+0x280/0x34c
> [c000000254c86c80] [c00000000027fad0] .xprt_transmit+0x170/0x550
> [c000000254c86d50] [c00000000027cfd4] .call_transmit+0x1cc/0x2bc
> [c000000254c86e00] [c0000000002822fc] .__rpc_execute+0xf4/0x524
> [c000000254c86f60] [c0000000000ff690] .nfs_execute_write+0x50/0x88
> [c000000254c87000] [c0000000000ffbc0] .nfs_flush_list+0x4f8/0x644
> [c000000254c87120] [c0000000000ffe48] .nfs_flush_inode+0x13c/0x148
> [c000000254c871f0] [c0000000001012e8] .nfs_writepages+0x10c/0x200
> [c000000254c872a0] [c000000000073858] .do_writepages+0x60/0x8c
> [c000000254c87320] [c0000000000c41c8] .__writeback_single_inode+0xa8/0x4b8
> [c000000254c87460] [c0000000000c4de8] .sync_sb_inodes+0x224/0x37c
> [c000000254c87530] [c0000000000c529c] .writeback_inodes+0x19c/0x1ac
> [c000000254c875d0] [c00000000007302c] .balance_dirty_pages_ratelimited+0xe8/0x254
> [c000000254c87700] [c00000000006dd90] .generic_file_buffered_write+0x38c/0x7b4
> [c000000254c878c0] [c00000000006e71c] .__generic_file_aio_write_nolock+0x2b0/0x498
> [c000000254c87a00] [c00000000006e9a0] .generic_file_aio_write+0x9c/0x1b0
> [c000000254c87ac0] [c0000000000f37bc] .nfs_file_write+0xbc/0x158
> [c000000254c87b70] [c000000000091b48] .do_sync_write+0xcc/0x140
> [c000000254c87ce0] [c000000000091d40] .vfs_write+0x184/0x18c
> [c000000254c87d90] [c000000000091e38] .sys_write+0x54/0x9c
> Instruction dump:
> fbe1fff8 ebc2a958 f8010010 f821ff81 60000000 e9230008 7c3f0b78 e91e8000
> e97e8008 80090010 7d6a5b78 78001f24 <7d28002a> 7d69502e 2fab0000 7c000026
>
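
The call trace above is long, so one quick thing to check is how much
kernel stack it spans. Here is a minimal Python sketch that pulls the
frame addresses out of the trace and does the subtraction. The helper
names (parse_frames, stack_span) are made up for illustration, and it
assumes the THREAD: value in the oops marks the low end of the task's
kernel stack area. It only measures the trace; it does not say what
caused the crash.

```python
import re

# One call-trace entry looks like:
#   [stack pointer] [return address] .symbol+offset/length
# The optional "> " allows for entries that mail quoting wrapped
# onto a second line.
FRAME_RE = re.compile(
    r"\[([0-9a-f]{16})\]\s+\[([0-9a-f]{16})\]\s+(?:>\s*)?(\.?[\w.]+)\+0x"
)


def parse_frames(text):
    """Return (stack_pointer, symbol) pairs in the order they appear."""
    return [(int(m.group(1), 16), m.group(3)) for m in FRAME_RE.finditer(text)]


def stack_span(text, thread_base):
    """Summarize how much stack the listed frames cover.

    thread_base is assumed to be the lowest address of the stack area.
    """
    stack_pointers = [sp for sp, _ in parse_frames(text)]
    lowest, highest = min(stack_pointers), max(stack_pointers)
    return {
        "frames": len(stack_pointers),
        "span_bytes": highest - lowest,
        "bytes_above_thread_base": lowest - thread_base,
    }


# First two and last two entries of the trace in this message.
SAMPLE = """\
> [c000000254c848f0] [c0000000000390c8] .try_to_wake_up+0x304/0x324
> [c000000254c849d0] [c00000000003b3e0] .__wake_up_common+0x84/0xe8
> [c000000254c87ce0] [c000000000091d40] .vfs_write+0x184/0x18c
> [c000000254c87d90] [c000000000091e38] .sys_write+0x54/0x9c
"""

# Taken from the "THREAD: c000000254c84000" field of the oops.
THREAD_BASE = 0xC000000254C84000

if __name__ == "__main__":
    print(stack_span(SAMPLE, THREAD_BASE))
```

Because the first and last entries of the full trace are the ones in
SAMPLE, the full trace gives the same two figures: the listed frames
span 0x34a0 (13472) bytes, and the innermost listed frame sits 0x8f0
(2288) bytes above the THREAD address.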



