[openib-general] nfsrdma server stops responding
Vu Pham
vuhuong at mellanox.com
Mon Dec 11 17:25:42 PST 2006
James Lentini wrote:
> A couple of questions Vu:
>
> What NFS-RDMA release are you using? This looks like release 7.
>
Yes, I'm using release 7.
> Is this reproducible?
I ran into it twice. I think it may correlate with an
OpenSM restart incident. I'll double-check and confirm.
>
> What kernel version are you using?
2.6.18.5
>
> What hardware is this on? It looks like x86-64 to me, which is fine. I
> just want to be sure I know what I'm looking at. As many specifics as
> possible is good (number of CPUs, hyperthreading, etc.)
>
Dual Woodcrest Xeon-based CPUs.
> Could you send the output of
>
> objdump -Slr /path/to/kernel/mm/swap.o
>
I have attached the objdump output.
> Actually, just the put_page disassembly is all I want to see.
>
> Is there any more text available? Usually there is an explanation
> given for an oops message (e.g. "Unable to handle kernel paging
> request..").
>
I did not see any oops text message. The system was still
responsive to IPoIB ping or login.
> I opened a bug at the NFS-RDMA SourceForge project to track this:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1613201&group_id=97628&atid=618583
thanks for your help,
-vu
>
> Thanks for reporting this.
> james
>
> On Fri, 8 Dec 2006, Vu Pham wrote:
>
>> Hi James,
>> I got these errors in the server's /var/log/messages, and then the server
>> stopped responding to login, I/O...; however, the server is still up and
>> IPoIB is still working.
>>
>>
>> Dec 8 06:38:21 ibd201 kernel: RIP: 0010:[<ffffffff8025dff7>]
>> [<ffffffff8025dff7>] put_page+0x17/0x40
>> Dec 8 06:38:21 ibd201 kernel: RSP: 0018:ffff810219ddfb08 EFLAGS: 00010246
>> Dec 8 06:38:21 ibd201 kernel: RAX: 0000000000000000 RBX: 0000000000000001
>> RCX: 000000000003ffff
>> Dec 8 06:38:21 ibd201 kernel: RDX: 0000000000000000 RSI: 0000000000000001
>> RDI: ffff8102274e92f8
>> Dec 8 06:38:21 ibd201 kernel: RBP: ffff8101ab785000 R08: 0000000000000034
>> R09: 0000000000000000
>> Dec 8 06:38:21 ibd201 kernel: R10: 0000000000000000 R11: 0000000000000000
>> R12: ffff81020ef96800
>> Dec 8 06:38:21 ibd201 kernel: R13: ffff8101ab785000 R14: 0000000000000000
>> R15: ffff8102053ee890
>> Dec 8 06:38:21 ibd201 kernel: FS: 00002ad76b8acb00(0000)
>> GS:ffff81022066eb40(0000) knlGS:0000000000000000
>> Dec 8 06:38:21 ibd201 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
>> 000000008005003b
>> Dec 8 06:38:21 ibd201 kernel: CR2: 00002aaaaabf1000 CR3: 000000021c22b000
>> CR4: 00000000000006e0
>> Dec 8 06:38:21 ibd201 kernel: Process nfsd (pid: 15038, threadinfo
>> ffff810219dde000, task ffff81020d87f0c0)
>> Dec 8 06:38:21 ibd201 kernel: Stack: ffffffff8835e547 ffff81020ef96968
>> ffff81020ef96800 ffff81020ef96958
>> Dec 8 06:38:21 ibd201 kernel: ffffffff88360c72 000000010395dc90
>> ffffffff80424e05 0000000000000000
>> Dec 8 06:38:21 ibd201 kernel: 0000000000200200 000000010395dc90
>> ffffffff80239b90 ffff81020d87f0c0
>> Dec 8 06:38:21 ibd201 kernel: Call Trace:
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff8835e547>]
>> :sunrpc:svc_rdma_put_context+0x37/0xd0
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff88360c72>]
>> :sunrpc:svc_rdma_recvfrom+0x5a2/0x11e0
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80424e05>]
>> schedule_timeout+0x95/0xb0
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80239b90>] process_timeout+0x0/0x10
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80423c2d>]
>> wait_for_completion_timeout+0xcd/0x150
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80228db0>]
>> default_wake_function+0x0/0x10
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff881c1402>]
>> :ib_mthca:mthca_cmd_post+0x232/0x260
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80228db0>]
>> default_wake_function+0x0/0x10
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff802fac39>] __next_cpu+0x19/0x30
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80227dae>]
>> find_busiest_group+0x24e/0x6d0
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80424772>] thread_return+0x0/0xde
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff804263f8>]
>> _spin_unlock_irqrestore+0x8/0x10
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff8023a331>]
>> try_to_del_timer_sync+0x51/0x60
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff8023a34c>] del_timer_sync+0xc/0x20
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80424e05>]
>> schedule_timeout+0x95/0xb0
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff883559e6>]
>> :sunrpc:svc_recv+0x416/0x510
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80228db0>]
>> default_wake_function+0x0/0x10
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff80228db0>]
>> default_wake_function+0x0/0x10
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff883a9540>] :nfsd:nfsd+0x0/0x380
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff883a9651>] :nfsd:nfsd+0x111/0x380
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff8020ab9c>] child_rip+0xa/0x12
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff883a9540>] :nfsd:nfsd+0x0/0x380
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff883a9540>] :nfsd:nfsd+0x0/0x380
>> Dec 8 06:38:21 ibd201 kernel: [<ffffffff8020ab92>] child_rip+0x0/0x12
>> Dec 8 06:38:21 ibd201 kernel:
>> Dec 8 06:38:21 ibd201 kernel:
>> Dec 8 06:38:21 ibd201 kernel: Code: 0f 0b 68 8c 41 45 80 c2 2c 01 f0 ff 4f 08
>> 0f 94 c0 84 c0 74
>> Dec 8 06:38:21 ibd201 kernel: RIP [<ffffffff8025dff7>] put_page+0x17/0x40
>> Dec 8 06:38:21 ibd201 kernel: RSP <ffff810219ddfb08>
>>
>> -vu
>>
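
The call trace quoted above wraps many frames across two mail lines, which makes it awkward to scan. The following is a minimal sketch, not part of the original exchange, of how such a quoted trace could be flattened into (module, symbol, offset, size) entries. The function name `parse_frames` and the prefix-stripping rules are assumptions based only on the log format shown in this message; other kernels or syslog formats may need different patterns.

```python
import re

# One call-trace entry, e.g.
#   [<ffffffff8835e547>] :sunrpc:svc_rdma_put_context+0x37/0xd0
# The ":module:" prefix is optional; it is absent for built-in kernel symbols.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Mail quoting (">> ") followed by an optional syslog prefix such as
# "Dec 8 06:38:21 ibd201 kernel:". Assumed from the log in this message.
PREFIX_RE = re.compile(
    r"^(?:>\s*)*(?:\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:)?\s*"
)


def parse_frames(text):
    """Return one dict per "[<addr>] symbol+0xoff/0xsize" entry in text.

    Lines are stripped of quoting and syslog prefixes and then joined with
    spaces, so an entry wrapped across two mail lines is matched as one.
    """
    cleaned = " ".join(PREFIX_RE.sub("", line) for line in text.splitlines())
    frames = []
    for m in FRAME_RE.finditer(cleaned):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "module": m.group("module"),  # None for built-in symbols
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
        })
    return frames
```

Passing the quoted log text to `parse_frames` yields the frames in the order they appear. Because the `RIP` lines use the same `[<addr>] symbol+off/size` notation, they are returned as entries too.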