[openib-general] nfsrdma server stop responding,

Vu Pham vuhuong at mellanox.com
Tue Dec 12 01:19:16 PST 2006


James,
   I hit another variation of the put_page problem. I just ran 
iozone with a 9 GB file size (both client and server machines 
have 8 GB of memory, dual Woodcrest Xeon CPUs, a 2.6.18.5 
kernel, and nfsrdma release 7).

After this happened, other nfsrdma clients could still do I/O 
to the server.

-vu

> Hit *send* too soon - here is the objdump of swap.o
> 
> -vu
> 
> 
>> James Lentini wrote:
>>> A couple of questions Vu:
>>>
>>> What NFS-RDMA release are you using? This looks like release 7.
>>>
>>
>> Yes. I'm using release 7
>>
>>> Is this reproducible?
>>
>> I ran into it twice - I think that it may correlate with an OpenSM 
>> restart incident. I'll double-check it and confirm.
>>
>>
>>> What kernel version are you using?
>>
>> 2.6.18.5
>>
>>> What hardware is this on? It looks like x86-64 to me, which is fine. 
>>> I just want to be sure I know what I'm looking at. As many specifics 
>>> as possible is good (number of CPUs, hyperthreading, etc.)
>>>
>>
>> Dual Woodcrest Xeon-based CPUs
>>
>>> Could you send the output of
>>> objdump -Slr /path/to/kernel/mm/swap.o
>>>
>>
>> I attached the objdump output here
>>
>>> Actually, just the put_page disassembly is all I want to see.
>>>
>>> Is there any more text available? Usually there is an explanation 
>>> given for an oops message (e.g. "Unable to handle kernel paging 
>>> request..").
>>>
>>
>> I did not see any oops text message. The system was still responsive 
>> to ipoib ping and login.
>>
>>
>>> I opened a bug at the NFS-RDMA SourceForge project to track this:
>>>
>>> http://sourceforge.net/tracker/index.php?func=detail&aid=1613201&group_id=97628&atid=618583 
>>>
>>
>> thanks for your help,
>>
>> -vu
>>
>>> Thanks for reporting this.
>>> james
>>>
>>> On Fri, 8 Dec 2006, Vu Pham wrote:
>>>
>>>> Hi James,
>>>>   I got these errors in the server's /var/log/messages and then the 
>>>> server stopped responding to login, I/O...; however, the server is 
>>>> still up and ipoib is still working
>>>>
>>>>
>>>> Dec  8 06:38:21 ibd201 kernel: RIP: 0010:[<ffffffff8025dff7>] [<ffffffff8025dff7>] put_page+0x17/0x40
>>>> Dec  8 06:38:21 ibd201 kernel: RSP: 0018:ffff810219ddfb08  EFLAGS: 00010246
>>>> Dec  8 06:38:21 ibd201 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000000003ffff
>>>> Dec  8 06:38:21 ibd201 kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8102274e92f8
>>>> Dec  8 06:38:21 ibd201 kernel: RBP: ffff8101ab785000 R08: 0000000000000034 R09: 0000000000000000
>>>> Dec  8 06:38:21 ibd201 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff81020ef96800
>>>> Dec  8 06:38:21 ibd201 kernel: R13: ffff8101ab785000 R14: 0000000000000000 R15: ffff8102053ee890
>>>> Dec  8 06:38:21 ibd201 kernel: FS:  00002ad76b8acb00(0000) GS:ffff81022066eb40(0000) knlGS:0000000000000000
>>>> Dec  8 06:38:21 ibd201 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> Dec  8 06:38:21 ibd201 kernel: CR2: 00002aaaaabf1000 CR3: 000000021c22b000 CR4: 00000000000006e0
>>>> Dec  8 06:38:21 ibd201 kernel: Process nfsd (pid: 15038, threadinfo ffff810219dde000, task ffff81020d87f0c0)
>>>> Dec  8 06:38:21 ibd201 kernel: Stack:  ffffffff8835e547 ffff81020ef96968 ffff81020ef96800 ffff81020ef96958
>>>> Dec  8 06:38:21 ibd201 kernel:  ffffffff88360c72 000000010395dc90 ffffffff80424e05 0000000000000000
>>>> Dec  8 06:38:21 ibd201 kernel:  0000000000200200 000000010395dc90 ffffffff80239b90 ffff81020d87f0c0
>>>> Dec  8 06:38:21 ibd201 kernel: Call Trace:
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff8835e547>] :sunrpc:svc_rdma_put_context+0x37/0xd0
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff88360c72>] :sunrpc:svc_rdma_recvfrom+0x5a2/0x11e0
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80424e05>] schedule_timeout+0x95/0xb0
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80239b90>] process_timeout+0x0/0x10
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80423c2d>] wait_for_completion_timeout+0xcd/0x150
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80228db0>] default_wake_function+0x0/0x10
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff881c1402>] :ib_mthca:mthca_cmd_post+0x232/0x260
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80228db0>] default_wake_function+0x0/0x10
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff802fac39>] __next_cpu+0x19/0x30
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80227dae>] find_busiest_group+0x24e/0x6d0
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80424772>] thread_return+0x0/0xde
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff804263f8>] _spin_unlock_irqrestore+0x8/0x10
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff8023a331>] try_to_del_timer_sync+0x51/0x60
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff8023a34c>] del_timer_sync+0xc/0x20
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80424e05>] schedule_timeout+0x95/0xb0
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff883559e6>] :sunrpc:svc_recv+0x416/0x510
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80228db0>] default_wake_function+0x0/0x10
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff80228db0>] default_wake_function+0x0/0x10
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff883a9540>] :nfsd:nfsd+0x0/0x380
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff883a9651>] :nfsd:nfsd+0x111/0x380
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff8020ab9c>] child_rip+0xa/0x12
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff883a9540>] :nfsd:nfsd+0x0/0x380
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff883a9540>] :nfsd:nfsd+0x0/0x380
>>>> Dec  8 06:38:21 ibd201 kernel:  [<ffffffff8020ab92>] child_rip+0x0/0x12
>>>> Dec  8 06:38:21 ibd201 kernel:
>>>> Dec  8 06:38:21 ibd201 kernel:
>>>> Dec  8 06:38:21 ibd201 kernel: Code: 0f 0b 68 8c 41 45 80 c2 2c 01 f0 ff 4f 08 0f 94 c0 84 c0 74
>>>> Dec  8 06:38:21 ibd201 kernel: RIP  [<ffffffff8025dff7>] put_page+0x17/0x40
>>>> Dec  8 06:38:21 ibd201 kernel:  RSP <ffff810219ddfb08>
>>>>
>>>> -vu
>>>>
>>
>>
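
A note on reading the "Code:" line in the quoted trace. The first two
bytes, 0f 0b, are the x86 ud2 instruction, and the bytes that follow look
like the inline BUG() frame that x86-64 kernels of this era emitted: ud2,
then a push of a 32-bit file-name pointer (opcode 68), then a ret with a
16-bit line number (opcode c2). If that assumption holds, put_page hit a
BUG_ON-style check rather than a bad memory access, which would fit the
absence of an "Unable to handle kernel paging request" message. The Python
sketch below decodes the frame under that assumption. decode_bug_frame is a
hypothetical helper name used only for illustration, not part of any kernel
tool, and the frame layout should be checked against the kernel source in
use.

```python
import struct

# "Code:" bytes copied from the oops in the quoted message.
CODE = "0f 0b 68 8c 41 45 80 c2 2c 01 f0 ff 4f 08 0f 94 c0 84 c0 74"


def decode_bug_frame(code_hex):
    """Decode an assumed x86-64 BUG() frame: ud2; pushq $imm32; ret $imm16.

    Returns (file_name_pointer, line_number), or None if the bytes do not
    match the assumed layout.
    """
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if len(raw) < 10:
        return None
    # 0f 0b = ud2, 68 = push imm32, c2 = ret imm16
    if raw[0:2] != b"\x0f\x0b" or raw[2] != 0x68 or raw[7] != 0xC2:
        return None
    (imm32,) = struct.unpack_from("<i", raw, 3)  # signed 32-bit, little-endian
    (line,) = struct.unpack_from("<H", raw, 8)   # unsigned 16-bit, little-endian
    # pushq sign-extends imm32 to 64 bits; reproduce that for display.
    file_ptr = imm32 & 0xFFFFFFFFFFFFFFFF
    return file_ptr, line


if __name__ == "__main__":
    frame = decode_bug_frame(CODE)
    if frame is None:
        print("bytes do not match the assumed BUG() frame layout")
    else:
        file_ptr, line = frame
        # prints: file-name pointer 0xffffffff8045418c, line 300
        print("file-name pointer 0x%016x, line %d" % (file_ptr, line))
```

The bytes after the frame, f0 ff 4f 08, encode a locked 32-bit decrement at
offset 8 from RDI, which is consistent with a reference-count drop that
follows the check. Which source file the pointer refers to cannot be told
from the trace alone.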
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: messages.202
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20061212/2dbd0ff8/attachment.ksh>

