[openib-general] Re: Page allocation failures & kdapltest oops

Hal Rosenstock halr at voltaire.com
Wed Sep 28 06:39:13 PDT 2005


On Tue, 2005-09-27 at 12:53, James Lentini wrote:
> On Tue, 27 Sep 2005, Hal Rosenstock wrote:
> 
> > > Since we don't check for a kmalloc failure in DT_Tdep_PT_Printf, this 
> > > oops occurs:
> > > 
> > > > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer 
> > > > dereference at virtual address 00000004
> > > 
> > > I've checked in the patch below to fix that, but this is not the root 
> > > of the problem. 
> > 
> > I'll try it with the patch and let you know how it behaves. When it
> > still runs out of memory will it fail more gracefully ? I understand it
> > won't fix the root cause of running out of memory.
> 
> It should behave more gracefully. Thanks for testing.

That seems better, but I still see the following:

Sep 28 09:33:07 hal kernel: teback:0 unstable:0 free:420 slab:29838 mapped:28019 pagetables:487
Sep 28 09:33:07 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:3560kB inactive:1596kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 240 240
Sep 28 09:33:07 hal kernel: Normal free:672kB min:1920kB low:2400kB high:2880kB active:90152kB inactive:25992kB present:245760kB pages_scanned:91 all_unreclaimable? no
Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:07 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:07 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Sep 28 09:33:07 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB
Sep 28 09:33:07 hal kernel: HighMem: empty
Sep 28 09:33:07 hal kernel: Swap cache: add 19875, delete 15824, find 4973/5982, race 0+0
Sep 28 09:33:07 hal kernel: Free swap  = 483892kB
Sep 28 09:33:07 hal kernel: Total swap = 522104kB
Sep 28 09:33:07 hal kernel: Free swap:       483892kB
Sep 28 09:33:09 hal kernel: 65536 pages of RAM
Sep 28 09:33:10 hal kernel: 0 pages of HIGHMEM
Sep 28 09:33:10 hal kernel: 1533 reserved pages
Sep 28 09:33:11 hal kernel: 47248 pages shared
Sep 28 09:33:11 hal kernel: 4051 pages swap cached
Sep 28 09:33:11 hal kernel: 0 pages dirty
Sep 28 09:33:11 hal kernel: 0 pages writeback
Sep 28 09:33:11 hal kernel: 28019 pages mapped
Sep 28 09:33:11 hal kernel: 29838 pages slab
Sep 28 09:33:11 hal kernel: 487 pages pagetables
Sep 28 09:33:11 hal kernel: DT_Tdep_PT_Printf: out of memory
Sep 28 09:33:11 hal kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20
Sep 28 09:33:11 hal kernel:  [<c014d512>] __alloc_pages+0x2f2/0x490
Sep 28 09:33:11 hal kernel:  [<c0151001>] kmem_getpages+0x31/0xb0
Sep 28 09:33:11 hal kernel:  [<c0152a79>] cache_grow+0x139/0x360
Sep 28 09:33:11 hal kernel:  [<c0153251>] cache_alloc_refill+0x151/0x340
Sep 28 09:33:11 hal kernel:  [<d0ace21a>] DT_handle_send_op+0x2fa/0x400 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<c0153b44>] __kmalloc+0xb4/0xf0
Sep 28 09:33:11 hal kernel:  [<d0ad86d5>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<d0ad9566>] DT_Tdep_PT_Printf+0x16/0x1d0 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<d0acc9f8>] DT_Transaction_Run+0x2c8/0xb60 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<d0ad888d>] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<d0ad888d>] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<d0acb918>] DT_Transaction_Main+0x1388/0x21a0 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<c0113e3d>] __change_page_attr+0x2d/0x170
Sep 28 09:33:11 hal kernel:  [<c0152eb6>] cache_free_debugcheck+0x196/0x2d0
Sep 28 09:33:11 hal kernel:  [<d0ad878f>] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<d0ad8770>] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
Sep 28 09:33:11 hal kernel:  [<c0100f75>] kernel_thread_helper+0x5/0x10
Sep 28 09:33:11 hal kernel: DMA per-cpu:
Sep 28 09:33:11 hal kernel: cpu 0 hot: low 2, high 6, batch 1 used:2
Sep 28 09:33:11 hal kernel: cpu 0 cold: low 0, high 2, batch 1 used:1
Sep 28 09:33:11 hal kernel: Normal per-cpu:
Sep 28 09:33:11 hal kernel: cpu 0 hot: low 62, high 186, batch 31 used:92
Sep 28 09:33:11 hal kernel: cpu 0 cold: low 0, high 62, batch 31 used:34
Sep 28 09:33:11 hal kernel: HighMem per-cpu: empty
Sep 28 09:33:11 hal kernel: Free pages:        1680kB (0kB HighMem)
Sep 28 09:33:11 hal kernel: Active:23428 inactive:6897 dirty:0 writeback:0 unstable:0 free:420 slab:29838 mapped:28019 pagetables:487
Sep 28 09:33:11 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:3560kB inactive:1596kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 240 240
Sep 28 09:33:11 hal kernel: Normal free:672kB min:1920kB low:2400kB high:2880kB active:90152kB inactive:25992kB present:245760kB pages_scanned:91 all_unreclaimable? no
Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:11 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 0 0
Sep 28 09:33:11 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Sep 28 09:33:11 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB
Sep 28 09:33:11 hal kernel: HighMem: empty
Sep 28 09:33:11 hal kernel: Swap cache: add 19875, delete 15824, find 4973/5982, race 0+0
Sep 28 09:33:11 hal kernel: Free swap  = 483892kB
Sep 28 09:33:11 hal kernel: Total swap = 522104kB
Sep 28 09:33:11 hal kernel: Free swap:       483892kB
Sep 28 09:33:11 hal kernel: 65536 pages of RAM
Sep 28 09:33:11 hal kernel: 0 pages of HIGHMEM
Sep 28 09:33:12 hal kernel: 1533 reserved pages
Sep 28 09:33:12 hal kernel: 47248 pages shared
Sep 28 09:33:12 hal kernel: 4051 pages swap cached
Sep 28 09:33:12 hal kernel: 0 pages dirty
Sep 28 09:33:12 hal kernel: 0 pages writeback
Sep 28 09:33:12 hal kernel: 28019 pages mapped
Sep 28 09:33:12 hal kernel: 29838 pages slab
Sep 28 09:33:12 hal kernel: 487 pages pagetables
Sep 28 09:33:12 hal kernel: DT_Tdep_PT_Printf: out of memory
Sep 28 09:33:12 hal kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20
Sep 28 09:33:12 hal kernel:  [<c014d512>] __alloc_pages+0x2f2/0x490
Sep 28 09:33:12 hal kernel:  [<c0151001>] kmem_getpages+0x31/0xb0
Sep 28 09:33:12 hal kernel:  [<c0152a79>] cache_grow+0x139/0x360
Sep 28 09:33:12 hal kernel:  [<c022981b>] vscnprintf+0x2b/0x40
Sep 28 09:33:12 hal kernel:  [<c0153251>] cache_alloc_refill+0x151/0x340
Sep 28 09:33:12 hal kernel:  [<c0153b44>] __kmalloc+0xb4/0xf0
Sep 28 09:33:12 hal kernel:  [<d0ad86d5>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
Sep 28 09:33:12 hal kernel:  [<d0ad86d5>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
Sep 28 09:33:12 hal kernel: <of memory
Sep 28 09:33:12 hal kernel: DT_Tdep_PT_Printf: out of memory
Sep 28 09:33:12 hal last message repeated 439 times
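
For anyone reading these dumps, here is a rough way to sanity-check the
zone lines above. It is only an illustrative Python sketch, not part of
kdapltest or the kernel: the names parse_zone and parse_buddy are made up
here, and the regexes assume the per-zone layout seen in this particular
log. It flags populated zones whose free memory is below the min
watermark and re-adds the buddy free-list counts.

```python
import re

# Assumed layout: "<zone> free:NkB min:NkB low:NkB high:NkB ... present:NkB"
ZONE_RE = re.compile(
    r"(DMA|Normal|HighMem) free:(\d+)kB min:(\d+)kB low:(\d+)kB "
    r"high:(\d+)kB.*?present:(\d+)kB"
)
# Assumed layout: "<zone>: N*4kB N*8kB ... = NkB"
BUDDY_RE = re.compile(r"(DMA|Normal|HighMem): ((?:\d+\*\d+kB ?)+)= (\d+)kB")


def parse_zone(line):
    """Return watermark info for a per-zone line, or None if it is not one."""
    m = ZONE_RE.search(line)
    if m is None:
        return None
    free, wmin, low, high, present = (int(g) for g in m.groups()[1:])
    return {
        "zone": m.group(1),
        "free": free,
        "min": wmin,
        "low": low,
        "high": high,
        "present": present,
        # An empty zone (present == 0) is not treated as "below min".
        "below_min": present > 0 and free < wmin,
    }


def parse_buddy(line):
    """Sum the count*size terms of a buddy free-list line, or return None."""
    m = BUDDY_RE.search(line)
    if m is None:
        return None
    terms = re.findall(r"(\d+)\*(\d+)kB", m.group(2))
    computed = sum(int(count) * int(size) for count, size in terms)
    return {
        "zone": m.group(1),
        "computed_kb": computed,
        "reported_kb": int(m.group(3)),
    }


# Lines copied from the log above.
sample = [
    "Sep 28 09:33:07 hal kernel: DMA free:1008kB min:128kB low:160kB "
    "high:192kB active:3560kB inactive:1596kB present:16384kB "
    "pages_scanned:0 all_unreclaimable? no",
    "Sep 28 09:33:07 hal kernel: Normal free:672kB min:1920kB low:2400kB "
    "high:2880kB active:90152kB inactive:25992kB present:245760kB "
    "pages_scanned:91 all_unreclaimable? no",
    "Sep 28 09:33:07 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB "
    "1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB",
]

for line in sample:
    info = parse_zone(line) or parse_buddy(line)
    if info is not None:
        print(info)
```

Run against the log above, this reports the Normal zone at 672kB free
against a 1920kB min watermark, with no free 4kB blocks at all, which
looks consistent with the order:0 failures. If I read gfp.h of this
vintage right, mode:0x20 is __GFP_HIGH (i.e. GFP_ATOMIC), so the
allocator cannot sleep and reclaim before giving up.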

Also, I don't understand why:
kdapltest -T T -s <IP> -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR
would work and
kdapltest -T T -s <IP> -D mthca0a -d -i 10000 -w 8 client SR server SR
would fail. The former seems more strenuous to me (everything is the
same except that it uses 2 threads and fewer iterations).

-- Hal



