[ofa-general] page allocation failure
Bernd Schubert
bs at q-leap.de
Thu Feb 28 09:44:07 PST 2008
On Thursday 28 February 2008 18:42:19 Bernd Schubert wrote:
> Hello,
>
> on several of our Lustre servers we see page allocation failures.
>
> This is with kernel 2.6.22 plus the kernel modules from OFED 1.2.5
Er, correction, it's 1.2.5.5
>
>
> [44464.764559] Lustre: 24052:0:(ldlm_lib.c:698:target_handle_connect()) Skipped 16 previous similar messages
> [54132.351263] ib_cm/2: page allocation failure. order:0, mode:0x10d0
> [54132.360738]
> [54132.360741] Call Trace:
> [54132.367803] [<ffffffff8020ac61>] show_trace+0x34/0x47
> [54132.373235] [<ffffffff8020ac86>] dump_stack+0x12/0x17
> [54132.378937] [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
> [54132.386180] [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
> [54132.395120] [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
> [54132.401651] [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
> [54132.408897] [<ffffffff8809f9a9>] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
> [54132.418884] [<ffffffff880a0c6d>] :ib_mthca:mthca_alloc_qp+0xab/0x102
> [54132.425774] [<ffffffff880a5217>] :ib_mthca:mthca_create_qp+0x126/0x281
> [54132.432716] [<ffffffff88054bc5>] :ib_core:ib_create_qp+0x17/0x91
> [54132.439102] [<ffffffff88161c9f>] :rdma_cm:rdma_create_qp+0x2d/0x153
> [54132.446301] [<ffffffff8835d0cc>] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
> [54132.456992] [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
> [54132.469847] [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
> [54132.478821] [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
> [54132.485637] [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
> [54132.492182] [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
> [54132.499236] [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
> [54132.506690] [<ffffffff80236291>] run_workqueue+0x7f/0x10b
> [54132.512652] [<ffffffff80236b1a>] worker_thread+0xda/0xe4
> [54132.520136] [<ffffffff8023959a>] kthread+0x47/0x75
> [54132.525570] [<ffffffff8020a2f8>] child_rip+0xa/0x12
> [54132.532975]
> [54132.535527] Mem-info:
> [54132.538157] Node 0 DMA per-cpu:
> [54132.542303] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.551752] CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.561661] CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.571154] CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.580597] CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.592354] CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.601794] CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.610719] CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54132.619630] Node 0 DMA32 per-cpu:
> [54132.623551] CPU 0: Hot: hi: 186, btch: 31 usd: 49 Cold: hi: 62, btch: 15 usd: 49
> [54132.632691] CPU 1: Hot: hi: 186, btch: 31 usd: 26 Cold: hi: 62, btch: 15 usd: 3
> [54132.642680] CPU 2: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 54
> [54132.651897] CPU 3: Hot: hi: 186, btch: 31 usd: 1 Cold: hi: 62, btch: 15 usd: 13
> [54132.663321] CPU 4: Hot: hi: 186, btch: 31 usd: 43 Cold: hi: 62, btch: 15 usd: 55
> [54132.673282] CPU 5: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 49
> [54132.683636] CPU 6: Hot: hi: 186, btch: 31 usd: 25 Cold: hi: 62, btch: 15 usd: 1
> [54132.693156] CPU 7: Hot: hi: 186, btch: 31 usd: 13 Cold: hi: 62, btch: 15 usd: 56
> [54132.703412] Node 0 Normal per-cpu:
> [54132.707024] CPU 0: Hot: hi: 186, btch: 31 usd: 130 Cold: hi: 62, btch: 15 usd: 14
> [54132.719317] CPU 1: Hot: hi: 186, btch: 31 usd: 81 Cold: hi: 62, btch: 15 usd: 1
> [54132.729276] CPU 2: Hot: hi: 186, btch: 31 usd: 134 Cold: hi: 62, btch: 15 usd: 2
> [54132.738819] CPU 3: Hot: hi: 186, btch: 31 usd: 124 Cold: hi: 62, btch: 15 usd: 8
> [54132.748078] CPU 4: Hot: hi: 186, btch: 31 usd: 21 Cold: hi: 62, btch: 15 usd: 4
> [54132.758029] CPU 5: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 9
> [54132.766855] CPU 6: Hot: hi: 186, btch: 31 usd: 120 Cold: hi: 62, btch: 15 usd: 13
> [54132.776462] CPU 7: Hot: hi: 186, btch: 31 usd: 166 Cold: hi: 62, btch: 15 usd: 12
> [54132.786009] Active:28507 inactive:62701 dirty:8386 writeback:27 unstable:0
> [54132.786010] free:5586 slab:273528 mapped:2136 pagetables:699 bounce:0
> [54132.803082] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
> [54132.816507] lowmem_reserve[]: 0 3255 4013
> [54132.820811] Node 0 DMA32 free:9812kB min:6564kB low:8204kB high:9844kB active:52536kB inactive:134508kB present:3333728kB pages_scanned:0 all_unreclaimable? no
> [54132.839252] lowmem_reserve[]: 0 0 757
> [54132.843205] Node 0 Normal free:1340kB min:1524kB low:1904kB high:2284kB active:61492kB inactive:116296kB present:775680kB pages_scanned:800 all_unreclaimable? no
> [54132.859932] lowmem_reserve[]: 0 0 0
> [54132.863784] Node 0 DMA: 6*4kB 4*8kB 4*16kB 4*32kB 3*64kB 0*128kB 2*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11192kB
> [54132.876957] Node 0 DMA32: 48*4kB 33*8kB 26*16kB 3*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9608kB
> [54132.891138] Node 0 Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1456kB
> [54132.903195] Swap cache: add 0, delete 0, find 0/0, race 0+0
> [54132.909967] Free swap = 4200888kB
> [54132.913677] Total swap = 4200888kB
> [54132.917229] Free swap: 4200888kB
> [54132.967201] 1245184 pages of RAM
> [54132.971121] 231685 reserved pages
> [54132.974973] 58033 pages shared
> [54132.978329] 0 pages swap cached
> [54132.982267] LustreError: 4103:0:(o2iblnd.c:791:kiblnd_create_conn()) Can't create QP: -12
> [54177.640441] ib_cm/5: page allocation failure. order:0, mode:0x10d0
> [54177.648631]
> [54177.648632] Call Trace:
> [54177.653908] [<ffffffff8020ac61>] show_trace+0x34/0x47
> [54177.660073] [<ffffffff8020ac86>] dump_stack+0x12/0x17
> [54177.667176] [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
> [54177.682952] [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
> [54177.688811] [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
> [54177.695277] [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
> [54177.702683] [<ffffffff8809c85f>] :ib_mthca:mthca_alloc_cq_buf+0x38/0x86
> [54177.711034] [<ffffffff8809d7f6>] :ib_mthca:mthca_init_cq+0x12a/0x397
> [54177.718478] [<ffffffff880a5462>] :ib_mthca:mthca_create_cq+0xf0/0x1be
> [54177.725601] [<ffffffff88054c66>] :ib_core:ib_create_cq+0x27/0x56
> [54177.732384] [<ffffffff8835cc60>] :ko2iblnd:kiblnd_create_conn+0x3b0/0x1250
> [54177.739683] [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
> [54177.748451] [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
> [54177.757088] [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
> [54177.763985] [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
> [54177.770664] [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
> [54177.777248] [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
> [54177.784045] [<ffffffff80236291>] run_workqueue+0x7f/0x10b
> [54177.790439] [<ffffffff80236b1a>] worker_thread+0xda/0xe4
> [54177.799862] [<ffffffff8023959a>] kthread+0x47/0x75
> [54177.805672] [<ffffffff8020a2f8>] child_rip+0xa/0x12
> [54177.811717]
> [54177.813851] Mem-info:
> [54177.816666] Node 0 DMA per-cpu:
> [54177.820479] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.829621] CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.839216] CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.849488] CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.859625] CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.871977] CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.881930] CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.891980] CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> [54177.902800] Node 0 DMA32 per-cpu:
> [54177.906462] CPU 0: Hot: hi: 186, btch: 31 usd: 10 Cold: hi: 62, btch: 15 usd: 58
> [54177.916162] CPU 1: Hot: hi: 186, btch: 31 usd: 26 Cold: hi: 62, btch: 15 usd: 3
> [54177.926049] CPU 2: Hot: hi: 186, btch: 31 usd: 139 Cold: hi: 62, btch: 15 usd: 54
> [54177.936948] CPU 3: Hot: hi: 186, btch: 31 usd: 1 Cold: hi: 62, btch: 15 usd: 13
> [54177.946968] CPU 4: Hot: hi: 186, btch: 31 usd: 56 Cold: hi: 62, btch: 15 usd: 55
> [54177.956868] CPU 5: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 57
> [54177.965685] CPU 6: Hot: hi: 186, btch: 31 usd: 25 Cold: hi: 62, btch: 15 usd: 1
> [54177.975412] CPU 7: Hot: hi: 186, btch: 31 usd: 13 Cold: hi: 62, btch: 15 usd: 56
> [54177.986045] Node 0 Normal per-cpu:
> [54177.990527] CPU 0: Hot: hi: 186, btch: 31 usd: 128 Cold: hi: 62, btch: 15 usd: 14
> [54178.002993] CPU 1: Hot: hi: 186, btch: 31 usd: 81 Cold: hi: 62, btch: 15 usd: 1
> [54178.012136] CPU 2: Hot: hi: 186, btch: 31 usd: 113 Cold: hi: 62, btch: 15 usd: 2
> [54178.022533] CPU 3: Hot: hi: 186, btch: 31 usd: 124 Cold: hi: 62, btch: 15 usd: 8
> [54178.032316] CPU 4: Hot: hi: 186, btch: 31 usd: 27 Cold: hi: 62, btch: 15 usd: 4
> [54178.041380] CPU 5: Hot: hi: 186, btch: 31 usd: 24 Cold: hi: 62, btch: 15 usd: 9
> [54178.050941] CPU 6: Hot: hi: 186, btch: 31 usd: 120 Cold: hi: 62, btch: 15 usd: 13
> [54178.061180] CPU 7: Hot: hi: 186, btch: 31 usd: 166 Cold: hi: 62, btch: 15 usd: 12
> [54178.072162] Active:28319 inactive:62389 dirty:8381 writeback:27 unstable:0
> [54178.072163] free:5581 slab:273603 mapped:2117 pagetables:690 bounce:0
> [54178.087805] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
> [54178.103794] lowmem_reserve[]: 0 3255 4013
> [54178.108294] Node 0 DMA32 free:9784kB min:6564kB low:8204kB high:9844kB active:51792kB inactive:133260kB present:3333728kB pages_scanned:0 all_unreclaimable? no
> [54178.129648] lowmem_reserve[]: 0 0 757
> [54178.133756] Node 0 Normal free:1348kB min:1524kB low:1904kB high:2284kB active:61484kB inactive:116296kB present:775680kB pages_scanned:728 all_unreclaimable? no
> [54178.154399] lowmem_reserve[]: 0 0 0
> [54178.158450] Node 0 DMA: 6*4kB 4*8kB 4*16kB 4*32kB 3*64kB 0*128kB 2*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11192kB
> [54178.172214] Node 0 DMA32: 65*4kB 17*8kB 37*16kB 6*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9628kB
> [54178.188210] Node 0 Normal: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1464kB
> [54178.202288] Swap cache: add 0, delete 0, find 0/0, race 0+0
> [54178.208654] Free swap = 4200888kB
> [54178.212390] Total swap = 4200888kB
> [54178.218597] Free swap: 4200888kB
> [54178.264623] 1245184 pages of RAM
> [54178.268302] 231685 reserved pages
> [54178.271793] 57602 pages shared
> [54178.275306] 0 pages swap cached
> [54178.278778] LustreError: 4106:0:(o2iblnd.c:732:kiblnd_create_conn()) Can't create CQ: -12
> [54277.772930] ib_cm/2: page allocation failure. order:0, mode:0x10d0
> [54277.781944]
> [54277.781945] Call Trace:
> [54277.788321] [<ffffffff8020ac61>] show_trace+0x34/0x47
> [54277.793761] [<ffffffff8020ac86>] dump_stack+0x12/0x17
> [54277.799744] [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
> [54277.806044] [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
> [54277.814225] [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
> [54277.821449] [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
> [54277.831300] [<ffffffff8809f9a9>] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
> [54277.838934] [<ffffffff880a0c6d>] :ib_mthca:mthca_alloc_qp+0xab/0x102
> [54277.846467] [<ffffffff880a5217>] :ib_mthca:mthca_create_qp+0x126/0x281
> [54277.854289] [<ffffffff88054bc5>] :ib_core:ib_create_qp+0x17/0x91
> [54277.862274] [<ffffffff88161c9f>] :rdma_cm:rdma_create_qp+0x2d/0x153
> [54277.870048] [<ffffffff8835d0cc>] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
> [54277.877973] [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
> [54277.886679] [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
> [54277.895646] [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
> [54277.903470] [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
> [54277.910567] [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
> [54277.918121] [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
> [54277.926378] [<ffffffff80236291>] run_workqueue+0x7f/0x10b
> [54277.932202] [<ffffffff80236b1a>] worker_thread+0xda/0xe4
> [54277.938003] [<ffffffff8023959a>] kthread+0x47/0x75
> [54277.944032] [<ffffffff8020a2f8>] child_rip+0xa/0x12
> [54277.950581]
>
>
> Any ideas?
>
> Thanks,
> Bernd
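
In case it helps anyone reading the Mem-info dumps above, here is a minimal sketch for re-adding the per-zone buddy free lists (the "N*SIZEkB" terms) and comparing them with the total the kernel prints after the "=" sign. The helper name buddy_total_kb is made up for illustration; the two sample lines are copied from the first dump.

```python
import re

def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one buddy free-list line.

    Hypothetical helper, not part of any kernel or Lustre tooling.
    The trailing '= NNNNkB' total has no '*', so the pattern skips it.
    """
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)

# Free-list lines from the first Mem-info dump (timestamp 54132.x).
dma32 = ("Node 0 DMA32: 48*4kB 33*8kB 26*16kB 3*32kB 1*64kB 1*128kB "
         "1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9608kB")
normal = ("Node 0 Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB "
          "1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1456kB")

for line in (dma32, normal):
    zone = line.split(":")[0]
    print(zone, buddy_total_kb(line), "kB on the buddy free lists")
```

Both sums agree with the totals printed in the log. They will not necessarily equal the "free:" figure on the corresponding zone line, which is reported separately.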
--
Bernd Schubert
Q-Leap Networks GmbH