[ofa-general] page allocation failure

Bernd Schubert bs at q-leap.de
Thu Feb 28 09:44:07 PST 2008


On Thursday 28 February 2008 18:42:19 Bernd Schubert wrote:
> Hello,
>
> on several of our Lustre servers we can see page allocation failures.
>
> This is with 2.6.22 + kernel modules from ofed 1.2.5

Er, correction, it's 1.2.5.5

>
>
> [44464.764559] Lustre: 24052:0:(ldlm_lib.c:698:target_handle_connect()) Skipped 16 previous similar messages
> [54132.351263] ib_cm/2: page allocation failure. order:0, mode:0x10d0
> [54132.360738]
> [54132.360741] Call Trace:
> [54132.367803]  [<ffffffff8020ac61>] show_trace+0x34/0x47
> [54132.373235]  [<ffffffff8020ac86>] dump_stack+0x12/0x17
> [54132.378937]  [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
> [54132.386180]  [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
> [54132.395120]  [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
> [54132.401651]  [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
> [54132.408897]  [<ffffffff8809f9a9>] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
> [54132.418884]  [<ffffffff880a0c6d>] :ib_mthca:mthca_alloc_qp+0xab/0x102
> [54132.425774]  [<ffffffff880a5217>] :ib_mthca:mthca_create_qp+0x126/0x281
> [54132.432716]  [<ffffffff88054bc5>] :ib_core:ib_create_qp+0x17/0x91
> [54132.439102]  [<ffffffff88161c9f>] :rdma_cm:rdma_create_qp+0x2d/0x153
> [54132.446301]  [<ffffffff8835d0cc>] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
> [54132.456992]  [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
> [54132.469847]  [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
> [54132.478821]  [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
> [54132.485637]  [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
> [54132.492182]  [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
> [54132.499236]  [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
> [54132.506690]  [<ffffffff80236291>] run_workqueue+0x7f/0x10b
> [54132.512652]  [<ffffffff80236b1a>] worker_thread+0xda/0xe4
> [54132.520136]  [<ffffffff8023959a>] kthread+0x47/0x75
> [54132.525570]  [<ffffffff8020a2f8>] child_rip+0xa/0x12
> [54132.532975]
> [54132.535527] Mem-info:
> [54132.538157] Node 0 DMA per-cpu:
> [54132.542303] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.551752] CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.561661] CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.571154] CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.580597] CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.592354] CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.601794] CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.610719] CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54132.619630] Node 0 DMA32 per-cpu:
> [54132.623551] CPU    0: Hot: hi:  186, btch:  31 usd:  49   Cold: hi:   62, btch:  15 usd:  49
> [54132.632691] CPU    1: Hot: hi:  186, btch:  31 usd:  26   Cold: hi:   62, btch:  15 usd:   3
> [54132.642680] CPU    2: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  54
> [54132.651897] CPU    3: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  13
> [54132.663321] CPU    4: Hot: hi:  186, btch:  31 usd:  43   Cold: hi:   62, btch:  15 usd:  55
> [54132.673282] CPU    5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  49
> [54132.683636] CPU    6: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:   1
> [54132.693156] CPU    7: Hot: hi:  186, btch:  31 usd:  13   Cold: hi:   62, btch:  15 usd:  56
> [54132.703412] Node 0 Normal per-cpu:
> [54132.707024] CPU    0: Hot: hi:  186, btch:  31 usd: 130   Cold: hi:   62, btch:  15 usd:  14
> [54132.719317] CPU    1: Hot: hi:  186, btch:  31 usd:  81   Cold: hi:   62, btch:  15 usd:   1
> [54132.729276] CPU    2: Hot: hi:  186, btch:  31 usd: 134   Cold: hi:   62, btch:  15 usd:   2
> [54132.738819] CPU    3: Hot: hi:  186, btch:  31 usd: 124   Cold: hi:   62, btch:  15 usd:   8
> [54132.748078] CPU    4: Hot: hi:  186, btch:  31 usd:  21   Cold: hi:   62, btch:  15 usd:   4
> [54132.758029] CPU    5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:   9
> [54132.766855] CPU    6: Hot: hi:  186, btch:  31 usd: 120   Cold: hi:   62, btch:  15 usd:  13
> [54132.776462] CPU    7: Hot: hi:  186, btch:  31 usd: 166   Cold: hi:   62, btch:  15 usd:  12
> [54132.786009] Active:28507 inactive:62701 dirty:8386 writeback:27 unstable:0
> [54132.786010]  free:5586 slab:273528 mapped:2136 pagetables:699 bounce:0
> [54132.803082] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
> [54132.816507] lowmem_reserve[]: 0 3255 4013
> [54132.820811] Node 0 DMA32 free:9812kB min:6564kB low:8204kB high:9844kB active:52536kB inactive:134508kB present:3333728kB pages_scanned:0 all_unreclaimable? no
> [54132.839252] lowmem_reserve[]: 0 0 757
> [54132.843205] Node 0 Normal free:1340kB min:1524kB low:1904kB high:2284kB active:61492kB inactive:116296kB present:775680kB pages_scanned:800 all_unreclaimable? no
> [54132.859932] lowmem_reserve[]: 0 0 0
> [54132.863784] Node 0 DMA: 6*4kB 4*8kB 4*16kB 4*32kB 3*64kB 0*128kB 2*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11192kB
> [54132.876957] Node 0 DMA32: 48*4kB 33*8kB 26*16kB 3*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9608kB
> [54132.891138] Node 0 Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1456kB
> [54132.903195] Swap cache: add 0, delete 0, find 0/0, race 0+0
> [54132.909967] Free swap  = 4200888kB
> [54132.913677] Total swap = 4200888kB
> [54132.917229] Free swap:       4200888kB
> [54132.967201] 1245184 pages of RAM
> [54132.971121] 231685 reserved pages
> [54132.974973] 58033 pages shared
> [54132.978329] 0 pages swap cached
> [54132.982267] LustreError: 4103:0:(o2iblnd.c:791:kiblnd_create_conn()) Can't create QP: -12
> [54177.640441] ib_cm/5: page allocation failure. order:0, mode:0x10d0
> [54177.648631]
> [54177.648632] Call Trace:
> [54177.653908]  [<ffffffff8020ac61>] show_trace+0x34/0x47
> [54177.660073]  [<ffffffff8020ac86>] dump_stack+0x12/0x17
> [54177.667176]  [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
> [54177.682952]  [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
> [54177.688811]  [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
> [54177.695277]  [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
> [54177.702683]  [<ffffffff8809c85f>] :ib_mthca:mthca_alloc_cq_buf+0x38/0x86
> [54177.711034]  [<ffffffff8809d7f6>] :ib_mthca:mthca_init_cq+0x12a/0x397
> [54177.718478]  [<ffffffff880a5462>] :ib_mthca:mthca_create_cq+0xf0/0x1be
> [54177.725601]  [<ffffffff88054c66>] :ib_core:ib_create_cq+0x27/0x56
> [54177.732384]  [<ffffffff8835cc60>] :ko2iblnd:kiblnd_create_conn+0x3b0/0x1250
> [54177.739683]  [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
> [54177.748451]  [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
> [54177.757088]  [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
> [54177.763985]  [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
> [54177.770664]  [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
> [54177.777248]  [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
> [54177.784045]  [<ffffffff80236291>] run_workqueue+0x7f/0x10b
> [54177.790439]  [<ffffffff80236b1a>] worker_thread+0xda/0xe4
> [54177.799862]  [<ffffffff8023959a>] kthread+0x47/0x75
> [54177.805672]  [<ffffffff8020a2f8>] child_rip+0xa/0x12
> [54177.811717]
> [54177.813851] Mem-info:
> [54177.816666] Node 0 DMA per-cpu:
> [54177.820479] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.829621] CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.839216] CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.849488] CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.859625] CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.871977] CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.881930] CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.891980] CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> [54177.902800] Node 0 DMA32 per-cpu:
> [54177.906462] CPU    0: Hot: hi:  186, btch:  31 usd:  10   Cold: hi:   62, btch:  15 usd:  58
> [54177.916162] CPU    1: Hot: hi:  186, btch:  31 usd:  26   Cold: hi:   62, btch:  15 usd:   3
> [54177.926049] CPU    2: Hot: hi:  186, btch:  31 usd: 139   Cold: hi:   62, btch:  15 usd:  54
> [54177.936948] CPU    3: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  13
> [54177.946968] CPU    4: Hot: hi:  186, btch:  31 usd:  56   Cold: hi:   62, btch:  15 usd:  55
> [54177.956868] CPU    5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  57
> [54177.965685] CPU    6: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:   1
> [54177.975412] CPU    7: Hot: hi:  186, btch:  31 usd:  13   Cold: hi:   62, btch:  15 usd:  56
> [54177.986045] Node 0 Normal per-cpu:
> [54177.990527] CPU    0: Hot: hi:  186, btch:  31 usd: 128   Cold: hi:   62, btch:  15 usd:  14
> [54178.002993] CPU    1: Hot: hi:  186, btch:  31 usd:  81   Cold: hi:   62, btch:  15 usd:   1
> [54178.012136] CPU    2: Hot: hi:  186, btch:  31 usd: 113   Cold: hi:   62, btch:  15 usd:   2
> [54178.022533] CPU    3: Hot: hi:  186, btch:  31 usd: 124   Cold: hi:   62, btch:  15 usd:   8
> [54178.032316] CPU    4: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:   4
> [54178.041380] CPU    5: Hot: hi:  186, btch:  31 usd:  24   Cold: hi:   62, btch:  15 usd:   9
> [54178.050941] CPU    6: Hot: hi:  186, btch:  31 usd: 120   Cold: hi:   62, btch:  15 usd:  13
> [54178.061180] CPU    7: Hot: hi:  186, btch:  31 usd: 166   Cold: hi:   62, btch:  15 usd:  12
> [54178.072162] Active:28319 inactive:62389 dirty:8381 writeback:27 unstable:0
> [54178.072163]  free:5581 slab:273603 mapped:2117 pagetables:690 bounce:0
> [54178.087805] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
> [54178.103794] lowmem_reserve[]: 0 3255 4013
> [54178.108294] Node 0 DMA32 free:9784kB min:6564kB low:8204kB high:9844kB active:51792kB inactive:133260kB present:3333728kB pages_scanned:0 all_unreclaimable? no
> [54178.129648] lowmem_reserve[]: 0 0 757
> [54178.133756] Node 0 Normal free:1348kB min:1524kB low:1904kB high:2284kB active:61484kB inactive:116296kB present:775680kB pages_scanned:728 all_unreclaimable? no
> [54178.154399] lowmem_reserve[]: 0 0 0
> [54178.158450] Node 0 DMA: 6*4kB 4*8kB 4*16kB 4*32kB 3*64kB 0*128kB 2*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11192kB
> [54178.172214] Node 0 DMA32: 65*4kB 17*8kB 37*16kB 6*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9628kB
> [54178.188210] Node 0 Normal: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1464kB
> [54178.202288] Swap cache: add 0, delete 0, find 0/0, race 0+0
> [54178.208654] Free swap  = 4200888kB
> [54178.212390] Total swap = 4200888kB
> [54178.218597] Free swap:       4200888kB
> [54178.264623] 1245184 pages of RAM
> [54178.268302] 231685 reserved pages
> [54178.271793] 57602 pages shared
> [54178.275306] 0 pages swap cached
> [54178.278778] LustreError: 4106:0:(o2iblnd.c:732:kiblnd_create_conn()) Can't create CQ: -12
> [54277.772930] ib_cm/2: page allocation failure. order:0, mode:0x10d0
> [54277.781944]
> [54277.781945] Call Trace:
> [54277.788321]  [<ffffffff8020ac61>] show_trace+0x34/0x47
> [54277.793761]  [<ffffffff8020ac86>] dump_stack+0x12/0x17
> [54277.799744]  [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
> [54277.806044]  [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
> [54277.814225]  [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
> [54277.821449]  [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
> [54277.831300]  [<ffffffff8809f9a9>] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
> [54277.838934]  [<ffffffff880a0c6d>] :ib_mthca:mthca_alloc_qp+0xab/0x102
> [54277.846467]  [<ffffffff880a5217>] :ib_mthca:mthca_create_qp+0x126/0x281
> [54277.854289]  [<ffffffff88054bc5>] :ib_core:ib_create_qp+0x17/0x91
> [54277.862274]  [<ffffffff88161c9f>] :rdma_cm:rdma_create_qp+0x2d/0x153
> [54277.870048]  [<ffffffff8835d0cc>] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
> [54277.877973]  [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
> [54277.886679]  [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
> [54277.895646]  [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
> [54277.903470]  [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
> [54277.910567]  [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
> [54277.918121]  [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
> [54277.926378]  [<ffffffff80236291>] run_workqueue+0x7f/0x10b
> [54277.932202]  [<ffffffff80236b1a>] worker_thread+0xda/0xe4
> [54277.938003]  [<ffffffff8023959a>] kthread+0x47/0x75
> [54277.944032]  [<ffffffff8020a2f8>] child_rip+0xa/0x12
> [54277.950581]
>
>
> Any ideas?
>
> Thanks,
> Bernd



-- 
Bernd Schubert
Q-Leap Networks GmbH


