[ofa-general] page allocation failure

Bernd Schubert bs at q-leap.de
Thu Feb 28 09:42:19 PST 2008


Hello,

On several of our Lustre servers we see page allocation failures.

This is with kernel 2.6.22 plus the kernel modules from OFED 1.2.5:


[44464.764559] Lustre: 24052:0:(ldlm_lib.c:698:target_handle_connect()) Skipped 16 previous similar messages
[54132.351263] ib_cm/2: page allocation failure. order:0, mode:0x10d0
[54132.360738]
[54132.360741] Call Trace:
[54132.367803]  [<ffffffff8020ac61>] show_trace+0x34/0x47
[54132.373235]  [<ffffffff8020ac86>] dump_stack+0x12/0x17
[54132.378937]  [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
[54132.386180]  [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
[54132.395120]  [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
[54132.401651]  [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
[54132.408897]  [<ffffffff8809f9a9>] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
[54132.418884]  [<ffffffff880a0c6d>] :ib_mthca:mthca_alloc_qp+0xab/0x102
[54132.425774]  [<ffffffff880a5217>] :ib_mthca:mthca_create_qp+0x126/0x281
[54132.432716]  [<ffffffff88054bc5>] :ib_core:ib_create_qp+0x17/0x91
[54132.439102]  [<ffffffff88161c9f>] :rdma_cm:rdma_create_qp+0x2d/0x153
[54132.446301]  [<ffffffff8835d0cc>] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
[54132.456992]  [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
[54132.469847]  [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
[54132.478821]  [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
[54132.485637]  [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
[54132.492182]  [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
[54132.499236]  [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
[54132.506690]  [<ffffffff80236291>] run_workqueue+0x7f/0x10b
[54132.512652]  [<ffffffff80236b1a>] worker_thread+0xda/0xe4
[54132.520136]  [<ffffffff8023959a>] kthread+0x47/0x75
[54132.525570]  [<ffffffff8020a2f8>] child_rip+0xa/0x12
[54132.532975]
[54132.535527] Mem-info:
[54132.538157] Node 0 DMA per-cpu:
[54132.542303] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.551752] CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.561661] CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.571154] CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.580597] CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.592354] CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.601794] CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.610719] CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54132.619630] Node 0 DMA32 per-cpu:
[54132.623551] CPU    0: Hot: hi:  186, btch:  31 usd:  49   Cold: hi:   62, btch:  15 usd:  49
[54132.632691] CPU    1: Hot: hi:  186, btch:  31 usd:  26   Cold: hi:   62, btch:  15 usd:   3
[54132.642680] CPU    2: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  54
[54132.651897] CPU    3: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  13
[54132.663321] CPU    4: Hot: hi:  186, btch:  31 usd:  43   Cold: hi:   62, btch:  15 usd:  55
[54132.673282] CPU    5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  49
[54132.683636] CPU    6: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:   1
[54132.693156] CPU    7: Hot: hi:  186, btch:  31 usd:  13   Cold: hi:   62, btch:  15 usd:  56
[54132.703412] Node 0 Normal per-cpu:
[54132.707024] CPU    0: Hot: hi:  186, btch:  31 usd: 130   Cold: hi:   62, btch:  15 usd:  14
[54132.719317] CPU    1: Hot: hi:  186, btch:  31 usd:  81   Cold: hi:   62, btch:  15 usd:   1
[54132.729276] CPU    2: Hot: hi:  186, btch:  31 usd: 134   Cold: hi:   62, btch:  15 usd:   2
[54132.738819] CPU    3: Hot: hi:  186, btch:  31 usd: 124   Cold: hi:   62, btch:  15 usd:   8
[54132.748078] CPU    4: Hot: hi:  186, btch:  31 usd:  21   Cold: hi:   62, btch:  15 usd:   4
[54132.758029] CPU    5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:   9
[54132.766855] CPU    6: Hot: hi:  186, btch:  31 usd: 120   Cold: hi:   62, btch:  15 usd:  13
[54132.776462] CPU    7: Hot: hi:  186, btch:  31 usd: 166   Cold: hi:   62, btch:  15 usd:  12
[54132.786009] Active:28507 inactive:62701 dirty:8386 writeback:27 unstable:0
[54132.786010]  free:5586 slab:273528 mapped:2136 pagetables:699 bounce:0
[54132.803082] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
[54132.816507] lowmem_reserve[]: 0 3255 4013
[54132.820811] Node 0 DMA32 free:9812kB min:6564kB low:8204kB high:9844kB active:52536kB inactive:134508kB present:3333728kB pages_scanned:0 all_unreclaimable? no
[54132.839252] lowmem_reserve[]: 0 0 757
[54132.843205] Node 0 Normal free:1340kB min:1524kB low:1904kB high:2284kB active:61492kB inactive:116296kB present:775680kB pages_scanned:800 all_unreclaimable? no
[54132.859932] lowmem_reserve[]: 0 0 0
[54132.863784] Node 0 DMA: 6*4kB 4*8kB 4*16kB 4*32kB 3*64kB 0*128kB 2*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11192kB
[54132.876957] Node 0 DMA32: 48*4kB 33*8kB 26*16kB 3*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9608kB
[54132.891138] Node 0 Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1456kB
[54132.903195] Swap cache: add 0, delete 0, find 0/0, race 0+0
[54132.909967] Free swap  = 4200888kB
[54132.913677] Total swap = 4200888kB
[54132.917229] Free swap:       4200888kB
[54132.967201] 1245184 pages of RAM
[54132.971121] 231685 reserved pages
[54132.974973] 58033 pages shared
[54132.978329] 0 pages swap cached
[54132.982267] LustreError: 4103:0:(o2iblnd.c:791:kiblnd_create_conn()) Can't create QP: -12
[54177.640441] ib_cm/5: page allocation failure. order:0, mode:0x10d0
[54177.648631]
[54177.648632] Call Trace:
[54177.653908]  [<ffffffff8020ac61>] show_trace+0x34/0x47
[54177.660073]  [<ffffffff8020ac86>] dump_stack+0x12/0x17
[54177.667176]  [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
[54177.682952]  [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
[54177.688811]  [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
[54177.695277]  [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
[54177.702683]  [<ffffffff8809c85f>] :ib_mthca:mthca_alloc_cq_buf+0x38/0x86
[54177.711034]  [<ffffffff8809d7f6>] :ib_mthca:mthca_init_cq+0x12a/0x397
[54177.718478]  [<ffffffff880a5462>] :ib_mthca:mthca_create_cq+0xf0/0x1be
[54177.725601]  [<ffffffff88054c66>] :ib_core:ib_create_cq+0x27/0x56
[54177.732384]  [<ffffffff8835cc60>] :ko2iblnd:kiblnd_create_conn+0x3b0/0x1250
[54177.739683]  [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
[54177.748451]  [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
[54177.757088]  [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
[54177.763985]  [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
[54177.770664]  [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
[54177.777248]  [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
[54177.784045]  [<ffffffff80236291>] run_workqueue+0x7f/0x10b
[54177.790439]  [<ffffffff80236b1a>] worker_thread+0xda/0xe4
[54177.799862]  [<ffffffff8023959a>] kthread+0x47/0x75
[54177.805672]  [<ffffffff8020a2f8>] child_rip+0xa/0x12
[54177.811717]
[54177.813851] Mem-info:
[54177.816666] Node 0 DMA per-cpu:
[54177.820479] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.829621] CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.839216] CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.849488] CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.859625] CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.871977] CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.881930] CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.891980] CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[54177.902800] Node 0 DMA32 per-cpu:
[54177.906462] CPU    0: Hot: hi:  186, btch:  31 usd:  10   Cold: hi:   62, btch:  15 usd:  58
[54177.916162] CPU    1: Hot: hi:  186, btch:  31 usd:  26   Cold: hi:   62, btch:  15 usd:   3
[54177.926049] CPU    2: Hot: hi:  186, btch:  31 usd: 139   Cold: hi:   62, btch:  15 usd:  54
[54177.936948] CPU    3: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  13
[54177.946968] CPU    4: Hot: hi:  186, btch:  31 usd:  56   Cold: hi:   62, btch:  15 usd:  55
[54177.956868] CPU    5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  57
[54177.965685] CPU    6: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:   1
[54177.975412] CPU    7: Hot: hi:  186, btch:  31 usd:  13   Cold: hi:   62, btch:  15 usd:  56
[54177.986045] Node 0 Normal per-cpu:
[54177.990527] CPU    0: Hot: hi:  186, btch:  31 usd: 128   Cold: hi:   62, btch:  15 usd:  14
[54178.002993] CPU    1: Hot: hi:  186, btch:  31 usd:  81   Cold: hi:   62, btch:  15 usd:   1
[54178.012136] CPU    2: Hot: hi:  186, btch:  31 usd: 113   Cold: hi:   62, btch:  15 usd:   2
[54178.022533] CPU    3: Hot: hi:  186, btch:  31 usd: 124   Cold: hi:   62, btch:  15 usd:   8
[54178.032316] CPU    4: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:   4
[54178.041380] CPU    5: Hot: hi:  186, btch:  31 usd:  24   Cold: hi:   62, btch:  15 usd:   9
[54178.050941] CPU    6: Hot: hi:  186, btch:  31 usd: 120   Cold: hi:   62, btch:  15 usd:  13
[54178.061180] CPU    7: Hot: hi:  186, btch:  31 usd: 166   Cold: hi:   62, btch:  15 usd:  12
[54178.072162] Active:28319 inactive:62389 dirty:8381 writeback:27 unstable:0
[54178.072163]  free:5581 slab:273603 mapped:2117 pagetables:690 bounce:0
[54178.087805] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
[54178.103794] lowmem_reserve[]: 0 3255 4013
[54178.108294] Node 0 DMA32 free:9784kB min:6564kB low:8204kB high:9844kB active:51792kB inactive:133260kB present:3333728kB pages_scanned:0 all_unreclaimable? no
[54178.129648] lowmem_reserve[]: 0 0 757
[54178.133756] Node 0 Normal free:1348kB min:1524kB low:1904kB high:2284kB active:61484kB inactive:116296kB present:775680kB pages_scanned:728 all_unreclaimable? no
[54178.154399] lowmem_reserve[]: 0 0 0
[54178.158450] Node 0 DMA: 6*4kB 4*8kB 4*16kB 4*32kB 3*64kB 0*128kB 2*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11192kB
[54178.172214] Node 0 DMA32: 65*4kB 17*8kB 37*16kB 6*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB = 9628kB
[54178.188210] Node 0 Normal: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1464kB
[54178.202288] Swap cache: add 0, delete 0, find 0/0, race 0+0
[54178.208654] Free swap  = 4200888kB
[54178.212390] Total swap = 4200888kB
[54178.218597] Free swap:       4200888kB
[54178.264623] 1245184 pages of RAM
[54178.268302] 231685 reserved pages
[54178.271793] 57602 pages shared
[54178.275306] 0 pages swap cached
[54178.278778] LustreError: 4106:0:(o2iblnd.c:732:kiblnd_create_conn()) Can't create CQ: -12
[54277.772930] ib_cm/2: page allocation failure. order:0, mode:0x10d0
[54277.781944]
[54277.781945] Call Trace:
[54277.788321]  [<ffffffff8020ac61>] show_trace+0x34/0x47
[54277.793761]  [<ffffffff8020ac86>] dump_stack+0x12/0x17
[54277.799744]  [<ffffffff80251bc4>] __alloc_pages+0x2a3/0x2bc
[54277.806044]  [<ffffffff8020f75c>] dma_alloc_pages+0x9b/0xbf
[54277.814225]  [<ffffffff8020f7f6>] dma_alloc_coherent+0x76/0x1cc
[54277.821449]  [<ffffffff8809af1e>] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
[54277.831300]  [<ffffffff8809f9a9>] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
[54277.838934]  [<ffffffff880a0c6d>] :ib_mthca:mthca_alloc_qp+0xab/0x102
[54277.846467]  [<ffffffff880a5217>] :ib_mthca:mthca_create_qp+0x126/0x281
[54277.854289]  [<ffffffff88054bc5>] :ib_core:ib_create_qp+0x17/0x91
[54277.862274]  [<ffffffff88161c9f>] :rdma_cm:rdma_create_qp+0x2d/0x153
[54277.870048]  [<ffffffff8835d0cc>] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
[54277.877973]  [<ffffffff88365295>] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
[54277.886679]  [<ffffffff88366975>] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
[54277.895646]  [<ffffffff881620e7>] :rdma_cm:cma_req_handler+0x322/0x389
[54277.903470]  [<ffffffff88155fa4>] :ib_cm:cm_process_work+0x17/0xad
[54277.910567]  [<ffffffff88157025>] :ib_cm:cm_req_handler+0x7ae/0x81b
[54277.918121]  [<ffffffff881570bf>] :ib_cm:cm_work_handler+0x2d/0xbaa
[54277.926378]  [<ffffffff80236291>] run_workqueue+0x7f/0x10b
[54277.932202]  [<ffffffff80236b1a>] worker_thread+0xda/0xe4
[54277.938003]  [<ffffffff8023959a>] kthread+0x47/0x75
[54277.944032]  [<ffffffff8020a2f8>] child_rip+0xa/0x12
[54277.950581]
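
To make the per-zone numbers in the Mem-info dumps easier to scan, here is a minimal Python sketch that picks out zones whose free memory is below their "min" watermark. It is not part of the kernel output; the function name zones_below_min and the regular expression are assumptions made for illustration, and the sample lines are copied from the first dump above, truncated after the "high:" field.

```python
import re

# Matches the per-zone summary lines of a Mem-info dump, e.g.
# "Node 0 Normal free:1340kB min:1524kB low:1904kB ..."
# It does not match the "per-cpu:" headers or the buddy-list lines.
ZONE_RE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(log_text):
    """Return (node, zone, free_kb, min_kb) for zones with free < min."""
    hits = []
    for m in ZONE_RE.finditer(log_text):
        node = int(m.group(1))
        zone = m.group(2)
        free_kb = int(m.group(3))
        min_kb = int(m.group(4))
        if free_kb < min_kb:
            hits.append((node, zone, free_kb, min_kb))
    return hits


# Zone lines from the first dump (truncated after the "high:" field).
SAMPLE = """\
[54132.803082] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB
[54132.820811] Node 0 DMA32 free:9812kB min:6564kB low:8204kB high:9844kB
[54132.843205] Node 0 Normal free:1340kB min:1524kB low:1904kB high:2284kB
"""

if __name__ == "__main__":
    for node, zone, free_kb, min_kb in zones_below_min(SAMPLE):
        print(f"Node {node} {zone}: free {free_kb}kB < min {min_kb}kB")
```

On the sample lines this flags only the Normal zone (1340kB free against a 1524kB minimum). The same function could be run over the full dmesg output to see whether that pattern holds for every failure.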


Any ideas?

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


