[ofa-general] OOM problem with ib_ipoib?

John Marshall John.Marshall at ec.gc.ca
Wed Oct 22 11:16:26 PDT 2008


Hi,

Summary: I believe the ib_ipoib module is causing an OOM problem on
	my machine. I do not see the problem until the module is
	loaded. The problem manifests itself when the kernel page
	cache (grep Cached /proc/meminfo) holding file data is maxed
	out. Normally, the cached data should be written out and
	released by pdflush. In this case, it is not.

	Notes:
	1) It is NOT necessary for the ib interfaces to actually
	be in use or up!
	2) I am using OFED 1.3.2, which I built on my own
	machine.
	3) I see similarly odd behavior when using 1.4-rc3
	and a 2.6.26 kernel.
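
For anyone trying to confirm the behavior, a minimal sketch for watching the relevant counters while the cache fills is given here. The helper name meminfo_fields and the 5-second interval are assumptions for illustration, not part of OFED or of the original test. LowFree only exists on 32-bit highmem kernels such as the one above.

```shell
#!/bin/sh
# meminfo_fields is a hypothetical helper (not part of any OFED tool).
# It prints selected fields from a meminfo-style file, default /proc/meminfo.
meminfo_fields() {
    file="${1:-/proc/meminfo}"
    # Anchoring with ^ keeps "SwapCached:" from matching "Cached:".
    awk '$1 ~ /^(MemFree|Cached|LowFree|Dirty|Writeback):$/ {
        printf "%s %s kB\n", $1, $2
    }' "$file"
}

# Example: sample every 5 seconds while dd runs in another terminal
# (stop with Ctrl-C):
# while :; do date +%T; meminfo_fields; sleep 5; done
```

If Cached stays near MemTotal while Dirty and Writeback are close to zero, the cached pages are already clean, which is the state shown in the /proc/meminfo output below.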

----------

System info:

root# lsmod | grep ib
ib_ipoib               77512  0
ib_cm                  33260  1 ib_ipoib
ib_sa                  36628  2 ib_ipoib,ib_cm
ib_mthca              124832  0
ib_umad                16232  0
ib_uverbs              38792  0
ib_mad                 35188  4 ib_cm,ib_sa,ib_mthca,ib_umad
ib_core                54304  7 ib_ipoib,ib_cm,ib_sa,ib_mthca,ib_umad,ib_uverbs,ib_mad
ipv6                  242980  29 ib_ipoib
libata                145584  1 ata_generic
scsi_mod              142316  6 sg,sr_mod,usb_storage,sd_mod,megaraid_sas,libata

root# uname -r
2.6.24-etchnhalf.1-686-bigmem

root# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8220
stepping        : 3
cpu MHz         : 2793.163
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy ts fid vid ttp tm stc
bogomips        : 5589.70
clflush size    : 64

***** 7 more similar entries (2 cpu, 4-core each) ****

root# cat /proc/meminfo
MemTotal:     33274492 kB
MemFree:        147716 kB
Buffers:           840 kB
Cached:       32532792 kB
SwapCached:          0 kB
Active:          19956 kB
Inactive:     32524692 kB
HighTotal:    32635808 kB
HighFree:        77008 kB
LowTotal:       638684 kB
LowFree:         70708 kB
SwapTotal:    16386260 kB
SwapFree:     16386168 kB
Dirty:              88 kB
Writeback:           0 kB
AnonPages:       11032 kB
Mapped:           7940 kB
Slab:           537012 kB
SReclaimable:   487100 kB
SUnreclaim:      49912 kB
PageTables:        656 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  33023504 kB
Committed_AS:    61360 kB
VmallocTotal:   118776 kB
VmallocUsed:     96800 kB
VmallocChunk:    13112 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB

root# dpkg -l | grep ofed
ii  libibcm                                     1.0.2-1                                  ofed-1.3.2: libibcm
ii  libibcommon                                 1.0.8-1                                  ofed-1.3.2: libibcommon
ii  libibmad                                    1.1.6-1                                  ofed-1.3.2: libibmad
ii  libibumad                                   1.1.7-1                                  ofed-1.3.2: libibumad
ii  libibverbs                                  1.1.1-1                                  ofed-1.3.2: libibverbs
ii  libipathverbs                               1.1-1                                    ofed-1.3.2: libipathverbs
ii  libmlx4                                     1.0-1                                    ofed-1.3.2: libmlx
ii  libmthca                                    1.0.4-1                                  ofed-1.3.2: libmthca
ii  librdmacm                                   1.0.7-1                                  ofed-1.3.2: librdmacm
ii  libsdp                                      1.1.99-1                                 ofed-1.3.2: libsdp
ii  ofa-kernel                                  1.3.2-2.6.24-etchnhalf.1-686-bigmem-1    ofed-1.3.2: ofa_kernel

----------

How to provoke #1 (prior to loading ib_ipoib):

non-root$ dd if=/dev/zero of=/tmp/50G bs=1M count=50000

root# modprobe ib_ipoib

Output from dmesg:

modprobe: page allocation failure. order:1, mode:0x20
Pid: 6839, comm: modprobe Not tainted 2.6.24-etchnhalf.1-686-bigmem #1
 [<c0161904>] __alloc_pages+0x2c4/0x2d5
 [<c017a05c>] cache_alloc_refill+0x299/0x4b1
 [<c017a2e9>] __kmalloc+0x75/0xbc
 [<c025eafb>] __alloc_skb+0x49/0xf5
 [<f8d4677f>] ipoib_cm_alloc_rx_skb+0x31/0x218 [ib_ipoib]
 [<f8d48c09>] ipoib_cm_dev_init+0x50c/0x552 [ib_ipoib]
 [<c0249944>] dma_pool_free+0xb0/0x18c
 [<f8d45bed>] ipoib_transport_dev_init+0xd2/0x3d1 [ib_ipoib]
 [<f8d42c6d>] ipoib_ib_dev_init+0x2c/0x6e [ib_ipoib]
 [<f8d3f7b3>] ipoib_dev_init+0xab/0xd0 [ib_ipoib]
 [<f8d3f9f8>] ipoib_add_one+0x220/0x3cf [ib_ipoib]
 [<c011fef8>] resched_task+0x52/0x54
 [<f89b13e2>] ib_register_client+0x48/0x6c [ib_core]
 [<f89890d2>] ipoib_init_module+0xd2/0xf8 [ib_ipoib]
 [<c0145a27>] sys_init_module+0x15e3/0x16fb
 [<c0166432>] vma_prio_tree_insert+0x17/0x2a
 [<c017a274>] __kmalloc+0x0/0xbc
 [<c0103ede>] syscall_call+0x7/0xb
 =======================
Mem-info:
DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 121   Cold: hi:   62, btch:  15 usd:  58
CPU    1: Hot: hi:  186, btch:  31 usd:  42   Cold: hi:   62, btch:  15 usd:  26
CPU    2: Hot: hi:  186, btch:  31 usd: 152   Cold: hi:   62, btch:  15 usd:  57
CPU    3: Hot: hi:  186, btch:  31 usd:  63   Cold: hi:   62, btch:  15 usd:  59
CPU    4: Hot: hi:  186, btch:  31 usd:  72   Cold: hi:   62, btch:  15 usd:  55
CPU    5: Hot: hi:  186, btch:  31 usd: 174   Cold: hi:   62, btch:  15 usd:  61
CPU    6: Hot: hi:  186, btch:  31 usd:  66   Cold: hi:   62, btch:  15 usd:  48
CPU    7: Hot: hi:  186, btch:  31 usd:  35   Cold: hi:   62, btch:  15 usd:  54
HighMem per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:   9
CPU    1: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:   5
CPU    2: Hot: hi:  186, btch:  31 usd:  93   Cold: hi:   62, btch:  15 usd:   8
CPU    3: Hot: hi:  186, btch:  31 usd:   3   Cold: hi:   62, btch:  15 usd:  14
CPU    4: Hot: hi:  186, btch:  31 usd:  37   Cold: hi:   62, btch:  15 usd:  53
CPU    5: Hot: hi:  186, btch:  31 usd:  67   Cold: hi:   62, btch:  15 usd:  49
CPU    6: Hot: hi:  186, btch:  31 usd:  15   Cold: hi:   62, btch:  15 usd:  30
CPU    7: Hot: hi:  186, btch:  31 usd: 138   Cold: hi:   62, btch:  15 usd:  61
Active:5136 inactive:8135705 dirty:12 writeback:0 unstable:0
 free:15715 slab:136280 mapped:2348 pagetables:164 bounce:0
DMA free:3524kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 873 34020 34020
Normal free:1368kB min:3744kB low:4680kB high:5616kB active:288kB inactive:252kB present:894080kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 265176 265176
HighMem free:59588kB min:512kB low:36080kB high:71652kB active:20256kB inactive:32541032kB present:33942528kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 2*4kB 4*8kB 4*16kB 4*32kB 5*64kB 1*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3496kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1232kB
HighMem: 34*4kB 23*8kB 28*16kB 2*32kB 4*64kB 1*128kB 4*256kB 3*512kB 2*1024kB 5*2048kB 11*4096kB = 61120kB
Swap cache: add 27, delete 27, find 1/2, race 0+0
Free swap  = 16386168kB
Total swap = 16386260kB
Free swap:       16386168kB
8781824 pages of RAM
8552448 pages of HIGHMEM
463201 reserved pages
8140201 pages shared
0 pages swap cached
12 pages dirty
0 pages writeback
2382 pages mapped
136255 pages slab
167 pages pagetables
ib%d: failed to allocate receive buffer 144
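
The per-zone free-block lines in the dump above can be checked mechanically. The following is a minimal sketch, assuming the "count*sizekB" layout shown in this log; buddy_total_kb is a hypothetical helper name. It sums the free blocks on one zone line so the result can be compared with the zone's free: and min: figures (the Normal zone above reports free:1368kB against min:3744kB).

```shell
#!/bin/sh
# buddy_total_kb is a hypothetical helper. It reads zone lines from the
# Mem-info dump on stdin and prints the sum of all "count*sizekB" terms
# for each line, in kB.
buddy_total_kb() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            # Only fields like "1*1024kB" match; "Normal:", "=" and the
            # trailing "1232kB" total are skipped.
            if ($i ~ /^[0-9]+[*][0-9]+kB$/) {
                split($i, part, "[*]")
                # part[2] is e.g. "1024kB"; adding 0 keeps the numeric prefix.
                total += part[1] * (part[2] + 0)
            }
        }
        print total
    }'
}
```

Fed the Normal line from the dump, the helper prints 1232, matching the total the kernel printed on that line:

```shell
echo 'Normal: 0*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1232kB' | buddy_total_kb
```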

----------

How to provoke #2 (with ib_ipoib loaded):

non-root$ dd if=/dev/zero of=/tmp/50G bs=1M count=50000

This results in an OOM condition that triggers the OOM killer, which
starts killing processes.
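
To confirm OOM-killer activity after such a run, the kernel log can be filtered. This is only a sketch: oom_lines is a hypothetical helper, and the search pattern is an assumption about how the kernel words its OOM messages, which differs between kernel versions.

```shell
#!/bin/sh
# oom_lines is a hypothetical helper. The pattern below is an assumption
# about the wording of OOM messages and may need adjusting per kernel.
oom_lines() {
    grep -i -E 'oom-killer|out of memory'
}

# Typical use as root after the dd run:
# dmesg | oom_lines
```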

----------

Any help would be appreciated, as would confirmation from anyone
seeing the same sort of behavior.

Thanks,
John



