[openib-general] A couple of kdapltest oopses

Hal Rosenstock halr at voltaire.com
Fri Jun 10 09:07:23 PDT 2005


On Fri, 2005-06-10 at 11:29, Itamar Rabenstein wrote: 
> Is this new? Have you seen it before?

I'm pretty sure I did.

> Do you see the problem with -i 1000?

Yes, but it takes more repeats of the command to make it occur.

> Both oopses are in the malloc stage.
> I have seen problems in the past where DAPL tried to malloc a very
> large amount of memory.

Or is it that some malloc'd memory is not being freed, so the system
eventually runs out? Shouldn't the failure be more graceful, too?
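
Both possibilities can be made concrete with a small user-space sketch. This is not kdapltest code: tracked_malloc, tracked_free, pt_printf, outstanding_allocs and PT_PRINTF_BUFSZ are hypothetical names, and module code would use kmalloc/kfree instead of malloc/free. The sketch counts outstanding allocations, so a leak would show up as a count that keeps growing across iterations, and it checks the allocation result before formatting, so running out of memory returns an error instead of handing a NULL buffer to vsnprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names and size: not taken from the kdapltest sources. */
#define PT_PRINTF_BUFSZ 256

static size_t outstanding_allocs;  /* allocations not yet freed */

static void *tracked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        outstanding_allocs++;  /* count only successful allocations */
    return p;
}

static void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(outstanding_allocs > 0);  /* catches frees of untracked memory */
    outstanding_allocs--;
    free(p);
}

/*
 * Format a message into a temporary heap buffer and print it.
 * Returns the vsnprintf() result, or -1 if the buffer could not be
 * allocated, rather than passing a NULL buffer to vsnprintf().
 */
static int pt_printf(const char *fmt, ...)
{
    va_list ap;
    int len;
    char *buf = tracked_malloc(PT_PRINTF_BUFSZ);

    if (buf == NULL)
        return -1;  /* fail gracefully; the caller decides what to do */

    va_start(ap, fmt);
    len = vsnprintf(buf, PT_PRINTF_BUFSZ, fmt, ap);
    va_end(ap);

    if (len >= 0)
        fputs(buf, stdout);  /* output is truncated if it exceeds the buffer */
    tracked_free(buf);        /* every successful allocation is released */
    return len;
}
```

Calling pt_printf() in a loop for as many iterations as the failing run (e.g. -i 10000) and then checking that outstanding_allocs is back to zero would help tell a leak apart from transient allocation failures under load.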

-- Hal

>  Itamar
> 
> 
> > --Original Message--
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Friday, June 10, 2005 4:32 PM
> > To: James Lentini
> > Cc: openib-general at openib.org
> > Subject: [openib-general] A couple of kdapltest oopses
> > 
> > 
> > Hi,
> > 
> > First, on the client, when running the transaction test, I see the
> > following:
> > kdapltest -T T -s <server IP addr> -D mthca0a -d -i 10000 -w 8 client SR server SR
> > 
> > Jun 10 08:58:17 localhost kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20
> > Jun 10 08:58:17 localhost kernel:  [<c01470f2>] __alloc_pages+0x2b2/0x440
> > Jun 10 08:58:17 localhost kernel:  [<c0111ce8>] kernel_map_pages+0x28/0x70
> > Jun 10 08:58:17 localhost kernel:  [<c014b2e1>] kmem_getpages+0x31/0xb0
> > Jun 10 08:58:17 localhost kernel:  [<c014cd59>] cache_grow+0x139/0x360
> > Jun 10 08:58:17 localhost kernel:  [<c014d523>] cache_alloc_refill+0x153/0x340
> > Jun 10 08:58:17 localhost kernel:  [<c014ad65>] dbg_redzone1+0x15/0x30
> > Jun 10 08:58:17 localhost kernel:  [<c014d77e>] cache_alloc_debugcheck_after+0x6e/0x1a0
> > Jun 10 08:58:17 localhost kernel:  [<c014de11>] __kmalloc+0xb1/0xe0
> > Jun 10 08:58:17 localhost kernel:  [<d0a33995>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a34826>] DT_Tdep_PT_Printf+0x16/0x1b0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a27bdf>] DT_Transaction_Run+0x48f/0xb60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a33b4d>] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a26933>] DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<c0111ce8>] kernel_map_pages+0x28/0x70
> > Jun 10 08:58:17 localhost kernel:  [<c014d196>] cache_free_debugcheck+0x196/0x2d0
> > Jun 10 08:58:17 localhost kernel:  [<d0a33a4f>] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a33a30>] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<c0100875>] kernel_thread_helper+0x5/0x10
> > Jun 10 08:58:17 localhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
> > Jun 10 08:58:17 localhost kernel:  printing eip:
> > Jun 10 08:58:17 localhost kernel: c022048b
> > Jun 10 08:58:17 localhost kernel: *pde = 07cb6067
> > Jun 10 08:58:17 localhost kernel: *pte = 00000000
> > Jun 10 08:58:17 localhost kernel: Oops: 0002 [#1]
> > Jun 10 08:58:17 localhost kernel: DEBUG_PAGEALLOC
> > Jun 10 08:58:17 localhost kernel: Modules linked in: kdapltest ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa ib_umad ide_cd cdrom lp ipv6 autofs parport_pc parport uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 mii evdev usbcore
> > Jun 10 08:58:17 localhost kernel: CPU:    0
> > Jun 10 08:58:17 localhost kernel: EIP:    0060:[<c022048b>]    Not tainted VLI
> > Jun 10 08:58:17 localhost kernel: EFLAGS: 00010283   (2.6.11.6)
> > Jun 10 08:58:17 localhost kernel: EIP is at vsnprintf+0x4b/0x4c0
> > Jun 10 08:58:17 localhost kernel: eax: 00000054   ebx: c1d17f78   ecx: 00000000   edx: d0a35eef
> > Jun 10 08:58:17 localhost kernel: esi: 00000004   edi: c1d17f78   ebp: 00000103   esp: ce5bfd84
> > Jun 10 08:58:17 localhost kernel: ds: 007b   es: 007b   ss: 0068
> > Jun 10 08:58:17 localhost kernel: Process DT_Mdep_Thread_ (pid: 6696, threadinfo=ce5be000 task=c3730a90)
> > Jun 10 08:58:17 localhost kernel: Stack: cffff740 00000020 00000000 d0a33995 00000104 00000000 c1d17f78 d0a33995
> > Jun 10 08:58:17 localhost kernel:        00000104 00000020 c1d17f78 00000000 c1d17f78 c5b58000 d0a3484b 00000004
> > Jun 10 08:58:17 localhost kernel:        00000100 d0a35eef ce5bfdf4 0000007b ffffff05 c1d17f78 cf34ef78 c60d2060
> > Jun 10 08:58:17 localhost kernel: Call Trace:
> > Jun 10 08:58:17 localhost kernel:  [<d0a33995>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a33995>] DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a3484b>] DT_Tdep_PT_Printf+0x3b/0x1b0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a27bdf>] DT_Transaction_Run+0x48f/0xb60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a33b4d>] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a26933>] DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<c0111ce8>] kernel_map_pages+0x28/0x70
> > Jun 10 08:58:17 localhost kernel:  [<c014d196>] cache_free_debugcheck+0x196/0x2d0
> > Jun 10 08:58:17 localhost kernel:  [<d0a33a4f>] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<d0a33a30>] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [<c0100875>] kernel_thread_helper+0x5/0x10
> > Jun 10 08:58:17 localhost kernel: Code: f0 48 39 c5 73 0d 89 f2 f7 da bd ff ff ff ff 89 54 24 40 8b 54 24 44 80 3a 00 74 23 8d 74 26 00 0f b6 02 3c 25 74 3d 39 ee 77 06 <88> 06 8b 54 24 44 46 89 d0 42 89 54 24 44 80 78 01 00 75 e1 39
> > 
> > Then, on the server side, when I try to rmmod kdapltest, I get:
> > 
> > Unable to handle kernel paging request at ffffffff88243a05 RIP: 
> > [<ffffffff88243a05>]
> > PGD 103027 PUD 105027 PMD 3baeb067 PTE 0
> > Oops: 0010 [1] SMP 
> > CPU 1 
> > Modules linked in: ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa parport_pc lp parport autofs4 sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables video button battery ac md5 ipv6 ohci_hcd i2c_amd8111 i2c_core hw_random ib_mthca ib_mad ib_core e100 mii tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_sil libata sd_mod scsi_mod
> > Pid: 9094, comm: kdapltest Not tainted 2.6.11.6
> > RIP: 0010:[<ffffffff88243a05>] [<ffffffff88243a05>]
> > RSP: 0018:ffff810037787e58  EFLAGS: 00010292
> > RAX: 0000000000000061 RBX: ffff810035af6d88 RCX: 0000000000000282
> > RDX: ffff810020712308 RSI: 0000000000000282 RDI: ffff810020711ce0
> > RBP: ffff810035af6d80 R08: ffff810037786000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: ffff810038daccc0
> > R13: 0000000000000001 R14: 0000000000000000 R15: ffff8100234ad008
> > FS:  00002aaaaaaccec0(0000) GS:ffffffff804c9f00(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffffffff88243a05 CR3: 000000003a33a000 CR4: 00000000000006e0
> > Process kdapltest (pid: 9094, threadinfo ffff810037786000, task ffff81003cd16860)
> > Stack: ffff810000002000 0000024000000003 ffff810000000100 ffff810000000000
> >        ffff810000000000 0000000000000000 ffff81003d40f4c0 0000000100000000
> >        ffff81001eb44000 ffff8100234ad000
> > Call Trace:<ffffffff8010f1c7>{child_rip+8} <ffffffff8010f1bf>{child_rip+0}
> >        
> > 
> > Code:  Bad RIP value.
> > RIP [<ffffffff88243a05>] RSP <ffff810037787e58>
> > CR2: ffffffff88243a05
> >  <1>Unable to handle kernel paging request at ffffffff8825eac8 RIP: 
> > <ffffffff8017d419>{filp_close+73}
> > PGD 103027 PUD 105027 PMD 3baeb067 PTE 0
> > Oops: 0000 [2] SMP 
> > CPU 1 
> > Modules linked in: ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa parport_pc lp parport autofs4 sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables video button battery ac md5 ipv6 ohci_hcd i2c_amd8111 i2c_core hw_random ib_mthca ib_mad ib_core e100 mii tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_sil libata sd_mod scsi_mod
> > Pid: 9094, comm: kdapltest Not tainted 2.6.11.6
> > RIP: 0010:[<ffffffff8017d419>] <ffffffff8017d419>{filp_close+73}
> > RSP: 0018:ffff810037787bf8  EFLAGS: 00010286
> > RAX: ffffffff8825ea60 RBX: ffff81001f24a8c0 RCX: ffff810019109238
> > RDX: ffff810037e58e70 RSI: ffff810037e58d40 RDI: ffff81001f24a8c0
> > RBP: 0000000000000000 R08: ffff81003d7ce6c0 R09: ffff810037787bf8
> > R10: 0000000000000001 R11: 0000000000000000 R12: ffff810037e58d40
> > R13: 0000000000000001 R14: 0000000000000001 R15: ffff81001f278940
> > FS:  00002aaaaaaccec0(0000) GS:ffffffff804c9f00(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffffffff8825eac8 CR3: 000000003a33a000 CR4: 00000000000006e0
> > Process kdapltest (pid: 9094, threadinfo ffff810037786000, task ffff81003cd16860)
> > Stack: 0000000000000003 0000000000000003 ffff810037e58d40 ffffffff80139723
> >        0000000000000009 ffff810037e58d40 ffff81003cd16eb0 ffff81003cd16860
> >        0000000000000009 ffffffff80139eba
> > Call Trace:<ffffffff80139723>{put_files_struct+115} <ffffffff80139eba>{do_exit+378}
> >        <ffffffff802433d7>{do_unblank_screen+119} <ffffffff80121f0f>{do_page_fault+2047}
> >        <ffffffff803357ea>{thread_return+42} <ffffffff8010f011>{error_exit+0}
> >        <ffffffff8010f1c7>{child_rip+8} <ffffffff8010f1bf>{child_rip+0}
> >        
> > 
> > Code: 48 8b 40 68 48 85 c0 74 0e 48 89 df ff d0 85 ed 0f 44 e8 66 
> > RIP <ffffffff8017d419>{filp_close+73} RSP <ffff810037787bf8>
> > CR2: ffffffff8825eac8
> >  <1>Unable to handle kernel paging request at ffffffff88245190 RIP: 
> > [<ffffffff88245190>]
> > PGD 103027 PUD 105027 PMD 3baeb067 PTE 0
> > Oops: 0010 [3] SMP 
> > CPU 0 
> > Modules linked in: ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa parport_pc lp parport autofs4 sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables video button battery ac md5 ipv6 ohci_hcd i2c_amd8111 i2c_core hw_random ib_mthca ib_mad ib_core e100 mii tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_sil libata sd_mod scsi_mod
> > Pid: 9119, comm: DT_Mdep_Thread_ Not tainted 2.6.11.6
> > RIP: 0010:[<ffffffff88245190>] [<ffffffff88245190>]
> > RSP: 0018:ffff810031907f18  EFLAGS: 00010296
> > RAX: 0000000000000000 RBX: ffff81001b770000 RCX: 0000000000000006
> > RDX: 0000000000000008 RSI: 0000000000000008 RDI: 0000000000000003
> > RBP: ffff81001b77004c R08: ffffffff80508a00 R09: 0000000000000008
> > R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
> > R13: ffff810038daccc0 R14: 0000000000000000 R15: ffff8100234ad008
> > FS:  00002aaaaaaccec0(0000) GS:ffffffff804c9e80(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffffffff88245190 CR3: 000000003d8b3000 CR4: 00000000000006e0
> > Process DT_Mdep_Thread_ (pid: 9119, threadinfo ffff810031906000, task ffff81003cd16170)
> > Stack: ffff810037bde1c0 ffff810035af6d80 ffff810038daccc0 0000000000000001
> >        ffff81001b770000 ffffffff88250ff8 ffff810037bde1c0 ffffffff8010f1c7
> >        ffff8100234ad008 ffff81001b770000
> > Call Trace:<ffffffff8010f1c7>{child_rip+8} <ffffffff8011cdd0>{flat_send_IPI_mask+0}
> >        <ffffffff8010f1bf>{child_rip+0}
> > 
> > Code:  Bad RIP value.
> > RIP [<ffffffff88245190>] RSP <ffff810031907f18>
> > CR2: ffffffff88245190
> > 
> > -- Hal
> > 
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > To unsubscribe, please visit 
> > http://openib.org/mailman/listinfo/openib-general
> > 



