[openib-general] Re: ib_uat kernel oops

Hal Rosenstock halr at voltaire.com
Thu Jul 28 06:38:43 PDT 2005


Hi Arlin,

On Wed, 2005-07-27 at 20:39, Arlin Davis wrote:
> Hi Hal,
> 
> I move from a 2.6.11 kernel to 2.6.12.3 and am now having some problems 
> with IBAT. (svn 2919)
> 
> Running the simple example...
> 
> [ardavis at iclust-19 examples]$ ./uatt
> uatt: main: uat test start
> uatt: main: ib_at_route_by_ip: ret 0 errno 0 for request 1 id 1 1
> 
> [ardavis at iclust-19 examples]$
> Message from syslogd at iclust-19 at Wed Jul 27 17:36:41 2005 ...
> iclust-19 kernel: Oops: 0002 [1] SMP
> 
> Message from syslogd at iclust-19 at Wed Jul 27 17:36:41 2005 ...
> iclust-19 kernel: CR2: 0000000000000000
> 
> Jul 27 17:36:41 iclust-19 kernel: Unable to handle kernel NULL pointer 
> dereference at 0000000000000000 RIP:
> Jul 27 17:36:41 iclust-19 kernel: 
> <ffffffff88057310>{:ib_uat:ib_uat_callback+155}
> Jul 27 17:36:41 iclust-19 kernel: PGD 34a77067 PUD 34a65067 PMD 0
> Jul 27 17:36:41 iclust-19 kernel: Oops: 0002 [1] SMP
> Jul 27 17:36:41 iclust-19 kernel: CPU 1
> Jul 27 17:36:41 iclust-19 kernel: Modules linked in: ib_att ib_uat ib_at 
> ib_ucm ib_cm ib_umad ib_uverbs ib_ipoib  ib_sa ib_mthca ib_mad ib_core
> Jul 27 17:36:41 iclust-19 kernel: Pid: 2248, comm: ib_at_wq/1 Not 
> tainted 2.6.12.3
> Jul 27 17:36:41 iclust-19 kernel: RIP: 0010:[<ffffffff88057310>] 
> <ffffffff88057310>{:ib_uat:ib_uat_callback+155}
> Jul 27 17:36:41 iclust-19 kernel: RSP: 0018:ffff8100362ade38  EFLAGS: 
> 00010282
> Jul 27 17:36:41 iclust-19 kernel: RAX: ffff8100386384b0 RBX: 
> ffff81003e8a5e00 RCX: 0000000000000000
> Jul 27 17:36:41 iclust-19 kernel: RDX: ffff81003e8a5e10 RSI: 
> 0000000000000001 RDI: ffff81003e8a5e00
> Jul 27 17:36:41 iclust-19 kernel: RBP: ffff810038638480 R08: 
> 0000000000000000 R09: 0000000000000000
> Jul 27 17:36:41 iclust-19 kernel: R10: ffff81003e8a5e00 R11: 
> 0000000000000050 R12: 0000000000000001
> Jul 27 17:36:41 iclust-19 kernel: R13: 0000000000000292 R14: 
> ffff81002da5de50 R15: ffffffff880523ed
> Jul 27 17:36:41 iclust-19 kernel: FS:  0000000000000000(0000) 
> GS:ffffffff8058e580(0000) knlGS:0000000000000000
> Jul 27 17:36:41 iclust-19 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 
> 000000008005003b
> Jul 27 17:36:41 iclust-19 kernel: CR2: 0000000000000000 CR3: 
> 0000000034a3b000 CR4: 00000000000006e0
> Jul 27 17:36:41 iclust-19 kernel: Process ib_at_wq/1 (pid: 2248, 
> threadinfo ffff8100362ac000, task ffff81003ea6ef50)
> Jul 27 17:36:41 iclust-19 kernel: Stack: ffff81002da5de50 
> ffff81002da5de78 ffff810038501880 ffffffff88052408
> Jul 27 17:36:41 iclust-19 kernel:        ffff81002da5de70 
> ffffffff8014363e ffff8100385018c0 ffff810038501898
> Jul 27 17:36:41 iclust-19 kernel:        ffff8100385018a8 ffffffffffffffff
> Jul 27 17:36:41 iclust-19 kernel: Call Trace:<ffffffff88052408>{:ib_at:req_comp_work+27} <ffffffff8014363e>{worker_thread+501}
> Jul 27 17:36:41 iclust-19 kernel:        <ffffffff8013019d>{default_wake_function+0} <ffffffff8013019d>{default_wake_function+0}
> Jul 27 17:36:41 iclust-19 kernel:        <ffffffff80147520>{keventd_create_kthread+0} <ffffffff80143449>{worker_thread+0}
> Jul 27 17:36:41 iclust-19 kernel:        <ffffffff80147520>{keventd_create_kthread+0} <ffffffff801474f3>{kthread+204}
> Jul 27 17:36:41 iclust-19 kernel:        <ffffffff8010f1a3>{child_rip+8} <ffffffff80147520>{keventd_create_kthread+0}
> Jul 27 17:36:41 iclust-19 kernel:        <ffffffff80147427>{kthread+0} <ffffffff8010f19b>{child_rip+0}
> Jul 27 17:36:41 iclust-19 kernel:
> Jul 27 17:36:41 iclust-19 kernel:
> Jul 27 17:36:41 iclust-19 kernel: Code: 48 89 11 48 8b 48 08 48 8d 53 20 
> 48 89 43 20 48 89 50 08 48
> Jul 27 17:36:41 iclust-19 kernel: RIP 
> <ffffffff88057310>{:ib_uat:ib_uat_callback+155} RSP <ffff8100362ade38>
> Jul 27 17:36:41 iclust-19 kernel: CR2: 0000000000000000
> .
> 
> any ideas?

The only explanation I can see is that ib_uat_callback was invoked with
a NULL context; that would fail as shown above.
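
For reference, the number after "Oops:" is the x86 page-fault error code,
and it can be decoded with a small sketch like the one below. The helper
and struct names are made up for illustration; only the bit meanings come
from the x86 architecture. "Oops: 0002" decodes to a kernel-mode write to
a page that is not present, which fits a store through a NULL pointer
(CR2 is 0 in the log).

```c
#include <stdbool.h>

/* x86 page-fault error code bits, as printed after "Oops:" in the log. */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access */
#define PF_USER  0x4u  /* 0: kernel mode,      1: user mode */

/* Hypothetical helper type, not part of any kernel API. */
struct oops_fault {
    bool protection;  /* fault hit a present page (permission problem) */
    bool write;       /* faulting access was a write */
    bool user;        /* fault was raised while in user mode */
};

/* Split the error code into its three low flag bits. */
static struct oops_fault decode_oops_code(unsigned int code)
{
    struct oops_fault f = {
        .protection = (code & PF_PROT) != 0,
        .write      = (code & PF_WRITE) != 0,
        .user       = (code & PF_USER) != 0,
    };
    return f;
}
```

Calling decode_oops_code(0x0002) gives write set and the other two flags
clear. This only narrows down the kind of access; it does not by itself
say which pointer in ib_uat_callback was NULL.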

Unfortunately, I don't see this here. Is it reproducible? The
differences between your setup and mine are 2.6.12.3 vs. 2.6.12.2, SMP,
and perhaps processor architecture.

Also, note that user AT is now on the trunk (and subsequent work will be
done there), so the shaharf-ibat branch no longer needs to be tracked.

-- Hal



