[openib-general] kernel oops

Hal Rosenstock halr at voltaire.com
Tue Aug 30 06:11:06 PDT 2005


Hi Sean,

On Mon, 2005-08-29 at 18:24, Sean Hefty wrote:
> Viswanath Krishnamurthy wrote:
> > Call Trace:
> >  [<c013e410>] __alloc_pages+0x166/0x3b6
> >  [<c0267637>] ib_get_client_data+0x14/0x54
> >  [<c027390f>] ib_sa_path_rec_get+0x1b/0x13e
> >  [<c027952f>] resolve_path+0x8c/0x15b
> >  [<c0278ff2>] path_req_complete+0x0/0xf7
> >  [<c02a9932>] rtnetlink_dump_all+0x0/0x9e
> >  [<c02a9a6d>] rtnetlink_done+0x0/0x3
> >  [<c02799d3>] ib_at_paths_by_route+0xc4/0xd9
> >  [<c0278aed>] same_path_req+0x0/0x95
> >  [<c027a53d>] ib_uat_paths_by_route+0xef/0x1c4
> >  [<c02a9932>] rtnetlink_dump_all+0x0/0x9e
> >  [<c02a9a6d>] rtnetlink_done+0x0/0x3
> >  [<c027ac87>] ib_uat_write+0x96/0xa2
> >  [<c01567fe>] vfs_write+0x108/0x10a
> >  [<c01568ab>] sys_write+0x41/0x6a
> >  [<c01035eb>] sysenter_past_esp+0x54/0x75
> 
> Hal, I've looked into this more, and this is what appears to be 
> happening.  

Thanks for looking into this. It's been on my list but I hadn't quite
got to it yet.

> Ucmpost calls ib_at_route_by_ip(), followed by 
> ib_at_paths_by_route().  The first call fails asynchronously, which is 
> ignored by ucmpost.  It expects the call to ib_at_paths_by_route() 
> to fail synchronously with invalid input.

Why would ib_at_paths_by_route() be called if no route was obtained (from
ib_at_route_by_ip())? Isn't that a ucmpost issue? (I also agree it's not
good for UAT to crash.)

> The AT code in the kernel assumes that the ib_route passed into 
> ib_at_paths_by_route is valid and dereferences a device pointer, which I 
> think is causing this crash.  Can you confirm that this is what the code 
> is doing?

It needs to be a valid route struct. I'm not sure how the kernel can
validate that this is the case. It does check for a NULL pointer, but
here the pointer is bad rather than NULL.

> The AT code appears to be passing a kernel pointer up to the userspace app, 
> and then requires that pointer to be passed back to the kernel.  This 
> needs to be changed to pass up some identifier that can be validated on 
> the return to the kernel.

Isn't it copying the ib_route structure to userspace ?
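
For illustration only, here is a minimal userspace sketch of the kind of
validated identifier suggested above. It is not the AT/UAT code: struct
route, route_register(), route_lookup() and route_release() are
hypothetical names, and the fixed-size table stands in for whatever
lookup structure the kernel side would actually use. The idea is that
userspace only ever holds a small integer, and a stale or garbage value
fails the lookup instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 16          /* hypothetical table size */
#define SLOT_BITS  8           /* low bits of an id select the slot */
#define SLOT_MASK  0xFFu
#define GEN_MASK   0xFFFFFFu   /* remaining 24 bits hold a generation */

/* Hypothetical stand-in for the route data; not the real ib_route. */
struct route {
    uint32_t src_ip;
    uint32_t dst_ip;
    int      dev_index;
};

struct route_slot {
    struct route route;
    uint32_t     generation;   /* bumped on every reuse of the slot */
    int          in_use;
};

static struct route_slot route_table[MAX_ROUTES];

/* Store a copy of *r and return an opaque id, or 0 on failure. */
uint32_t route_register(const struct route *r)
{
    if (r == NULL)
        return 0;
    for (uint32_t slot = 0; slot < MAX_ROUTES; slot++) {
        struct route_slot *s = &route_table[slot];
        if (s->in_use)
            continue;
        s->route = *r;
        s->in_use = 1;
        s->generation = (s->generation + 1) & GEN_MASK;
        if (s->generation == 0)     /* keep id 0 permanently invalid */
            s->generation = 1;
        return (s->generation << SLOT_BITS) | slot;
    }
    return 0;                       /* table full */
}

/* Validate an id that came back from the caller; NULL if it is bad. */
struct route *route_lookup(uint32_t id)
{
    uint32_t slot = id & SLOT_MASK;
    uint32_t gen  = id >> SLOT_BITS;

    if (slot >= MAX_ROUTES)
        return NULL;
    if (!route_table[slot].in_use || route_table[slot].generation != gen)
        return NULL;
    return &route_table[slot].route;
}

/* Free the slot behind a valid id; returns 0 on success, -1 if bad. */
int route_release(uint32_t id)
{
    if (route_lookup(id) == NULL)
        return -1;
    assert(route_table[id & SLOT_MASK].in_use);
    route_table[id & SLOT_MASK].in_use = 0;
    return 0;
}
```

With something along these lines, a paths-by-route request carrying an
id from a failed or already released route lookup would get an error
back from route_lookup() rather than reaching a device pointer. A real
kernel version would also need locking and per-file ownership checks,
which this sketch leaves out.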

-- Hal



