[openib-general] kernel oops

Sean Hefty mshefty at ichips.intel.com
Mon Aug 29 15:24:14 PDT 2005


Viswanath Krishnamurthy wrote:
> Call Trace:
>  [<c013e410>] __alloc_pages+0x166/0x3b6
>  [<c0267637>] ib_get_client_data+0x14/0x54
>  [<c027390f>] ib_sa_path_rec_get+0x1b/0x13e
>  [<c027952f>] resolve_path+0x8c/0x15b
>  [<c0278ff2>] path_req_complete+0x0/0xf7
>  [<c02a9932>] rtnetlink_dump_all+0x0/0x9e
>  [<c02a9a6d>] rtnetlink_done+0x0/0x3
>  [<c02799d3>] ib_at_paths_by_route+0xc4/0xd9
>  [<c0278aed>] same_path_req+0x0/0x95
>  [<c027a53d>] ib_uat_paths_by_route+0xef/0x1c4
>  [<c02a9932>] rtnetlink_dump_all+0x0/0x9e
>  [<c02a9a6d>] rtnetlink_done+0x0/0x3
>  [<c027ac87>] ib_uat_write+0x96/0xa2
>  [<c01567fe>] vfs_write+0x108/0x10a
>  [<c01568ab>] sys_write+0x41/0x6a
>  [<c01035eb>] sysenter_past_esp+0x54/0x75

Hal, I've looked into this more, and this is what appears to be 
happening.  Ucmpost calls ib_at_route_by_ip(), followed by 
ib_at_paths_by_route().  The first call fails asynchronously, and ucmpost 
ignores that failure.  It expects the call to ib_at_paths_by_route() to 
fail synchronously with invalid input.

The AT code in the kernel assumes that the ib_route passed into 
ib_at_paths_by_route is valid and dereferences a device pointer, which I 
think is causing this crash.  Can you confirm that this is what the code 
is doing?

The AT code appears to be passing a kernel pointer up to the userspace 
app, and then requires that pointer to be passed back to the kernel.  
This needs to be changed to pass up some identifier that can be validated 
on the return to the kernel.
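
To illustrate the kind of identifier I have in mind, here is a rough 
sketch written as plain userspace C, not actual AT code.  All names in it 
(route_entry, route_table, route_id_alloc, route_id_lookup, 
route_id_free, MAX_ROUTES) are made up for the example, and the table 
size and id layout are assumptions.  The idea is that userspace only ever 
sees a small integer id, and the kernel looks the id up and rejects it if 
it is out of range, unused, or stale:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 16   /* assumed table size, for illustration only */

/* Hypothetical stand-in for the kernel-side route state. */
struct route_entry {
    int      in_use;
    uint32_t generation;   /* bumped on every allocation of this slot */
    void    *device;       /* the pointer that must never reach userspace */
};

static struct route_entry route_table[MAX_ROUTES];

/* Returns an opaque id (generation << 8 | slot), or 0 if the table is full. */
static uint32_t route_id_alloc(void *device)
{
    for (uint32_t slot = 0; slot < MAX_ROUTES; slot++) {
        struct route_entry *e = &route_table[slot];

        if (e->in_use)
            continue;
        e->generation++;
        if ((e->generation & 0xffffff) == 0)
            e->generation++;   /* skip 0 so that id 0 is never valid */
        e->in_use = 1;
        e->device = device;
        return ((e->generation & 0xffffff) << 8) | slot;
    }
    return 0;
}

/* Validates an id coming back from userspace; NULL means "reject". */
static struct route_entry *route_id_lookup(uint32_t id)
{
    uint32_t slot = id & 0xff;
    uint32_t gen  = id >> 8;
    struct route_entry *e;

    if (slot >= MAX_ROUTES)
        return NULL;
    e = &route_table[slot];
    if (!e->in_use || (e->generation & 0xffffff) != gen)
        return NULL;
    return e;
}

/* Releases an id; returns 0 on success, -1 if the id was not valid. */
static int route_id_free(uint32_t id)
{
    struct route_entry *e = route_id_lookup(id);

    if (!e)
        return -1;
    e->in_use = 0;
    e->device = NULL;
    return 0;
}
```

With something along these lines, a call that receives a bad or stale id 
can fail synchronously with an invalid-input error instead of 
dereferencing whatever userspace handed back.  Real kernel code would 
also need locking around the table, which the sketch leaves out.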

- Sean


