[openib-general] kernel oops
Viswanath Krishnamurthy
vkrishnamurthy at xsigo.com
Fri Aug 26 12:02:59 PDT 2005
Still see the issue
1. I rebooted both machines and started opensm; after LID assignment, I
killed opensm.
2. Next, I started the ucmpost client/server; killing it panics the system.
-Viswa
Unable to handle kernel NULL pointer dereference at virtual address 00000068
printing eip:
c02f2635
*pde = 3661e001
Oops: 0000 [#1]
SMP
Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd
hw_random e1000 ext3 jbd sd_mod
CPU: 0
EIP: 0060:[<c02f2635>] Not tainted VLI
EFLAGS: 00010086 (2.6.12.5)
EIP is at _spin_lock_irqsave+0xa/0x51
eax: 00000064 ebx: 00000286 ecx: f689be6c edx: c036cbcc
esi: 00000064 edi: 00000064 ebp: 00000000 esp: f689be00
ds: 007b es: 007b ss: 0068
Process lt-ucmpost (pid: 3993, threadinfo=f689a000 task=f6ef9540)
Stack: 00000000 c013e3f0 00000000 c036cbcc c0267667 00000000 000000d0 f689beac
       f66a9e80 c027393f c0350d00 00000000 f689be6c 0c300000 00000064 f689beac
       f66a9e80 c027955f 00000000 0c300000 00000064 000000d0 c0279022 f66a9e80
Call Trace:
[<c013e3f0>] __alloc_pages+0x166/0x3b6
[<c0267667>] ib_get_client_data+0x14/0x54
[<c027393f>] ib_sa_path_rec_get+0x1b/0x13e
[<c027955f>] resolve_path+0x8c/0x15b
[<c0279022>] path_req_complete+0x0/0xf7
[<c02a89a4>] rtnetlink_dump_all+0x0/0x9e
[<c02a8adf>] rtnetlink_done+0x0/0x3
[<c0279a03>] ib_at_paths_by_route+0xc4/0xd9
[<c0278b1d>] same_path_req+0x0/0x95
Sean Hefty wrote:
>>I downloaded the latest openib gen2 stack and ran into a kernel panic when
>>I ran the cmpost/ucmpost example. I modified the program to continuously
>>send and receive data in an infinite loop and killed the application
>>with ctrl-c.
>>The kernel panics pretty consistently.
>>
>>I am currently running version 2.6.12 of the kernel. Log attached. I will
>>try upgrading to a newer kernel and see if I can reproduce it.
>
>I have gotten something similar to this in my own testing, but haven't had the
>time to track it down. It seems to be related to how the IB AT code interacts
>with the SM, and if the SM has been restarted. Can you try resetting the SM
>node, then rebooting your other systems?
>
>- Sean