[openib-general] kernel oops

Fri Aug 26 12:02:59 PDT 2005

I still see the issue.

1. I rebooted both machines.
2. I started opensm and killed it after LID assignment.
3. I then started the ucmpost client/server; killing it panics the system.

-Viswa

Unable to handle kernel NULL pointer dereference at virtual address 00000068
 printing eip:
c02f2635
*pde = 3661e001
Oops: 0000 [#1]
SMP
Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    0
EIP:    0060:[<c02f2635>]    Not tainted VLI
EFLAGS: 00010086   (2.6.12.5)
EIP is at _spin_lock_irqsave+0xa/0x51
eax: 00000064   ebx: 00000286   ecx: f689be6c   edx: c036cbcc
esi: 00000064   edi: 00000064   ebp: 00000000   esp: f689be00
ds: 007b   es: 007b   ss: 0068
Process lt-ucmpost (pid: 3993, threadinfo=f689a000 task=f6ef9540)
Stack: 00000000 c013e3f0 00000000 c036cbcc c0267667 00000000 000000d0 f689beac
       f66a9e80 c027393f c0350d00 00000000 f689be6c 0c300000 00000064 f689beac
       f66a9e80 c027955f 00000000 0c300000 00000064 000000d0 c0279022 f66a9e80
Call Trace:
 [<c013e3f0>] __alloc_pages+0x166/0x3b6
 [<c0267667>] ib_get_client_data+0x14/0x54
 [<c027393f>] ib_sa_path_rec_get+0x1b/0x13e
 [<c027955f>] resolve_path+0x8c/0x15b
 [<c0279022>] path_req_complete+0x0/0xf7
 [<c02a89a4>] rtnetlink_dump_all+0x0/0x9e
 [<c02a8adf>] rtnetlink_done+0x0/0x3
 [<c0279a03>] ib_at_paths_by_route+0xc4/0xd9
 [<c0278b1d>] same_path_req+0x0/0x95           

Sean Hefty wrote:

>>I downloaded the latest openib gen2 stack and ran into a kernel panic when
>>I run the cmpost/ucmpost example. I modified the program to continuously
>>send and receive data in an infinite loop and killed the application
>>with ctrl-c.
>>The kernel panics pretty consistently.
>>
>>I am currently running the 2.6.12 version of the kernel.  Log attached.  I
>>will try upgrading to a newer kernel and see if I can reproduce it.
>>
>
>I have gotten something similar to this in my own testing, but haven't had the
>time to track it down.  It seems to be related to how the IB AT code interacts
>with the SM, and to whether the SM has been restarted.  Can you try resetting
>the SM node, then rebooting your other systems?
>
>- Sean