[openib-general] OpenSM crash with today's trunk

Aniruddha Bohra bohra at cs.rutgers.edu
Fri Oct 28 11:08:42 PDT 2005


Hal Rosenstock wrote:

>Or perhaps something crashed and didn't clean up properly. Does this occur immediately after a boot?

After a fresh reboot of the machines on the switch, I get the log at
http://www.cs.rutgers.edu/~bohra/osm-v2.log

The opensm process does not crash, but it hangs, and the port state never
changes.

There is now an oops in dmesg:

Oct 28 13:52:13 hora-3 OpenSM[5168]: OpenSM Rev:openib-1.1.0
Oct 28 13:52:14 hora-3 kernel: Unable to handle kernel paging request at virtual address 09000010
Oct 28 13:52:14 hora-3 kernel:  printing eip:
Oct 28 13:52:14 hora-3 kernel: f883f12d
Oct 28 13:52:14 hora-3 kernel: *pde = 00000000
Oct 28 13:52:14 hora-3 kernel: Oops: 0000 [#1]
Oct 28 13:52:14 hora-3 kernel: SMP
Oct 28 13:52:14 hora-3 kernel: Modules linked in: ib_uverbs ib_umad ipv6 i2c_dev i2c_core sunrpc dm_mod video button battery ac uhci_hcd hw_random ib_mthca ib_mad ib_core e1000 floppy
Oct 28 13:52:14 hora-3 kernel: CPU:    1
Oct 28 13:52:14 hora-3 kernel: EIP:    0060:[<f883f12d>]    Not tainted VLI
Oct 28 13:52:14 hora-3 kernel: EFLAGS: 00010286   (2.6.13bohra)
Oct 28 13:52:14 hora-3 kernel: EIP is at ib_post_send_mad+0x1c/0x1b1 [ib_mad]
Oct 28 13:52:14 hora-3 kernel: eax: 09000000   ebx: c1a7d900   ecx: c1a7d918   edx: 00000000
Oct 28 13:52:14 hora-3 kernel: esi: c1a7d918   edi: f6571f68   ebp: f6571efc   esp: f6571ed8
Oct 28 13:52:14 hora-3 kernel: ds: 007b   es: 007b   ss: 0068
Oct 28 13:52:14 hora-3 kernel: Process opensm (pid: 5224, threadinfo=f6570000 task=f7dfb020)
Oct 28 13:52:14 hora-3 kernel: Stack: f883ef5a 00000000 c1a7d800 080bd018 f6571efc 00000000 f6a42900 a0f684f6
Oct 28 13:52:14 hora-3 kernel:        f6571f68 f6571f74 f88f1728 00000000 00000018 000000e8 000000d0 f6a42948
Oct 28 13:52:14 hora-3 kernel:        f68bda24 00000000 00000009 a0f684f6 00000009 c1a7d918 00000000 00000100
Oct 28 13:52:14 hora-3 kernel: Call Trace:
Oct 28 13:52:14 hora-3 kernel:  [<c0104848>] show_stack+0x7c/0x92
Oct 28 13:52:14 hora-3 kernel:  [<c01049c9>] show_registers+0x152/0x1ca
Oct 28 13:52:14 hora-3 kernel:  [<c0104bcd>] die+0xf4/0x16f
Oct 28 13:52:14 hora-3 kernel:  [<c011885c>] do_page_fault+0x463/0x649
Oct 28 13:52:14 hora-3 kernel:  [<c01044bb>] error_code+0x4f/0x54
Oct 28 13:52:14 hora-3 kernel:  [<f88f1728>] ib_umad_write+0x2d0/0x30e [ib_umad]
Oct 28 13:52:14 hora-3 kernel:  [<c015d69b>] vfs_write+0x155/0x15a
Oct 28 13:52:14 hora-3 kernel:  [<c015d741>] sys_write+0x3d/0x64
Oct 28 13:52:14 hora-3 kernel:  [<c01038d3>] sysenter_past_esp+0x54/0x75
Oct 28 13:52:14 hora-3 kernel: Code: e8 d8 63 af c7 89 d8 83 c4 0c 5b 5e 5f 5d c3 55 89 e5 57 56 89 c6 53 83 ec 18 85 f6 89 55 f0 0f 84 ff 00 00 00 8b 46 08 8d 5e e8 <8b> 50 10 8b 7b 14 85 d2 0f 84 7c 01 00 00 8b 4e 18 85 c9 74 0b
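
A rough sanity check of the oops, assuming standard 32-bit x86 instruction
encoding: the bytes marked <8b> 50 10 in the Code: line should decode to a
load from 0x10 past eax, and eax is 09000000, which would match the faulting
address 09000010. The sketch below only checks that arithmetic. The helper
name decode_mov_disp8 is made up for illustration and handles just this one
instruction form; it is not a general disassembler.

```python
# Values copied from the oops above.
fault_address = 0x09000010
eax = 0x09000000
faulting_insn = bytes.fromhex("8b5010")  # the bytes at <8b> 50 10 in the Code: line


def decode_mov_disp8(insn):
    """Decode a 3-byte `mov r32, [r32 + disp8]` (opcode 0x8b, ModRM mod=01).

    Hypothetical helper for illustration; only this simple form is handled.
    """
    regs = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
    opcode, modrm, disp = insn
    assert opcode == 0x8B, "not a MOV r32, r/m32"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod=01 means an 8-bit displacement; rm=100 would need a SIB byte.
    assert mod == 0b01 and rm != 0b100, "only the simple disp8 form is handled"
    disp = disp - 256 if disp >= 128 else disp  # disp8 is sign-extended
    return regs[reg], regs[rm], disp


dest, base, disp = decode_mov_disp8(faulting_insn)
print(f"mov {disp:#x}(%{base}),%{dest}")            # AT&T syntax, as objdump prints it
print(hex(eax + disp), eax + disp == fault_address)
```

If the decoding is right, the fault is a read through a pointer whose value
is 09000000. The preceding bytes 8b 46 08 appear to load that pointer from
offset 8 of the structure at esi, so the bad value would have come from
there rather than from the instruction itself.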


Thanks
Aniruddha

>________________________________
>
>From: openib-general-bounces at openib.org on behalf of Sean Hefty
>Sent: Fri 10/28/2005 12:01 PM
>To: Aniruddha Bohra
>Cc: openib-general at openib.org
>Subject: Re: [openib-general] OpenSM crash with today's trunk
>
>
>Aniruddha Bohra wrote:
>
>>>Oh well, I guess this is a different bug.  Is there an oops or
>>>anything in your kernel log, or is this just a userspace crash?
>>
>>This is what I see:
>>Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
>>Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
>>Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>>
>>Is this useful?
>
>Is there any chance opensm is already running on the system?  It sounds like
>something has already registered to receive the same MADs that opensm wants to
>receive.
>
>- Sean