[openib-general] OpenSM crash with today's trunk
Aniruddha Bohra
bohra at cs.rutgers.edu
Fri Oct 28 11:08:42 PDT 2005
Hal Rosenstock wrote:
>Or perhaps something crashed and didn't clean up properly. Does this occur immediately after a boot?
>
After a fresh reboot of the machines on the switch, I get the log at
http://www.cs.rutgers.edu/~bohra/osm-v2.log
The opensm process does not crash, but it hangs, and the state of the
port never changes.
There is now also an oops in dmesg:
Oct 28 13:52:13 hora-3 OpenSM[5168]: OpenSM Rev:openib-1.1.0
Oct 28 13:52:14 hora-3 kernel: Unable to handle kernel paging request at
virtual address 09000010
Oct 28 13:52:14 hora-3 kernel: printing eip:
Oct 28 13:52:14 hora-3 kernel: f883f12d
Oct 28 13:52:14 hora-3 kernel: *pde = 00000000
Oct 28 13:52:14 hora-3 kernel: Oops: 0000 [#1]
Oct 28 13:52:14 hora-3 kernel: SMP
Oct 28 13:52:14 hora-3 kernel: Modules linked in: ib_uverbs ib_umad ipv6
i2c_dev i2c_core sunrpc dm_mod video button battery ac uhci_hcd
hw_random ib_mthca ib_mad ib_core e1000 floppy
Oct 28 13:52:14 hora-3 kernel: CPU: 1
Oct 28 13:52:14 hora-3 kernel: EIP: 0060:[<f883f12d>] Not tainted VLI
Oct 28 13:52:14 hora-3 kernel: EFLAGS: 00010286 (2.6.13bohra)
Oct 28 13:52:14 hora-3 kernel: EIP is at ib_post_send_mad+0x1c/0x1b1
[ib_mad]
Oct 28 13:52:14 hora-3 kernel: eax: 09000000 ebx: c1a7d900 ecx:
c1a7d918 edx: 00000000
Oct 28 13:52:14 hora-3 kernel: esi: c1a7d918 edi: f6571f68 ebp:
f6571efc esp: f6571ed8
Oct 28 13:52:14 hora-3 kernel: ds: 007b es: 007b ss: 0068
Oct 28 13:52:14 hora-3 kernel: Process opensm (pid: 5224,
threadinfo=f6570000 task=f7dfb020)
Oct 28 13:52:14 hora-3 kernel: Stack: f883ef5a 00000000 c1a7d800
080bd018 f6571efc 00000000 f6a42900 a0f684f6
Oct 28 13:52:14 hora-3 kernel: f6571f68 f6571f74 f88f1728
00000000 00000018 000000e8 000000d0 f6a42948
Oct 28 13:52:14 hora-3 kernel: f68bda24 00000000 00000009
a0f684f6 00000009 c1a7d918 00000000 00000100
Oct 28 13:52:14 hora-3 kernel: Call Trace:
Oct 28 13:52:14 hora-3 kernel: [<c0104848>] show_stack+0x7c/0x92
Oct 28 13:52:14 hora-3 kernel: [<c01049c9>] show_registers+0x152/0x1ca
Oct 28 13:52:14 hora-3 kernel: [<c0104bcd>] die+0xf4/0x16f
Oct 28 13:52:14 hora-3 kernel: [<c011885c>] do_page_fault+0x463/0x649
Oct 28 13:52:14 hora-3 kernel: [<c01044bb>] error_code+0x4f/0x54
Oct 28 13:52:14 hora-3 kernel: [<f88f1728>] ib_umad_write+0x2d0/0x30e
[ib_umad]
Oct 28 13:52:14 hora-3 kernel: [<c015d69b>] vfs_write+0x155/0x15a
Oct 28 13:52:14 hora-3 kernel: [<c015d741>] sys_write+0x3d/0x64
Oct 28 13:52:14 hora-3 kernel: [<c01038d3>] sysenter_past_esp+0x54/0x75
Oct 28 13:52:14 hora-3 kernel: Code: e8 d8 63 af c7 89 d8 83 c4 0c 5b 5e
5f 5d c3 55 89 e5 57 56 89 c6 53 83 ec 18 85 f6 89 55 f0 0f 84 ff 00 00
00 8b 46 08 8d 5e e8 <8b> 50 10 8b 7b 14 85 d2 0f 84 7c 01 00 00 8b 4e
18 85 c9 74 0b
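
One way to read the oops above, offered as a sketch rather than a diagnosis: the faulting bytes `<8b> 50 10` decode as a 32-bit load from [eax+0x10], and eax holds 09000000, which would put the access at exactly the reported fault address 09000010. The bytes just before it (`8b 46 08`) load eax from [esi+8], so the bad value appears to come from memory rather than from a register argument. Which structure field that is in ib_mad is not determined here. The small Python check below redoes that arithmetic; the helper names are made up for illustration, the decoder handles only this one instruction form (opcode 8b with an 8-bit displacement and no SIB byte), and the input is copied from the log with the mail line wrapping undone.

```python
import re

# Lines copied from the oops above, with the mail wrapping undone.
OOPS = """\
Unable to handle kernel paging request at virtual address 09000010
eax: 09000000 ebx: c1a7d900 ecx: c1a7d918 edx: 00000000
esi: c1a7d918 edi: f6571f68 ebp: f6571efc esp: f6571ed8
Code: 8b 46 08 8d 5e e8 <8b> 50 10 8b 7b 14
"""

# x86 32-bit register numbering as used in the ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def parse_registers(text):
    """Collect 'eax: 09000000' style pairs into a dict of ints."""
    pairs = re.findall(r"\b(e[a-z]{2}):\s*([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}


def fault_address(text):
    """Return the address from the 'virtual address' line."""
    match = re.search(r"virtual address ([0-9a-f]{8})", text)
    return int(match.group(1), 16)


def faulting_bytes(text):
    """Return the bytes starting at the <xx> marker in the Code: line."""
    match = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})+)", text)
    return [int(match.group(1), 16)] + [int(b, 16) for b in match.group(2).split()]


def decode_mov_disp8(code):
    """Decode 'mov r32, [r32 + disp8]' only (hypothetical helper)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B or (modrm >> 6) != 1 or (modrm & 7) == 4:
        raise ValueError("not a plain 8b /r load with disp8")
    if disp >= 0x80:  # disp8 is signed
        disp -= 0x100
    return REG32[(modrm >> 3) & 7], REG32[modrm & 7], disp


regs = parse_registers(OOPS)
dest, base, disp = decode_mov_disp8(faulting_bytes(OOPS))
effective = (regs[base] + disp) & 0xFFFFFFFF
reported = fault_address(OOPS)

print(f"mov {dest}, [{base}{disp:+#x}] -> {effective:#010x}")
print(f"reported fault address     -> {reported:#010x}")
print("match:", effective == reported)
```

If the two addresses agree, the register dump and the faulting instruction are at least consistent with each other, which narrows the question to where the value loaded into eax came from.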
Thanks
Aniruddha
>________________________________
>
>From: openib-general-bounces at openib.org on behalf of Sean Hefty
>Sent: Fri 10/28/2005 12:01 PM
>To: Aniruddha Bohra
>Cc: openib-general at openib.org
>Subject: Re: [openib-general] OpenSM crash with today's trunk
>
>
>
>Aniruddha Bohra wrote:
>
>
>>>Oh well, I guess this is a different bug. Is there an oops or
>>>anything in your kernel log, or is this just a userspace crash?
>>>
>>>
>>>
>>This is what I see :
>>Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
>>Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
>>Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>>
>>Is this useful?
>>
>>
>
>Is there any chance opensm is already running on the system? It sounds like
>something has already registered to receive the same MADs that opensm wants to
>receive.
>
>- Sean