[openib-general] progress...but opensm crashed

shaharf shaharf at voltaire.com
Mon Dec 27 02:55:10 PST 2004


Tom,
	I debugged opensm a little, and it looks like a simple opensm
bug. Somehow the destination LID of the report target was not found, and
since there was no check for that, opensm faulted (note p_port_2=0x0 in
frame #3 of your backtrace). I added the missing checks. I don't know how
this happened, i.e. why the dest port was invalid. It may happen during
reconfiguration phases. Again, log files would help me very much.
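
To illustrate the kind of check that was missing, here is a minimal
sketch. It is not the actual opensm change: port_t, find_port_by_lid,
ports_share_pkey and should_report are hypothetical stand-ins for the
real opensm types and functions, and the port table is a plain array
chosen only to keep the example self-contained. The point is simply that
a failed LID lookup must be handled before the port pointer is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a subnet port; not the real osm_port_t. */
typedef struct {
    uint16_t lid;
    uint16_t pkey;
} port_t;

/* Hypothetical lookup: returns NULL when no port owns the given LID. */
static const port_t *find_port_by_lid(const port_t *ports, size_t n,
                                      uint16_t lid)
{
    for (size_t i = 0; i < n; i++)
        if (ports[i].lid == lid)
            return &ports[i];
    return NULL;
}

/* Returns 1 if both ports exist and carry the same pkey, 0 otherwise.
 * The NULL guard stops a missing port from being dereferenced. */
static int ports_share_pkey(const port_t *a, const port_t *b)
{
    if (a == NULL || b == NULL)
        return 0;
    return a->pkey == b->pkey;
}

/* Decide whether a notice from src may be reported to dest_lid.
 * An unknown destination LID is logged and skipped instead of crashing. */
static int should_report(const port_t *ports, size_t n,
                         const port_t *src, uint16_t dest_lid)
{
    const port_t *dest = find_port_by_lid(ports, n, dest_lid);
    if (dest == NULL) {
        fprintf(stderr, "dest lid 0x%x not found, skipping report\n",
                (unsigned)dest_lid);
        return 0;
    }
    return ports_share_pkey(src, dest);
}
```

With a check like this, an unknown destination LID results in the report
being skipped rather than in a segfault.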

Please update your opensm and try again.

Shahar

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Tom Duffy
> Sent: Friday, December 24, 2004 1:17 AM
> To: openib-general at openib.org
> Subject: [openib-general] progress...but opensm crashed
> 
> The good news was that my port went to active on the node running opensm
> (way to go Shahar!).
> 
> The bad news is that there was no xmas miracle when I brought up another
> node on the subnet.
> 
> (gdb) bt
> #0  0x0000002a95994f1e in stack_dump () at stack.c:33
> #1  0x0000002a959953e1 in handler (x=11) at stack.c:112
> #2  <signal handler called>
> #3  0x0000000000410121 in osm_port_share_pkey (p_log=0x559af8,
>     p_port_1=0x5913c0, p_port_2=0x0) at osm_port.h:1616
> #4  0x0000000000408a5c in __match_notice_to_inf_rec (p_list_item=0x559b00,
>     context=0x0) at osm_inform.c:599
> #5  0x0000002a9588b044 in cl_qlist_apply_func (p_list=0x557ce0,
>     pfn_func=0x4086d9 <__match_notice_to_inf_rec>, context=0x43004f90)
>     at cl_list.c:387
> #6  0x0000000000408c8a in osm_report_notice (p_log=0x559af8, p_subn=0x557940,
>     p_ntc=0x430050d0) at osm_inform.c:705
> #7  0x000000000042bfa2 in __osm_trap_rcv_process_request (p_rcv=0x558848,
>     p_madw=0x5b95a8) at osm_trap_rcv.c:681
> #8  0x000000000042c128 in osm_trap_rcv_process (p_rcv=0x558848,
>     p_madw=0x5b95a8) at osm_trap_rcv.c:759
> #9  0x000000000042c158 in __osm_trap_rcv_ctrl_disp_callback (context=0x559b00,
>     p_data=0x0) at osm_trap_rcv_ctrl.c:99
> #10 0x00000000004035ce in __cl_disp_worker (context=0x559b00)
>     at cl_dispatcher.c:138
> #11 0x0000002a9588f5ef in __cl_thread_pool_routine (context=0x559b00)
>     at cl_threadpool.c:111
> #12 0x0000002a9588f4be in __cl_thread_wrapper (arg=0x0) at cl_thread.c:94
> #13 0x0000002a9567213a in start_thread () from /lib64/tls/libpthread.so.0
> #14 0x0000002a95ce33c3 in clone () from /lib64/tls/libc.so.6
> #15 0x0000000000000000 in ?? ()
> 
> --
> Tom Duffy <tduffy at sun.com>


