[openib-general] Re: opensm: new segv on shutdown

Hal Rosenstock halr at voltaire.com
Fri Jun 3 09:14:45 PDT 2005


On Wed, 2005-06-01 at 16:51, Tom Duffy wrote: 
> I am putting together a network with a dumb IB switch, a couple of Linux
> OpenIB boxes, a Solaris 10 box, a Solaris Nevada box, etc.  I fired up
> opensm on one of the Linux nodes, tried to plumb Solaris, no luck.  I
> then hit control-c on opensm and it crashed.  Here is the messages and
> then crash.
> 
> OpenSM: Got signal 2 - exiting...
> 
> -- OpenSM Exiting (in 3 seconds) --
> There are still 44 mads out. Forcing the exit of the OpenSM application...
> Segmentation fault (core dumped)
> 
> #0  stack_dump () at src/stack.c:72
> 72              if (!__builtin_frame_address(2))
> #0  stack_dump () at src/stack.c:72
> #1  0x00002aaaaacbd1a6 in handler (x=11) at src/stack.c:151
> #2  <signal handler called>
> #3  __match_notice_to_inf_rec (p_list_item=0x585e50, context=0x10)
>     at osm_inform.c:413
The context value in frame #3 (0x10) looks weird here; frame #4 passes
context=0x43004f80.

__match_notice_to_inf_rec(
  IN  cl_list_item_t* const p_list_item,
  IN  void*                       context )
{
  osm_infr_match_ctxt_t* p_infr_match = (osm_infr_match_ctxt_t *)context;
  ib_mad_notice_attr_t*  p_ntc = p_infr_match->p_ntc;
  cl_list_t*             p_infr_to_remove_list = p_infr_match->p_remove_infr_list;
  osm_infr_t* p_infr_rec = (osm_infr_t*)p_list_item;
  ib_inform_info_t  *p_ii = &(p_infr_rec->inform_record.inform_info);
  cl_status_t status = CL_NOT_FOUND;
  osm_log_t *p_log = p_infr_rec->p_infr_rcv->p_log;    /* <== segv here */

> #4  0x00002aaaaadc49ff in cl_qlist_apply_func (p_list=0x550680,
>     pfn_func=0x4085e0 <__match_notice_to_inf_rec>, context=0x43004f80)
>     at cl_list.c:354
> #5  0x0000000000408c57 in osm_report_notice (p_log=0x552498, p_subn=0x5502e0,
>     p_ntc=0x430050d0) at osm_inform.c:680
> #6  0x000000000042b472 in __osm_trap_rcv_process_request (p_rcv=0x5511e8,
>     p_madw=0x0) at osm_trap_rcv.c:646
> #7  0x000000000042b7fb in osm_trap_rcv_process (p_rcv=0x5511e8,
>     p_madw=0x566e60) at osm_trap_rcv.c:724
> #8  0x00000000004033bb in __cl_disp_worker (context=0x585e50)
>     at cl_dispatcher.c:105
> #9  0x00002aaaaadc928f in __cl_thread_pool_routine (context=0x552558)
>     at cl_threadpool.c:78
> #10 0x00002aaaaadc911e in __cl_thread_wrapper (arg=0x51) at cl_thread.c:61
> #11 0x00000036d28060aa in start_thread () from /lib64/tls/libpthread.so.0
> #12 0x00000036d19c53d3 in clone () from /lib64/tls/libc.so.6
> #13 0x0000000000000000 in ?? ()
> 
> Also, here are the last 500 lines of osm.log.
> 
> Jun 01 13:37:00 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:00 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:01 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:01 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:01 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x566178, size = 256.
> Jun 01 13:37:01 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03210, size = 256.
> Jun 01 13:37:01 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:01 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x566160, p_mad = 0x2aaaaaf03240, size = 256.
> Jun 01 13:37:01 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:01 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:01 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 99 QP1 MADs received.
> Jun 01 13:37:01 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000007
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:01 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:01 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:01 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:01 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:02 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:02 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:02 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x5660a8, size = 256.
> Jun 01 13:37:02 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03350, size = 256.
> Jun 01 13:37:02 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:02 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x566090, p_mad = 0x2aaaaaf03380, size = 256.
> Jun 01 13:37:02 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:02 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:02 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 100 QP1 MADs received.
> Jun 01 13:37:02 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000007
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:02 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:02 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:02 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:02 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:03 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:03 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:03 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565fd8, size = 256.
> Jun 01 13:37:03 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03490, size = 256.
> Jun 01 13:37:03 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:03 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565fc0, p_mad = 0x2aaaaaf034c0, size = 256.
> Jun 01 13:37:03 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:03 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:03 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 101 QP1 MADs received.
> Jun 01 13:37:03 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000007
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:03 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:03 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:03 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:03 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:04 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:04 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:04 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565f08, size = 256.
> Jun 01 13:37:04 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf035d0, size = 256.
> Jun 01 13:37:04 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:04 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565ef0, p_mad = 0x2aaaaaf03600, size = 256.
> Jun 01 13:37:04 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:04 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:04 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 102 QP1 MADs received.
> Jun 01 13:37:04 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000008
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:04 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:04 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:04 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:04 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:04 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:37:04 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:37:04 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:37:05 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:05 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:05 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565e38, size = 256.
> Jun 01 13:37:05 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03710, size = 256.
> Jun 01 13:37:05 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:05 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565e20, p_mad = 0x2aaaaaf03740, size = 256.
> Jun 01 13:37:05 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:05 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:05 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 103 QP1 MADs received.
> Jun 01 13:37:05 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000008
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:05 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:05 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:05 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:05 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:06 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:06 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:06 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565d68, size = 256.
> Jun 01 13:37:06 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03850, size = 256.
> Jun 01 13:37:06 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:06 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565d50, p_mad = 0x2aaaaaf03880, size = 256.
> Jun 01 13:37:06 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:06 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:06 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 104 QP1 MADs received.
> Jun 01 13:37:06 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000008
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:06 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:06 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:06 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:06 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:07 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:07 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:07 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565c98, size = 256.
> Jun 01 13:37:07 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03990, size = 256.
> Jun 01 13:37:07 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:07 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565c80, p_mad = 0x2aaaaaf039c0, size = 256.
> Jun 01 13:37:07 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:07 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:07 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 105 QP1 MADs received.
> Jun 01 13:37:07 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000009
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:07 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:07 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:07 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:07 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:09 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:09 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:09 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565bc8, size = 256.
> Jun 01 13:37:09 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03ad0, size = 256.
> Jun 01 13:37:09 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:09 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565bb0, p_mad = 0x2aaaaaf03b00, size = 256.
> Jun 01 13:37:09 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:09 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:09 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 106 QP1 MADs received.
> Jun 01 13:37:09 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000009
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:09 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:09 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:09 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:09 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:10 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:10 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:10 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565af8, size = 256.
> Jun 01 13:37:10 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03c10, size = 256.
> Jun 01 13:37:10 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:10 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565ae0, p_mad = 0x2aaaaaf03c40, size = 256.
> Jun 01 13:37:10 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:10 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:10 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 107 QP1 MADs received.
> Jun 01 13:37:10 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x976571200000009
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:10 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:10 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:10 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:10 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:11 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:11 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:11 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565a28, size = 256.
> Jun 01 13:37:11 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03d50, size = 256.
> Jun 01 13:37:11 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:11 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565a10, p_mad = 0x2aaaaaf03d80, size = 256.
> Jun 01 13:37:11 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:11 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:11 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 108 QP1 MADs received.
> Jun 01 13:37:11 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x97657120000000a
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:11 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:11 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:11 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:11 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:12 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:12 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:12 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565958, size = 256.
> Jun 01 13:37:12 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03e90, size = 256.
> Jun 01 13:37:12 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:12 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565940, p_mad = 0x2aaaaaf03ec0, size = 256.
> Jun 01 13:37:12 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:12 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:12 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 109 QP1 MADs received.
> Jun 01 13:37:12 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x97657120000000a
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:12 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:12 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:12 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:12 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:13 [44808960] -> osm_mad_pool_get: [
> Jun 01 13:37:13 [44808960] -> osm_vendor_get: [
> Jun 01 13:37:13 [44808960] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x565888, size = 256.
> Jun 01 13:37:13 [44808960] -> osm_vendor_get: Acquired UMAD 0x2aaaaaf03fd0, size = 256.
> Jun 01 13:37:13 [44808960] -> osm_vendor_get: ]
> Jun 01 13:37:13 [44808960] -> osm_mad_pool_get: Acquired p_madw = 0x565870, p_mad = 0x2aaaaaf04000, size = 256.
> Jun 01 13:37:13 [44808960] -> osm_mad_pool_get: ]
> Jun 01 13:37:13 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
> Jun 01 13:37:13 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 110 QP1 MADs received.
> Jun 01 13:37:13 [44808960] -> SA MAD dump:
> 				base_ver................0x1
> 				mgmt_class..............0x3
> 				class_ver...............0x2
> 				method..................0x12 (SubnAdmGetTable)
> 				status..................0x0
> 				resv....................0x0
> 				trans_id................0x97657120000000a
> 				attr_id.................0x38 (MCMemberRecord)
> 				resv1...................0x0
> 				attr_mod................0xFFFFFFFF
> 				rmpp_version............0x0
> 				rmpp_type...............0x0
> 				rmpp_flags..............0x0
> 				rmpp_status.............0x0
> 				seg_num.................0x0
> 				payload_len/new_win.....0x0
> 				sm_key..................0x0000000000000000
> 				attr_offset.............0x0
> 				resv2...................0x0
> 				comp_mask...............0x0000000000008081
> 
> 
> Jun 01 13:37:13 [44808960] -> __osm_sa_mad_ctrl_process: [
> Jun 01 13:37:13 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD.
> Jun 01 13:37:13 [44808960] -> __osm_sa_mad_ctrl_process: ]
> Jun 01 13:37:13 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> Jun 01 13:37:14 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:37:14 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:37:14 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:37:24 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:37:24 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:37:24 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:37:34 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:37:34 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:37:34 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:37:44 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:37:44 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:37:44 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:37:54 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:37:54 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:37:54 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:38:04 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:38:04 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:38:04 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:38:14 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:38:14 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:38:14 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:38:24 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:38:24 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:38:24 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:38:34 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:38:34 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:38:34 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:38:44 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:38:44 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:38:44 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:38:54 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:38:54 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:38:54 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:04 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:39:04 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:39:04 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:14 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:39:14 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:39:14 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:24 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:39:24 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:39:24 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:34 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:39:34 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:39:34 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:44 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:39:44 [44007960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_SWEEP_HEAVY_SELF.
> Jun 01 13:39:44 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:54 [44007960] -> osm_state_mgr_process: [
> Jun 01 13:39:54 [44007960] -> osm_state_mgr_process: ]
> Jun 01 13:39:56 [AAEE0B60] -> osm_vl15_shutdown: [
> Jun 01 13:39:56 [AAEE0B60] -> osm_vl15_shutdown: ]
> Jun 01 13:39:56 [AAEE0B60] -> osm_vendor_set_sm: [
> Jun 01 13:39:56 [AAEE0B60] -> osm_vendor_set_sm: ]
> Jun 01 13:39:56 [AAEE0B60] -> clear_madw: [
> Jun 01 13:39:56 [AAEE0B60] -> clear_madw: ]
> Jun 01 13:39:56 [43005960] -> osm_physp_share_pkey: ]
> Jun 01 13:39:56 [43005960] -> osm_port_share_pkey: ]
> Jun 01 13:39:56 [43005960] -> __match_notice_to_inf_rec: Mismatch by Pkey
> Jun 01 13:39:56 [43005960] -> __match_notice_to_inf_rec: ]
> Jun 01 13:39:56 [AAEE0B60] -> osm_sm_destroy: [
> Jun 01 13:39:56 [44007960] -> __osm_sm_sweeper: Off schedule sweep signalled.
> Jun 01 13:39:56 [44007960] -> __osm_sm_sweeper: ]

Based on the stack trace, the crash is past any of the OpenIB vendor
layer code involved in the shutdown: OpenSM is handling an incoming trap
and checking whether it matches any event subscriptions (SA InformInfo).
Since completing the call to osm_vendor_delete unregisters all the MAD
agents, no new traps should arrive after that point, so this one must
have been queued internally.

Perhaps we have a race here with osm_opensm_destroy, which is invoked
late in the OpenSM shutdown. osm_opensm.c::osm_opensm_destroy does:
  osm_subn_destroy( &p_osm->subn );
  osm_sm_destroy( &p_osm->sm );
  osm_sa_destroy( &p_osm->sa );

The log shows the shutdown thread entering osm_sm_destroy when this
occurs, so osm_subn_destroy has presumably already run, while another
thread is still handling the incoming trap.
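
To illustrate the ordering that would avoid this kind of race, here is a
minimal sketch in plain pthreads. It is not OpenSM code: toy_disp_t,
toy_worker and toy_shutdown_demo are hypothetical names, and the single
int behind "subs" merely stands in for the subscription list. The point
is only that the worker is stopped and joined before the state it reads
is freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dispatcher; not an OpenSM type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    int             pending;    /* queued "traps" not yet processed */
    int             stopping;   /* set once shutdown has begun */
    int            *subs;       /* shared state the worker dereferences */
    int             processed;  /* number of traps handled so far */
} toy_disp_t;

static void *toy_worker(void *arg)
{
    toy_disp_t *d = arg;

    pthread_mutex_lock(&d->lock);
    for (;;) {
        while (d->pending == 0 && !d->stopping)
            pthread_cond_wait(&d->wake, &d->lock);
        if (d->pending == 0)        /* stopping and fully drained */
            break;
        d->pending--;
        pthread_mutex_unlock(&d->lock);

        /* "Match the trap": safe only because subs outlives this thread. */
        int hit = d->subs[0];

        pthread_mutex_lock(&d->lock);
        d->processed += hit;
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Queues n_traps items, shuts down, returns how many were processed
 * (or -1 on setup failure). */
int toy_shutdown_demo(int n_traps)
{
    toy_disp_t d;
    pthread_t tid;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.wake, NULL);
    d.pending = 0;
    d.stopping = 0;
    d.processed = 0;
    d.subs = malloc(sizeof *d.subs);
    if (d.subs == NULL)
        return -1;
    d.subs[0] = 1;

    if (pthread_create(&tid, NULL, toy_worker, &d) != 0) {
        free(d.subs);
        return -1;
    }

    /* Traps arrive and are queued internally. */
    pthread_mutex_lock(&d.lock);
    d.pending += n_traps;
    pthread_cond_signal(&d.wake);
    pthread_mutex_unlock(&d.lock);

    /* Shutdown step 1: tell the worker to stop, then wait for it. */
    pthread_mutex_lock(&d.lock);
    d.stopping = 1;
    pthread_cond_signal(&d.wake);
    pthread_mutex_unlock(&d.lock);
    pthread_join(tid, NULL);
    assert(d.pending == 0);

    /* Shutdown step 2: only now free what the worker was reading. */
    free(d.subs);
    d.subs = NULL;
    pthread_cond_destroy(&d.wake);
    pthread_mutex_destroy(&d.lock);
    return d.processed;
}
```

Calling toy_shutdown_demo(n) drains all n queued items before the free.
Swapping steps 1 and 2 would let the worker dereference freed memory,
which is the shape of failure the backtrace suggests. Whether OpenSM
should drain or discard queued traps at shutdown is a separate choice;
the sketch drains them only to keep the result deterministic.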

-- Hal






