[openib-general] progress...but opensm crashed

shaharf shaharf at voltaire.com
Thu Dec 23 23:30:54 PST 2004


Hi Tom,
  It seems that one of the nodes in your subnet registered itself to be informed of traps. This is one of the rare cases that require GRH headers, which are not really supported yet.
 
I didn't think GRH had a high priority, but now that I see it is really used, I will have to deal with it sooner than I thought.
 
Thanks for the information. I would be glad to have your osm.log file (probably in /tmp) if you ran it with -V. That would let me check my theory.
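
For reference, frame #3 of the backtrace quoted below shows osm_port_share_pkey being called with p_port_2=0x0, and the handler in frame #1 received signal 11 (SIGSEGV), which is consistent with a NULL pointer dereference. The following is only a minimal sketch of a defensive check for that situation. The port_t type, MAX_PKEYS and ports_share_pkey are hypothetical stand-ins for illustration, not the actual OpenSM structures or code, and the P_Key comparison is simplified to exact equality.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port object; NOT the OpenSM osm_port_t. */
#define MAX_PKEYS 4

typedef struct port {
    uint16_t pkeys[MAX_PKEYS]; /* partition keys held by this port */
    size_t   num_pkeys;        /* number of valid entries in pkeys */
} port_t;

/*
 * Returns true if both ports exist and have at least one P_Key in common.
 * A NULL port is treated as "shares nothing" instead of being dereferenced.
 * Assumption: exact 16-bit equality is used as the match rule here.
 */
bool ports_share_pkey(const port_t *p1, const port_t *p2)
{
    if (p1 == NULL || p2 == NULL)
        return false;

    for (size_t i = 0; i < p1->num_pkeys; i++)
        for (size_t j = 0; j < p2->num_pkeys; j++)
            if (p1->pkeys[i] == p2->pkeys[j])
                return true;

    return false;
}
```

With this guard, a call such as ports_share_pkey(&port, NULL) returns false rather than faulting. Whether skipping the subscriber is the right policy when the second port cannot be resolved is a separate question that the trace alone does not settle.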
 
Shahar

________________________________

From: openib-general-bounces at openib.org on behalf of Tom Duffy
Sent: Fri 12/24/2004 1:16 AM
To: openib-general at openib.org
Subject: [openib-general] progress...but opensm crashed



The good news was that my port went to active on the node running opensm
(way to go Shahar!).

The bad news is that there was no xmas miracle when I brought up another
node on the subnet.

(gdb) bt
#0  0x0000002a95994f1e in stack_dump () at stack.c:33
#1  0x0000002a959953e1 in handler (x=11) at stack.c:112
#2  <signal handler called>
#3  0x0000000000410121 in osm_port_share_pkey (p_log=0x559af8,
    p_port_1=0x5913c0, p_port_2=0x0) at osm_port.h:1616
#4  0x0000000000408a5c in __match_notice_to_inf_rec (p_list_item=0x559b00,
    context=0x0) at osm_inform.c:599
#5  0x0000002a9588b044 in cl_qlist_apply_func (p_list=0x557ce0,
    pfn_func=0x4086d9 <__match_notice_to_inf_rec>, context=0x43004f90)
    at cl_list.c:387
#6  0x0000000000408c8a in osm_report_notice (p_log=0x559af8, p_subn=0x557940,
    p_ntc=0x430050d0) at osm_inform.c:705
#7  0x000000000042bfa2 in __osm_trap_rcv_process_request (p_rcv=0x558848,
    p_madw=0x5b95a8) at osm_trap_rcv.c:681
#8  0x000000000042c128 in osm_trap_rcv_process (p_rcv=0x558848,
    p_madw=0x5b95a8) at osm_trap_rcv.c:759
#9  0x000000000042c158 in __osm_trap_rcv_ctrl_disp_callback (context=0x559b00,
    p_data=0x0) at osm_trap_rcv_ctrl.c:99
#10 0x00000000004035ce in __cl_disp_worker (context=0x559b00)
    at cl_dispatcher.c:138
#11 0x0000002a9588f5ef in __cl_thread_pool_routine (context=0x559b00)
    at cl_threadpool.c:111
#12 0x0000002a9588f4be in __cl_thread_wrapper (arg=0x0) at cl_thread.c:94
#13 0x0000002a9567213a in start_thread () from /lib64/tls/libpthread.so.0
#14 0x0000002a95ce33c3 in clone () from /lib64/tls/libc.so.6
#15 0x0000000000000000 in ?? ()

--
Tom Duffy <tduffy at sun.com>




