[openib-general] opensm segfault?

Troy Benjegerdes troy at scl.ameslab.gov
Tue May 16 12:41:08 PDT 2006


I got this segfault after an indeterminate amount of time running opensm:


(gdb) bt
#0  0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0,
    count=64) at cl_memory_osd.c:87
#1  0x0000000000415053 in osm_pkey_tbl_sync_new_blocks (
    p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127
#2  0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40)
    at osm_pkey_mgr.c:407
#3  0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, signal=3)
    at osm_state_mgr.c:2243
#4  0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback (
    context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
#5  0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0)
    at cl_dispatcher.c:108
#6  0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
    at cl_threadpool.c:78
#7  0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750)
    at cl_thread.c:61
#8  0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
#9  0x00002b90b12c8273 in clone () from /lib/libc.so.6
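
Frame #0 shows p_src=0x0 with count=64, so the crash is a 64-byte copy
from a NULL source pointer. As a minimal sketch of the kind of guard that
would turn this into a reportable error instead of a segfault (this is not
the actual opensm code; `copy_block` and `BLOCK_SIZE` are hypothetical
names chosen for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64 /* hypothetical: matches count=64 in frame #0 */

/*
 * Copy one fixed-size block from src to dst.
 * Returns 0 on success, -1 if the source block is missing (NULL).
 * The destination is treated as a hard precondition.
 */
static int copy_block(void *dst, const void *src)
{
    assert(dst != NULL);

    if (src == NULL)
        return -1; /* caller can log and skip instead of crashing */

    memcpy(dst, src, BLOCK_SIZE);
    return 0;
}
```

A caller would check the return value and skip or log the missing block
rather than passing the NULL pointer on to memcpy.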



And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
just seems like excessive, unneeded abstraction.

I'm running opensm from subversion rev 7091.

May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn 6251:7091M

the only local changes are as follows:

troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff
Index: osm/opensm/osm_port_info_rcv.c
===================================================================
--- osm/opensm/osm_port_info_rcv.c      (revision 7091)
+++ osm/opensm/osm_port_info_rcv.c      (working copy)
@@ -469,9 +469,14 @@
       goto Exit;
     }

+#if 0
     /* Check for IBM eHCA firmware defect in reporting partition enforcement cap */
     if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID)
       p_switch->switch_info.enforce_cap = 0;
+#endif
+    /* Check for busted divergenet switch on ameslab network */
+    if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152)
+           p_switch->switch_info.enforce_cap = 0;

     /* Bail out if this is a switch with no partition enforcement capability */
     if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
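
The added check converts the node GUID from network byte order before
comparing it with a host-order constant. A minimal, endian-independent
sketch of that comparison follows; `net_to_host64` is a hypothetical
stand-in for cl_ntoh64 and `is_quirky_switch` is an illustrative name,
neither is taken from the opensm tree:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for cl_ntoh64: interpret the 8 bytes of a
 * network-order (big-endian) value and return it in host order.
 * Works on both little- and big-endian hosts.
 */
static uint64_t net_to_host64(uint64_t net)
{
    unsigned char b[8];
    uint64_t host = 0;
    int i;

    memcpy(b, &net, sizeof b);
    for (i = 0; i < 8; i++)
        host = (host << 8) | b[i];
    return host;
}

/* Nonzero if the network-order GUID is the one switch singled out above. */
static int is_quirky_switch(uint64_t node_guid_net)
{
    return net_to_host64(node_guid_net) == UINT64_C(0x00084e0000000152);
}
```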


