[openib-general] opensm segfault?
Troy Benjegerdes
troy at scl.ameslab.gov
Tue May 16 12:41:08 PDT 2006
I got this segfault after opensm had been running for an indeterminate amount of time:
(gdb) bt
#0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0, count=64) at cl_memory_osd.c:87
#1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks (p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127
#2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40) at osm_pkey_mgr.c:407
#3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, signal=3) at osm_state_mgr.c:2243
#4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback (context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
#5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0) at cl_dispatcher.c:108
#6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268) at cl_threadpool.c:78
#7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at cl_thread.c:61
#8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
#9 0x00002b90b12c8273 in clone () from /lib/libc.so.6
And why the heck is "cl_memcpy" just a call to "memcpy" anyway? This
just seems like excessive, unneeded abstraction.
I'm running opensm from Subversion rev 7091:
May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn 6251:7091M
The only local changes are as follows:
troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff
Index: osm/opensm/osm_port_info_rcv.c
===================================================================
--- osm/opensm/osm_port_info_rcv.c (revision 7091)
+++ osm/opensm/osm_port_info_rcv.c (working copy)
@@ -469,9 +469,14 @@
goto Exit;
}
+#if 0
/* Check for IBM eHCA firmware defect in reporting partition
* enforcement cap */
if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
IBM_VENDOR_ID)
p_switch->switch_info.enforce_cap = 0;
+#endif
+ /* Check for busted divergenet switch on ameslab network */
+ if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152)
+ p_switch->switch_info.enforce_cap = 0;
/* Bail out if this is a switch with no partition enforcement
* capability */
if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)