[openib-general] opensm segfault?
Hal Rosenstock
halr at voltaire.com
Tue May 16 13:00:32 PDT 2006
Hi Troy,
On Tue, 2006-05-16 at 15:41, Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..
>
>
> (gdb) bt
> #0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0,
^^^^^^^^^ (p_src=0x0)
This is the problem: the source pointer passed to cl_memcpy is NULL. Not sure why yet.
> count=64) at cl_memory_osd.c:87
> #1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks (
> p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127
> #2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> at osm_pkey_mgr.c:407
> #3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> signal=3)
> at osm_state_mgr.c:2243
> #4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback (
> context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> #5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> at cl_dispatcher.c:108
> #6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> at cl_threadpool.c:78
> #7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> cl_thread.c:61
> #8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> #9 0x00002b90b12c8273 in clone () from /lib/libc.so.6
>
>
>
> And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> just seems like excessive unneeded abstraction.
It's part of the component library, which is an OS abstraction layer.
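To illustrate the point, here is a rough sketch of what a thin copy wrapper in an OS abstraction layer can look like, and of how a caller could guard against a NULL source like the one in frame #0. The names sketch_memcpy and sketch_copy_block are made up for this example; this is not the actual complib or opensm source, and the real fix may well belong elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an OS-abstraction copy wrapper.
 * On a POSIX build it just forwards to memcpy; another platform
 * could substitute its own primitive behind the same name. */
static void sketch_memcpy(void *p_dest, const void *p_src, size_t count)
{
    /* Undefined behaviour (typically a segfault) if p_src is NULL. */
    memcpy(p_dest, p_src, count);
}

/* Hypothetical guarded caller: returns 0 on success, or -1 when
 * either pointer is missing, so nothing is dereferenced. */
static int sketch_copy_block(void *p_dest, const void *p_src, size_t count)
{
    if (p_dest == NULL || p_src == NULL)
        return -1;
    sketch_memcpy(p_dest, p_src, count);
    return 0;
}
```

With a check like this, a missing source block would show up as an error return that can be logged, rather than as a crash inside the copy.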
> I'm running opensm from subversion rev 7091..
>
> May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn
> 6251:7091M
>
> the only local changes are as follows:
>
> troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff
> Index: osm/opensm/osm_port_info_rcv.c
> ===================================================================
> --- osm/opensm/osm_port_info_rcv.c (revision 7091)
> +++ osm/opensm/osm_port_info_rcv.c (working copy)
> @@ -469,9 +469,14 @@
> goto Exit;
> }
>
> +#if 0
> /* Check for IBM eHCA firmware defect in reporting partition
> * enforcement cap */
> if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
> IBM_VENDOR_ID)
> p_switch->switch_info.enforce_cap = 0;
> +#endif
> + /* Check for busted divergenet switch on ameslab network */
> + if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152)
> + p_switch->switch_info.enforce_cap = 0;
>
> /* Bail out if this is a switch with no partition enforcement
> * capability */
> if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
Yes, that's fine.
-- Hal