[openib-general] opensm segfault?

Eitan Zahavi eitan at mellanox.co.il
Tue May 16 23:10:11 PDT 2006


cl_memcpy should have some debug capabilities on top of memcpy.
The cl memory management provides the means to track all memory allocations, etc.
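A minimal sketch of what such debug capabilities could look like. It assumes a hypothetical dbg_memcpy() and DBG_MEMCPY() macro, which are NOT the actual complib cl_memcpy. Unless NDEBUG is defined, the wrapper rejects NULL pointers and overlapping ranges and reports the caller's file and line. With NDEBUG it reduces to a plain memcpy.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debug-checked copy; NOT the complib cl_memcpy implementation.
 * Unless NDEBUG is defined, it reports the caller's file:line and refuses
 * (returns NULL) instead of letting memcpy fault or corrupt memory. */
static void *dbg_memcpy(void *p_dest, const void *p_src, size_t count,
                        const char *file, int line)
{
    if (count == 0)
        return p_dest;                      /* nothing to copy */
#ifndef NDEBUG
    if (p_dest == NULL || p_src == NULL) {
        fprintf(stderr, "%s:%d: memcpy with NULL %s (count=%zu)\n", file,
                line, p_dest == NULL ? "p_dest" : "p_src", count);
        return NULL;
    }
    uintptr_t d = (uintptr_t)p_dest;
    uintptr_t s = (uintptr_t)p_src;
    if (d < s + count && s < d + count) {   /* overlap: memmove is required */
        fprintf(stderr, "%s:%d: overlapping memcpy (count=%zu)\n", file,
                line, count);
        return NULL;
    }
#else
    (void)file;                             /* unused in release builds */
    (void)line;
#endif
    return memcpy(p_dest, p_src, count);
}

/* Capture the call site automatically. */
#define DBG_MEMCPY(dest, src, count) \
    dbg_memcpy((dest), (src), (count), __FILE__, __LINE__)
```

Used at the call site, a NULL p_src like the one in the backtrace would be logged with the caller's file and line instead of faulting inside memcpy.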

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Sasha Khapyorsky
> Sent: Wednesday, May 17, 2006 2:11 AM
> To: Troy Benjegerdes
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] opensm segfault?
> 
> Hi Troy,
> 
> On 14:41 Tue 16 May, Troy Benjegerdes wrote:
> > I got this after running opensm for an indeterminate amount of time.
> 
> Is this reproducible, or is it a completely random failure?
> 
> > (gdb) bt
> > #0  0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0,
> >     count=64) at cl_memory_osd.c:87
> > #1  0x0000000000415053 in osm_pkey_tbl_sync_new_blocks (
> >     p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127
> > #2  0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> >     at osm_pkey_mgr.c:407
> > #3  0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, signal=3)
> >     at osm_state_mgr.c:2243
> > #4  0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback (
> >     context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> > #5  0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> >     at cl_dispatcher.c:108
> > #6  0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> >     at cl_threadpool.c:78
> > #7  0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at cl_thread.c:61
> > #8  0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> > #9  0x00002b90b12c8273 in clone () from /lib/libc.so.6
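The backtrace shows cl_memcpy called from osm_pkey_tbl_sync_new_blocks with p_src=0x0 and count=64. 64 bytes is one P_Key table block (32 entries of 16 bits each), so one plausible reading is a copy from a block that was never allocated. A minimal sketch of a NULL-guarded block sync follows. It uses hypothetical types (pkey_block_t, pkey_tbl_t) and a hypothetical pkey_tbl_sync_blocks(), not the real osm_pkey_tbl_t layout, which this thread does not show.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKEY_BLOCK_ENTRIES 32   /* 32 x 16-bit P_Keys = 64 bytes per block */

/* Hypothetical block type; stands in for the real P_Key table block. */
typedef struct {
    uint16_t pkey_entry[PKEY_BLOCK_ENTRIES];
} pkey_block_t;

/* Hypothetical table: any block pointer may be NULL if never allocated. */
typedef struct {
    size_t num_blocks;
    pkey_block_t **blocks;      /* current (source) blocks */
    pkey_block_t **new_blocks;  /* destination blocks */
} pkey_tbl_t;

/* Copy every source block into its destination without ever passing NULL
 * to memcpy. A missing source block is treated as empty (zeroed).
 * Returns the number of blocks actually copied. */
static size_t pkey_tbl_sync_blocks(pkey_tbl_t *tbl)
{
    size_t copied = 0;
    for (size_t i = 0; i < tbl->num_blocks; i++) {
        pkey_block_t *src = tbl->blocks[i];
        pkey_block_t *dst = tbl->new_blocks[i];
        if (dst == NULL)
            continue;                           /* nowhere to copy to */
        if (src == NULL) {
            memset(dst, 0, sizeof(*dst));       /* missing block -> empty */
            continue;
        }
        memcpy(dst, src, sizeof(*dst));         /* 64 bytes per block */
        copied++;
    }
    return copied;
}
```

Zeroing rather than skipping is a design choice for the sketch; whether an absent block should be treated as empty or left untouched depends on how the real table is populated.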
> >
> >
> >
> > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> > just seems like excessive, unneeded abstraction.
> 
> Absolutely agree with you.
> 
> Sasha.
> 
> > I'm running opensm from Subversion rev 7091.
> >
> > May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn 6251:7091M
> >
> > The only local changes are as follows:
> >
> > troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff
> > Index: osm/opensm/osm_port_info_rcv.c
> > ===================================================================
> > --- osm/opensm/osm_port_info_rcv.c      (revision 7091)
> > +++ osm/opensm/osm_port_info_rcv.c      (working copy)
> > @@ -469,9 +469,14 @@
> >        goto Exit;
> >      }
> >
> > +#if 0
> >      /* Check for IBM eHCA firmware defect in reporting partition
> >  * enforcement cap */
> >      if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
> > IBM_VENDOR_ID)
> >        p_switch->switch_info.enforce_cap = 0;
> > +#endif
> > +    /* Check for busted divergenet switch on ameslab network */
> > +    if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152)
> > +           p_switch->switch_info.enforce_cap = 0;
> >
> >      /* Bail out if this is a switch with no partition enforcement
> >  * capability */
> >      if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
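The local patch converts the node GUID with cl_ntoh64 before comparing it, because node_info.node_guid is kept in network (big-endian) byte order while the literal 0x00084e0000000152 is a host-order value. A portable sketch of that comparison follows. It assumes a hypothetical guid_ntoh64() standing in for complib's cl_ntoh64 and a hypothetical node_needs_enforce_cap_quirk() helper; neither is part of opensm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cl_ntoh64: read the 8 stored bytes as a
 * big-endian (network order) value, independent of host endianness. */
static uint64_t guid_ntoh64(uint64_t net_guid)
{
    unsigned char b[8];
    uint64_t host = 0;
    memcpy(b, &net_guid, sizeof(b));    /* raw bytes as laid out in memory */
    for (int i = 0; i < 8; i++)
        host = (host << 8) | b[i];      /* most significant byte first */
    return host;
}

/* Host-order GUID of the switch singled out in the local patch. */
#define QUIRK_NODE_GUID 0x00084e0000000152ULL

/* Hypothetical helper: nonzero if this node should have enforce_cap
 * cleared. net_guid is the value exactly as stored in node_info. */
static int node_needs_enforce_cap_quirk(uint64_t net_guid)
{
    return guid_ntoh64(net_guid) == QUIRK_NODE_GUID;
}
```

Comparing the raw network-order field directly against the literal would only match on a big-endian host, which is why the conversion comes first.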
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


