[openib-general] opensm segfault?

Hal Rosenstock halr at voltaire.com
Wed May 17 03:49:56 PDT 2006


On Wed, 2006-05-17 at 02:10, Eitan Zahavi wrote:
> cl_memcpy should have some debug capabilities on top of memcpy.

I don't see any. Did I miss something?
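
For illustration only, here is a minimal sketch of the kind of debug check that could sit on top of memcpy. The name dbg_memcpy and its "return NULL and log" behaviour are my assumptions, not anything in complib today. It would have flagged the p_src=0x0 case from the backtrace instead of faulting inside memcpy:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical debug wrapper (NOT the existing cl_memcpy).
 * Returns p_dest on success. If a non-empty copy is requested with a
 * NULL source or destination, it logs the arguments and returns NULL
 * without touching memory.
 */
static void *dbg_memcpy(void *p_dest, const void *p_src, size_t count)
{
    if (count != 0 && (p_dest == NULL || p_src == NULL)) {
        fprintf(stderr,
                "dbg_memcpy: NULL pointer (dest=%p src=%p count=%zu)\n",
                p_dest, (void *)p_src, count);
        return NULL;
    }
    if (count == 0)
        return p_dest; /* nothing to copy; avoid calling memcpy at all */
    return memcpy(p_dest, p_src, count);
}
```

A caller would check the return value for NULL. Whether to log, assert, or abort at that point is a policy choice this sketch leaves open.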
..
> cl memory management provides means to track all memory allocations, etc.

Yes, there is extra memory tracking code for malloc and free. This is a
separable item in my mind right now.
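
To make clear what I mean by tracking, here is a rough sketch of the general idea. It is not the complib implementation, and trk_malloc, trk_free and trk_live are hypothetical names. It only counts live allocations and is not thread safe:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter of allocations that have not been freed yet. */
static size_t live_allocs;

/* Allocate and count the block if malloc succeeded. */
static void *trk_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_allocs++;
    return p;
}

/* Free a block obtained from trk_malloc; NULL is ignored, as with free(). */
static void trk_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Number of blocks currently outstanding; nonzero at exit hints at a leak. */
static size_t trk_live(void)
{
    return live_allocs;
}
```

None of this depends on memcpy being wrapped, which is why I consider the two items separable.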

-- Hal

> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -----Original Message-----
> > From: openib-general-bounces at openib.org [mailto:openib-general-
> > bounces at openib.org] On Behalf Of Sasha Khapyorsky
> > Sent: Wednesday, May 17, 2006 2:11 AM
> > To: Troy Benjegerdes
> > Cc: openib-general at openib.org
> > Subject: Re: [openib-general] opensm segfault?
> > 
> > Hi Troy,
> > 
> > On 14:41 Tue 16 May, Troy Benjegerdes wrote:
> > > I got this after an indeterminate amount of time running opensm..
> > 
> > Is this reproducible, or is it a completely random failure?
> > 
> > > (gdb) bt
> > > #0  0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0,
> > >     count=64) at cl_memory_osd.c:87
> > > #1  0x0000000000415053 in osm_pkey_tbl_sync_new_blocks (
> > >     p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127
> > > #2  0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> > >     at osm_pkey_mgr.c:407
> > > #3  0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, signal=3)
> > >     at osm_state_mgr.c:2243
> > > #4  0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback (
> > >     context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> > > #5  0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> > >     at cl_dispatcher.c:108
> > > #6  0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> > >     at cl_threadpool.c:78
> > > #7  0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750)
> > >     at cl_thread.c:61
> > > #8  0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> > > #9  0x00002b90b12c8273 in clone () from /lib/libc.so.6
> > >
> > >
> > >
> > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> > > just seems like excessive unneeded abstraction.
> > 
> > Absolutely agree with you.
> > 
> > Sasha.
> > 
> > > I'm running opensm from subversion rev 7091..
> > >
> > > May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn
> > > 6251:7091M
> > >
> > > the only local changes are as follows:
> > >
> > > troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff
> > > Index: osm/opensm/osm_port_info_rcv.c
> > > ===================================================================
> > > --- osm/opensm/osm_port_info_rcv.c      (revision 7091)
> > > +++ osm/opensm/osm_port_info_rcv.c      (working copy)
> > > @@ -469,9 +469,14 @@
> > >        goto Exit;
> > >      }
> > >
> > > +#if 0
> > >      /* Check for IBM eHCA firmware defect in reporting partition
> > >  * enforcement cap */
> > >      if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID)
> > >        p_switch->switch_info.enforce_cap = 0;
> > > +#endif
> > > +    /* Check for busted divergenet switch on ameslab network */
> > > +    if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152)
> > > +           p_switch->switch_info.enforce_cap = 0;
> > >
> > >      /* Bail out if this is a switch with no partition enforcement
> > >  * capability */
> > >      if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



