[openib-general] Re: openib segfaults when openib is not loaded

Hal Rosenstock halr at voltaire.com
Wed Nov 2 07:29:15 PST 2005


On Wed, 2005-11-02 at 09:50, Eitan Zahavi wrote:
> Hi Hal,
>
> Yael is working on the exact same problem. She is probably going to
> complete it tomorrow.
>
> The issue is partly the vl15 cl_unregister, but we are also facing some
> issues because the umad receiver never exits.

Yes, I've also been working on making the umad receiver exit. This has also
been a lower priority and I don't have a completed solution yet.
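
One common way to let a receiver loop exit is to have it wake on a timeout
and re-check a stop flag that the shutdown path sets. The following is only a
minimal sketch of that idea, not the fix under discussion: the receiver
struct, the recv hook and demo_recv are hypothetical names for illustration
and are not part of umad or OpenSM.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receive hook: returns >0 if a MAD arrived, 0 on timeout,
 * <0 on error. Real code would wrap a receive call that takes a timeout. */
typedef int (*recv_fn)(void *ctx, int timeout_ms);

struct receiver {
    atomic_bool stop;     /* set by the shutdown path */
    recv_fn     recv;     /* hypothetical receive hook */
    void       *ctx;      /* opaque argument passed to the hook */
    unsigned    handled;  /* MADs processed so far */
};

/* Ask the receiver loop to exit; may be called from another thread. */
static void receiver_request_stop(struct receiver *r)
{
    assert(r != NULL);
    atomic_store(&r->stop, true);
}

/* Loop body: wakes at least every timeout_ms to re-check the stop flag. */
static void receiver_loop(struct receiver *r, int timeout_ms)
{
    assert(r != NULL && r->recv != NULL);
    while (!atomic_load(&r->stop)) {
        int rc = r->recv(r->ctx, timeout_ms);
        if (rc < 0)
            break;          /* receive error: give up */
        if (rc > 0)
            r->handled++;   /* a MAD arrived; dispatch it here */
        /* rc == 0: timeout, go around and re-check the flag */
    }
}

/* Demo hook (illustration only): delivers two MADs, then simulates the
 * shutdown path requesting a stop while the receiver is idle. */
static int demo_recv(void *ctx, int timeout_ms)
{
    struct receiver *r = ctx;
    (void)timeout_ms;
    if (r->handled < 2)
        return 1;
    receiver_request_stop(r);
    return 0;
}
```

The point of the timeout is that the loop never blocks indefinitely, so the
shutdown path can set the flag and then wait for the receiver to finish
before the dispatcher is destroyed.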

-- Hal

> When MADs arrive after the dispatcher is destroyed, they cause a
> segfault.
>
> Hope it will be all fixed by the weekend.
>
> EZ
>
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Wednesday, November 02, 2005 4:20 PM
> > To: Michael S. Tsirkin
> > Cc: openib-general at openib.org
> > Subject: [openib-general] Re: openib segfaults when openib is not loaded
> >
> > On Wed, 2005-11-02 at 09:14, Michael S. Tsirkin wrote:
> > > Hi!
> > > If I try to load opensm without loading any of openib modules,
> > > opensm crashes on exit.
> > > Has anyone else seen this?
> > >
> > > # /usr/local/bin/opensm
> > > -------------------------------------------------
> > > OpenSM Rev:openib-1.1.0
> > > Command Line Arguments:
> > >  Log File: /var/log/osm.log
> > > -------------------------------------------------
> > > OpenSM Rev:openib-1.1.0
> > >
> > > ibwarn: [8954] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
> > >
> > > Error from osm_vendor_get_all_port_attr (ffffffff)
> > > Error: Could not get port guid
> > > Exiting SM
> > >
> > > Segmentation fault (core dumped)
> >
> > Yes, this segfault is caused by the following:
> > osm_opensm_destroy shuts down the dispatcher, and after that
> > osm_vl15_destroy attempts to unregister with the dispatcher
> > (although this has already been done).
> >
> > osm_opensm.c::osm_opensm_destroy
> >
> >    /* shut down the dispatcher - so no new messages cross */
> >    cl_disp_shutdown( &p_osm->disp );
> >
> >    /* cleanup all messages on VL15 fifo that were not sent yet */
> >    osm_vl15_shutdown( &p_osm->vl15, &p_osm->mad_pool );
> >
> >    /* lock the whole thing so we do not get any requests etc */
> >    cl_plock_excl_acquire( &p_osm->lock );
> >
> >    /* do the destruction in reverse order as init */
> >    updn_destroy( p_osm->p_updn_ucast_routing );
> >    osm_sa_destroy( &p_osm->sa );
> >    osm_sm_destroy( &p_osm->sm );
> >    osm_db_destroy( &p_osm->db );
> >    osm_vl15_destroy( &p_osm->vl15, &p_osm->mad_pool );
> >
> >
> > My workaround has been to remove the unregister call from
> > osm_vl15intf.c::osm_vl15_destroy, but I'm not sure yet that this is
> > the best long-term fix. I haven't checked whether there are other
> > paths that differ from this flow.
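
An alternative to deleting the call would be to make unregistering tolerate a
dispatcher that has already been shut down, and to have posting refuse
messages after shutdown. The sketch below illustrates that idea only; struct
disp and the disp_* functions are hypothetical stand-ins, not the complib
cl_disp API, and the code is single-threaded (a real dispatcher would need
locking around these operations).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DISP_MAX_HANDLERS   8
#define DISP_INVALID_HANDLE (-1)

typedef void (*disp_cb)(void *ctx, const void *msg);

/* Hypothetical dispatcher: a fixed table of callbacks plus a running flag. */
struct disp {
    bool    running;
    disp_cb cb[DISP_MAX_HANDLERS];
    void   *ctx[DISP_MAX_HANDLERS];
};

static void disp_init(struct disp *d)
{
    d->running = true;
    for (int i = 0; i < DISP_MAX_HANDLERS; i++) {
        d->cb[i] = NULL;
        d->ctx[i] = NULL;
    }
}

/* Returns a handle, or DISP_INVALID_HANDLE if full or already shut down. */
static int disp_register(struct disp *d, disp_cb cb, void *ctx)
{
    if (!d->running)
        return DISP_INVALID_HANDLE;
    for (int i = 0; i < DISP_MAX_HANDLERS; i++) {
        if (d->cb[i] == NULL) {
            d->cb[i] = cb;
            d->ctx[i] = ctx;
            return i;
        }
    }
    return DISP_INVALID_HANDLE;
}

/* Shutdown drops every registration, so later unregisters have nothing to do. */
static void disp_shutdown(struct disp *d)
{
    d->running = false;
    for (int i = 0; i < DISP_MAX_HANDLERS; i++) {
        d->cb[i] = NULL;
        d->ctx[i] = NULL;
    }
}

/* Safe to call twice or after shutdown: the handle is invalidated either way. */
static void disp_unregister(struct disp *d, int *handle)
{
    if (*handle == DISP_INVALID_HANDLE)
        return;
    if (d->running && *handle >= 0 && *handle < DISP_MAX_HANDLERS) {
        d->cb[*handle] = NULL;
        d->ctx[*handle] = NULL;
    }
    *handle = DISP_INVALID_HANDLE;
}

/* Messages posted after shutdown are rejected instead of dereferencing
 * freed dispatcher state. */
static bool disp_post(struct disp *d, int handle, const void *msg)
{
    if (!d->running || handle < 0 || handle >= DISP_MAX_HANDLERS ||
        d->cb[handle] == NULL)
        return false;
    d->cb[handle](d->ctx[handle], msg);
    return true;
}

/* Demo callback (illustration only): counts delivered messages. */
static void count_cb(void *ctx, const void *msg)
{
    (void)msg;
    (*(int *)ctx)++;
}
```

With this shape, a destroy sequence that shuts the dispatcher down first and
unregisters later remains harmless, whichever order the teardown paths run in.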
> >
> > This seems lower priority to me than some other issues I'm still
> > sorting through, but I will get back to this unless someone else gets
> > to it first or thinks that my workaround should be made permanent.
> >
> > -- Hal




