[openib-general] Re: openib segfaults when openib is not loaded

Eitan Zahavi eitan at mellanox.co.il
Wed Nov 2 06:50:57 PST 2005


Hi Hal,

Yael is working on the exact same problem. She is probably going to complete
it tomorrow.

The issue is not only the vl15 cl_unregister: we are also facing problems
because the umad receiver never exits. When MADs arrive after the
dispatcher is destroyed, they cause a segfault.
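
To make the two failure modes concrete, here is a minimal sketch in C of a
dispatcher whose unregister is safe to call after shutdown and which drops
messages that arrive late. It is a toy model only: every toy_* name is
hypothetical and none of this is the complib or OpenSM API, nor the fix
that is actually being prepared.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: all toy_* names are hypothetical, not complib/OpenSM calls. */
#define TOY_INVALID_HANDLE (-1)
#define TOY_MAX_CLIENTS    4

typedef void (*toy_cb_t)(const char *msg);

typedef struct {
    int      alive;                      /* 0 once the dispatcher is shut down */
    toy_cb_t clients[TOY_MAX_CLIENTS];   /* NULL means the slot is free */
} toy_disp_t;

static int toy_delivered = 0;            /* counts messages handed to a client */

static void toy_count_cb(const char *msg) { (void)msg; toy_delivered++; }

static void toy_disp_init(toy_disp_t *d)
{
    int i;
    d->alive = 1;
    for (i = 0; i < TOY_MAX_CLIENTS; i++)
        d->clients[i] = NULL;
}

/* Returns a slot index, or TOY_INVALID_HANDLE if shut down or full. */
static int toy_disp_register(toy_disp_t *d, toy_cb_t cb)
{
    int i;
    assert(cb != NULL);
    if (!d->alive)
        return TOY_INVALID_HANDLE;
    for (i = 0; i < TOY_MAX_CLIENTS; i++) {
        if (d->clients[i] == NULL) {
            d->clients[i] = cb;
            return i;
        }
    }
    return TOY_INVALID_HANDLE;
}

/* Shutdown drops every registration, as if each client had unregistered. */
static void toy_disp_shutdown(toy_disp_t *d)
{
    int i;
    for (i = 0; i < TOY_MAX_CLIENTS; i++)
        d->clients[i] = NULL;
    d->alive = 0;
}

/* Idempotent unregister: returns 1 if a registration was removed and 0 if
 * there was nothing left to do. The caller's handle is always invalidated,
 * so a second call is harmless. */
static int toy_disp_unregister(toy_disp_t *d, int *h)
{
    int removed = 0;
    if (d->alive && *h >= 0 && *h < TOY_MAX_CLIENTS && d->clients[*h] != NULL) {
        d->clients[*h] = NULL;
        removed = 1;
    }
    *h = TOY_INVALID_HANDLE;
    return removed;
}

/* Returns 1 if delivered, 0 if dropped. A message that arrives after
 * shutdown is dropped instead of touching torn-down state. */
static int toy_disp_post(toy_disp_t *d, int h, const char *msg)
{
    if (!d->alive || h < 0 || h >= TOY_MAX_CLIENTS || d->clients[h] == NULL)
        return 0;
    d->clients[h](msg);
    return 1;
}
```

In this model, calling toy_disp_shutdown() and then toy_disp_unregister()
simply returns 0, and toy_disp_post() on a dead dispatcher drops the
message. The real code would also need locking around these checks, since
the receiver runs in its own thread; that is left out here.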

I hope it will all be fixed by the weekend.

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Wednesday, November 02, 2005 4:20 PM
> To: Michael S. Tsirkin
> Cc: openib-general at openib.org
> Subject: [openib-general] Re: openib segfaults when openib is not loaded
> 
> On Wed, 2005-11-02 at 09:14, Michael S. Tsirkin wrote:
> > Hi!
> > If I try to load opensm without loading any of openib modules,
> > opensm crashes on exit.
> > Has anyone else seen this?
> >
> > # /usr/local/bin/opensm
> > -------------------------------------------------
> > OpenSM Rev:openib-1.1.0
> > Command Line Arguments:
> >  Log File: /var/log/osm.log
> > -------------------------------------------------
> > OpenSM Rev:openib-1.1.0
> >
> > ibwarn: [8954] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
> >
> > Error from osm_vendor_get_all_port_attr (ffffffff)
> > Error: Could not get port guid
> > Exiting SM
> >
> > Segmentation fault (core dumped)
> 
> Yes, this seg fault is caused by the following:
> osm_opensm_destroy shuts down the dispatcher, and after that
> osm_vl15_destroy attempts to unregister with the dispatcher (although
> this has already been done).
> 
> osm_opensm.c::osm_opensm_destroy
> 
>    /* shut down the dispatcher - so no new messages cross */
>    cl_disp_shutdown( &p_osm->disp );
> 
>    /* cleanup all messages on VL15 fifo that were not sent yet */
>    osm_vl15_shutdown( &p_osm->vl15, &p_osm->mad_pool );
> 
>    /* lock the whole thing so we do not get any requests etc */
>    cl_plock_excl_acquire( &p_osm->lock );
> 
>    /* do the destruction in reverse order as init */
>    updn_destroy( p_osm->p_updn_ucast_routing );
>    osm_sa_destroy( &p_osm->sa );
>    osm_sm_destroy( &p_osm->sm );
>    osm_db_destroy( &p_osm->db );
>    osm_vl15_destroy( &p_osm->vl15, &p_osm->mad_pool );
> 
> 
> My workaround has been to remove this unregister call from
> osm_vl15intf.c::osm_vl15_destroy, but I'm not yet sure this is the best
> long-term fix. I haven't checked whether there are other paths
> that differ from this flow.
> 
> This seems lower priority to me than some other issues I'm still sorting
> through but I will get back to this unless someone else gets to it first
> or thinks that the workaround I have should be made permanent.
> 
> -- Hal
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general