[openib-general] Re: openib segfaults when openib is not loaded
Eitan Zahavi
eitan at mellanox.co.il
Wed Nov 2 06:50:57 PST 2005
Hi Hal,
Yael is working on the exact same problem. She is probably going to complete
it tomorrow.
The issue was partly the vl15 cl_unregister, but we are also facing problems
because the umad receiver never exits: MADs that arrive after the
dispatcher is destroyed cause a segfault.
Hopefully it will all be fixed by the weekend.
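
To illustrate the second problem (MADs arriving after the dispatcher is gone), here is a minimal single-threaded sketch. It is not the OpenSM or libibumad code; every `demo_*` name is hypothetical. It only assumes that the receiver holds a pointer to the dispatcher, and shows the ordering idea: stop the receiver first, so a late MAD is dropped instead of dereferencing a destroyed dispatcher.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dispatcher; NOT the complib type. */
typedef struct demo_dispatcher {
    int delivered;               /* MADs handed to the dispatcher */
} demo_dispatcher_t;

/* Hypothetical stand-in for the umad receiver. */
typedef struct demo_receiver {
    demo_dispatcher_t *disp;     /* NULL once the receiver is stopped */
    bool stopped;
    int dropped;                 /* MADs that arrived too late */
} demo_receiver_t;

void demo_receiver_start(demo_receiver_t *rx, demo_dispatcher_t *disp)
{
    rx->disp = disp;
    rx->stopped = false;
    rx->dropped = 0;
}

/* Call this BEFORE destroying the dispatcher. */
void demo_receiver_stop(demo_receiver_t *rx)
{
    rx->stopped = true;
    rx->disp = NULL;             /* never keep a pointer to a dead object */
}

/* Invoked for each incoming MAD; returns true if it was delivered. */
bool demo_receiver_on_mad(demo_receiver_t *rx)
{
    if (rx->stopped || rx->disp == NULL) {
        rx->dropped++;           /* late arrival: drop it, do not crash */
        return false;
    }
    rx->disp->delivered++;
    return true;
}
```

In a real multithreaded receiver the stop step would also have to join the receiver thread and synchronize access to the flag; the sketch leaves that out and shows only the teardown order.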
EZ
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Wednesday, November 02, 2005 4:20 PM
> To: Michael S. Tsirkin
> Cc: openib-general at openib.org
> Subject: [openib-general] Re: openib segfaults when openib is not loaded
>
> On Wed, 2005-11-02 at 09:14, Michael S. Tsirkin wrote:
> > Hi!
> > If I try to load opensm without loading any of openib modules,
> > opensm crashes on exit.
> > Has anyone else seen this?
> >
> > # /usr/local/bin/opensm
> > -------------------------------------------------
> > OpenSM Rev:openib-1.1.0
> > Command Line Arguments:
> > Log File: /var/log/osm.log
> > -------------------------------------------------
> > OpenSM Rev:openib-1.1.0
> >
> > ibwarn: [8954] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
> >
> > Error from osm_vendor_get_all_port_attr (ffffffff)
> > Error: Could not get port guid
> > Exiting SM
> >
> > Segmentation fault (core dumped)
>
> Yes, this seg fault is caused by the following:
> osm_opensm_destroy shuts down the dispatcher, and after that
> osm_vl15_destroy attempts to unregister with the dispatcher (although
> this has already been done).
>
> osm_opensm.c::osm_opensm_destroy
>
> /* shut down the dispatcher - so no new messages cross */
> cl_disp_shutdown( &p_osm->disp );
>
> /* cleanup all messages on VL15 fifo that were not sent yet */
> osm_vl15_shutdown( &p_osm->vl15, &p_osm->mad_pool );
>
> /* lock the whole thing so we do not get any requests etc */
> cl_plock_excl_acquire( &p_osm->lock );
>
> /* do the destruction in reverse order as init */
> updn_destroy( p_osm->p_updn_ucast_routing );
> osm_sa_destroy( &p_osm->sa );
> osm_sm_destroy( &p_osm->sm );
> osm_db_destroy( &p_osm->db );
> osm_vl15_destroy( &p_osm->vl15, &p_osm->mad_pool );
>
>
> My workaround has been to remove the unregister call from
> osm_vl15intf.c::osm_vl15_destroy, but I'm not sure yet that this is the best
> long-term fix. I haven't checked whether there are other paths
> that differ from this flow.
>
> This seems lower priority to me than some other issues I'm still sorting
> through but I will get back to this unless someone else gets to it first
> or thinks that the workaround I have should be made permanent.
>
> -- Hal