[openib-general] Re: opensm and SIGINT

Viswanath Krishnamurthy viswa.krish at gmail.com
Fri Sep 23 10:43:00 PDT 2005


More information,

The test case is as follows:

1. Start opensm in verbose mode (-V)
2. Ping remote node
3. osmtest -f c
4. osmtest -f a
5. pkill -9 opensm
6. Repeat from step 1

Out of about 2500 iterations, osmtest failed in 143. Keep in mind that only
step 4 failed. Step 3, which is inventory file creation, *never* failed. (I
think inventory file creation also talks to the SA, right?)
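
The steps above could be scripted roughly as follows. This is only a sketch, not the harness actually used: REMOTE_NODE, ITERATIONS, SETTLE_SECS and the overridable *_CMD variables are assumptions made for illustration, plain `ping` is a guess at how the remote node was pinged, and `kill -9` on the PID started by the script stands in for `pkill -9 opensm`.

```shell
#!/bin/sh
# Sketch of the opensm kill/restart loop (steps 1-6 above).
# Everything configurable here is an assumption, not from the original test.
OPENSM_CMD=${OPENSM_CMD:-"opensm -V"}      # step 1: opensm in verbose mode
PING_CMD=${PING_CMD:-"ping -c 1"}          # step 2: assumed to be plain ping
OSMTEST_CMD=${OSMTEST_CMD:-"osmtest"}      # steps 3 and 4
REMOTE_NODE=${REMOTE_NODE:-remote-node}    # hypothetical host name
ITERATIONS=${ITERATIONS:-2500}
SETTLE_SECS=${SETTLE_SECS:-0}              # hypothetical wait after starting opensm

# Run one iteration. Returns 0 on success, 3 if step 3 failed, 4 if step 4 failed.
run_iteration() {
    $OPENSM_CMD >/dev/null 2>&1 &
    sm_pid=$!
    sleep "$SETTLE_SECS"
    result=0
    # The ping result is ignored; the original post does not say how it was used.
    $PING_CMD "$REMOTE_NODE" >/dev/null 2>&1 || true
    if ! $OSMTEST_CMD -f c >/dev/null 2>&1; then
        result=3
    elif ! $OSMTEST_CMD -f a >/dev/null 2>&1; then
        result=4
    fi
    # step 5: SIGKILL the opensm instance started above, then reap it
    kill -9 "$sm_pid" 2>/dev/null || true
    wait "$sm_pid" 2>/dev/null || true
    return "$result"
}

# step 6: repeat, counting failures separately for step 3 and step 4
run_loop() {
    i=0
    fail3=0
    fail4=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        status=0
        run_iteration || status=$?
        case $status in
            3) fail3=$((fail3 + 1)) ;;
            4) fail4=$((fail4 + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "iterations=$i step3_failures=$fail3 step4_failures=$fail4"
}

# Only start the loop when invoked as: sh script.sh run
if [ "${1:-}" = "run" ]; then
    run_loop
fi
```

Counting step 3 and step 4 failures separately makes it easy to confirm that only the `osmtest -f a` step fails.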

-Viswa



On 23 Sep 2005 12:54:56 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
>
> Hi Eitan,
>
> On Fri, 2005-09-23 at 12:19, Eitan Zahavi wrote:
> > Hi Hal, Viswa,
> >
> > Sorry I'm joining late on this thread due to the weekend (which starts
> > here on Friday ending Saturday night).
> > Is there any conclusion on this one?
>
> No.
>
> > The only log I have seen was from osmtest failing to send a MAD.
>
> True.
>
> > Looks like a umad issue?
>
> Not sure why you say that. There are other possibilities I'm aware of
> here:
>
> Note that the MAD whose send failed is one that expects a response, so
> the failure means that the response was not received. The send also goes
> through the transmit retry strategy (I could see this on the SA side). So
> the only thing I can say at this point is that, for some reason, the
> response does not make it back from the SA to the SA client (osmtest).
> That's where this one is right now.
>
> -- Hal
>
> > Eitan
> >
> > Hal Rosenstock wrote:
> > > Hi again Viswa,
> > >
> > > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:
> > >
> > >>Hi Viswa,
> > >>
> > >>On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote:
> > >>
> > > >>>Currently opensm traps SIGINT. There was some discussion to remove it.
> > > >>>I am currently running some tests on opensm
> > > >>>by killing (SIGKILL) and restarting opensm. So far I have not found
> > > >>>any resource leak issues. Is there a plan to remove that
> > > >>>signal handler? Ideally it should not exist.
> > >>
> > > >>Eitan stated that this was historical in nature for gen1 drivers which
> > > >>had resource tracking problems: "if OpenSM left without cleaning up all
> > > >>used resources (like MAD buffers and UD-AVs), the driver oops'ed."
> > >>
> > >>I think that (eliminating the handler for SIGINT) can at least be done
> > >>for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor
> > >>layers for starters. I will experiment with gen2 and let you know.
> > >
> > >
> > > > Does the patch below do what you want? Can you try it?
> > >
> > > -- Hal
> > >
> > > Index: opensm/osm_opensm.c
> > > ===================================================================
> > > --- opensm/osm_opensm.c (revision 3513)
> > > +++ opensm/osm_opensm.c (working copy)
> > > @@ -182,7 +182,9 @@ osm_reg_sig_handler(
> > > IN osm_opensm_t * const p_osm )
> > > {
> > > __p_osm_to_signal = p_osm;
> > > +#ifndef OSM_VENDOR_INTF_OPENIB
> > > cl_reg_sig_hdl( SIGINT, __sig_handler );
> > > +#endif
> > > cl_reg_sig_hdl( SIGTERM, __sig_handler );
> > > cl_reg_sig_hdl( SIGHUP, __sig_handler );
> > > osm_exit_flag = 0;
> > >
> > >
> > >
> >
>
>