[openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100msec to 1 second

Hal Rosenstock halr at voltaire.com
Wed Jan 4 05:56:00 PST 2006


Hi Eitan,

On Wed, 2006-01-04 at 09:04, Eitan Zahavi wrote:
> Hi Hal
> 
> Regarding the timeout, 200 msec is fine with me.

OK. We'll start there.
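
For reference, here is a rough sketch of the arithmetic behind the figure of up to 4 seconds per lost transaction mentioned further down in this thread. It assumes the SM sends each request once and then retries it a fixed number of times, waiting one full timeout per attempt. The helper name and the retry count of 3 are illustrative assumptions chosen to match that figure; they are not taken from the patch or from the OpenSM sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of OpenSM): worst-case time spent on a
 * transaction that never gets a response.
 *
 * Assumption: one initial send plus `retries` resends, each of which
 * waits `timeout_ms` before giving up.
 */
static uint64_t worst_case_delay_ms(uint32_t timeout_ms, uint32_t retries)
{
    /* Total attempts = initial send + retries. */
    return (uint64_t)timeout_ms * ((uint64_t)retries + 1u);
}
```

Under these assumptions, worst_case_delay_ms(1000, 3) gives 4000 msec, while a 200 msec timeout gives 800 msec and the old 100 msec default gives 400 msec. A timeout of 0 disables timeouts altogether, so the helper is not meaningful for that value.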

> Regarding maxsmps, I think it is better to have 1 SMP on the wire.

Can you explain how this is consistent with the Mellanox default of 0
(infinite)? Why is 1 outstanding SMP better?

-- Hal

> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Wednesday, January 04, 2006 3:25 PM
> > To: Eitan Zahavi
> > Cc: Yael Kalka; openib-general at openib.org
> > Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100msec to 1 second
> > 
> > Hi Eitan,
> > 
> > On Tue, 2005-12-20 at 16:27, Eitan Zahavi wrote:
> > > Hi Hal,
> > >
> > > The effect is basically a slowdown in case of non-responding or lost
> > > packets.
> > > With a 1 sec timeout, up to 4 sec per lost transaction is added to the
> > > SM bringup time.
> > >
> > > In many clusters I have seen, 100 msec was enough, but I guess you
> > > have actually seen such failures.
> > 
> > I see that the timeout is set to 200 msec (and maxsmps 0) in the
> > Mellanox OpenSM configuration. Do you have a problem with increasing
> > the default from 100 to 200 msec (and also changing default maxsmps to
> > 0)?
> > 
> > -- Hal
> > 
> > > Eitan Zahavi
> > > Design Technology Director
> > > Mellanox Technologies LTD
> > > Tel:+972-4-9097208
> > > Fax:+972-4-9593245
> > > P.O. Box 586 Yokneam 20692 ISRAEL
> > >
> > >
> > > > -----Original Message-----
> > > > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > > > Sent: Tuesday, December 20, 2005 3:38 PM
> > > > To: Yael Kalka; Eitan Zahavi
> > > > Cc: openib-general at openib.org
> > > > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
> > > >
> > > > OpenSM: Extend default transaction timeout from 100 msec to 1 second.
> > > >
> > > > With the advent of long distance IB and software SMAs, 100 msec is no
> > > > longer adequate as a default transaction timeout. Increase this to 1
> > > > second so that the default is sufficient in most common cases.
> > > >
> > > > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > > >
> > > > Index: include/opensm/osm_base.h
> > > > ===================================================================
> > > > --- include/opensm/osm_base.h	(revision 4549)
> > > > +++ include/opensm/osm_base.h	(working copy)
> > > > @@ -246,7 +246,7 @@ BEGIN_C_DECLS
> > > >  *
> > > >  * SYNOPSIS
> > > >  */
> > > > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100
> > > > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000
> > > >  /***********/
> > > >
> > > >  /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT
> > > > Index: opensm/main.c
> > > > ===================================================================
> > > > --- opensm/main.c	(revision 4549)
> > > > +++ opensm/main.c	(working copy)
> > > > @@ -153,7 +153,7 @@ show_usage(void)
> > > >            "          used for transaction timeouts.\n"
> > > >            "          Specifying -t 0 disables timeouts.\n"
> > > >            "          Without -t, OpenSM defaults to a timeout value of\n"
> > > > -          "          100 milliseconds.\n\n" );
> > > > +          "          1 second (1000 milliseconds).\n\n" );
> > > >    printf( "-maxsmps <number>\n"
> > > >            "          This option specifies the number of VL15 SMP MADs\n"
> > > >            "          allowed on the wire at any one time.\n"
> > >



