[openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second

Eitan Zahavi eitan at mellanox.co.il
Tue Dec 20 13:36:12 PST 2005


Hi Hal,

In a way, sending MADs in parallel (maxsmps > 1) would help.
But if you count the number of packets that must be sent to every port
(NodeInfo, PortInfo, SwitchInfo?, PKey*2, SL2VL, VLArb ...),
even a single bad port will slow down the sweep significantly.
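
As a rough illustration of that arithmetic, here is a back-of-the-envelope
sketch in C. It is not OpenSM code: the function name, the figure of 7 MADs
per port, and the assumption of 4 attempts per transaction (inferred from
the "up to 4sec per lost transaction" figure quoted below) are only for
illustration, and the batching model is deliberately simplified.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of OpenSM): estimate the worst-case extra
 * sweep time, in milliseconds, caused by one port that never answers.
 *
 * Simplified model: every MAD sent to the dead port times out on each
 * attempt, and up to max_smps MADs wait out their timeouts concurrently.
 */
uint64_t worst_case_delay_msec(uint32_t timeout_msec,
                               uint32_t attempts,
                               uint32_t mads_per_port,
                               uint32_t max_smps)
{
    assert(max_smps > 0);

    /* Number of serialized batches: ceiling of mads_per_port / max_smps. */
    uint64_t batches = ((uint64_t)mads_per_port + max_smps - 1) / max_smps;

    /* Each batch waits through every attempt before it is given up on. */
    return batches * attempts * timeout_msec;
}
```

Under these assumptions, worst_case_delay_msec(100, 4, 7, 1) gives 2800
(2.8 sec), worst_case_delay_msec(1000, 4, 7, 1) gives 28000 (28 sec), and
raising maxsmps to 4 with the 1 sec timeout gives 8000 (8 sec).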

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, December 20, 2005 11:23 PM
> To: Eitan Zahavi; Yael Kalka
> Cc: openib-general at openib.org
> Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from
> 100 msec to 1 second
> 
> Hi Eitan,
> 
> Yes, I saw these failures as I mentioned in the original email.
> Another easy way to see this is to turn on logging on a slow NFS server.
> 
> Also, wouldn't increasing maxsmps ameliorate this to some degree, so
> maybe that should be done at the same time?
> 
> -- Hal
> 
> ________________________________
> 
> From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
> Sent: Tue 12/20/2005 4:27 PM
> To: Hal Rosenstock; Yael Kalka
> Cc: openib-general at openib.org
> Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from
> 100 msec to 1 second
> 
> 
> 
> Hi Hal,
> 
> The effect is basically a slowdown when packets are lost or go
> unanswered.
> With a 1 sec timeout, up to 4 sec per lost transaction is added to the
> SM bring-up time.
> 
> In many clusters I have seen, 100 msec was enough, but I guess you
> have actually seen such failures.
> 
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, December 20, 2005 3:38 PM
> > To: Yael Kalka; Eitan Zahavi
> > Cc: openib-general at openib.org
> > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100
> > msec to 1 second
> >
> > OpenSM: Extend default transaction timeout from 100 msec to 1 second.
> >
> > With the advent of long distance IB and software SMAs, 100 msec is no
> > longer adequate as a default transaction timeout. Increase this to 1
> > second so that the default is sufficient in most common cases.
> >
> > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> >
> > Index: include/opensm/osm_base.h
> > ===================================================================
> > --- include/opensm/osm_base.h (revision 4549)
> > +++ include/opensm/osm_base.h (working copy)
> > @@ -246,7 +246,7 @@ BEGIN_C_DECLS
> >  *
> >  * SYNOPSIS
> >  */
> > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100
> > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000
> >  /***********/
> >
> >  /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT
> > Index: opensm/main.c
> > ===================================================================
> > --- opensm/main.c     (revision 4549)
> > +++ opensm/main.c     (working copy)
> > @@ -153,7 +153,7 @@ show_usage(void)
> >            "          used for transaction timeouts.\n"
> >            "          Specifying -t 0 disables timeouts.\n"
> >            "          Without -t, OpenSM defaults to a timeout value of\n"
> > -          "          100 milliseconds.\n\n" );
> > +          "          1 second (1000 milliseconds).\n\n" );
> >    printf( "-maxsmps <number>\n"
> >            "          This option specifies the number of VL15 SMP MADs\n"
> >            "          allowed on the wire at any one time.\n"



