[openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
Eitan Zahavi
eitan at mellanox.co.il
Tue Dec 20 13:36:12 PST 2005
Hi Hal,
In a way, using parallel MAD sends (maxsmps > 1) would help.
But if you count the number of packets that must be sent to every port
(NodeInfo, PortInfo, SwitchInfo?, PKey*2, SL2VL, VLArb ....),
even a single bad port will slow down the sweep significantly.
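
As a rough illustration of that arithmetic, here is a minimal sketch in C. The
helper name worst_case_delay_ms and its parameters are hypothetical and not
part of OpenSM. It assumes MADs to the bad port are sent one at a time, that
every one of them times out on every attempt, and that each transaction is
attempted 4 times (inferred from the "up to 4sec per lost transaction" figure
quoted below for a 1 sec timeout). Treat the result as an upper bound, not a
measurement.

```c
#include <assert.h>

/* Hypothetical helper, not an OpenSM function.
 * Upper bound on the extra sweep time (in msec) caused by one bad port,
 * assuming its MADs are sent serially and every attempt times out. */
static unsigned long worst_case_delay_ms(unsigned long timeout_ms,
                                         unsigned int attempts,
                                         unsigned int mads_per_port)
{
    /* Each MAD waits one full timeout per attempt before giving up. */
    return timeout_ms * attempts * mads_per_port;
}
```

With 7 MADs per port (an illustrative count taken from the list above, which
is itself open-ended) and 4 attempts, worst_case_delay_ms(100, 4, 7) gives
2800 msec, while worst_case_delay_ms(1000, 4, 7) gives 28000 msec. Sending
MADs in parallel would let some of these waits overlap, which is why a larger
maxsmps helps only in part.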
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, December 20, 2005 11:23 PM
> To: Eitan Zahavi; Yael Kalka
> Cc: openib-general at openib.org
> Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
>
> Hi Eitan,
>
> Yes, I saw these failures, as I mentioned in the original email.
> Another easy way to see this is to turn on logging on a slow NFS server.
>
> Also, wouldn't increasing maxsmps ameliorate this to some degree, so
> maybe that should be done at the same time?
>
> -- Hal
>
> ________________________________
>
> From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
> Sent: Tue 12/20/2005 4:27 PM
> To: Hal Rosenstock; Yael Kalka
> Cc: openib-general at openib.org
> Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
>
>
>
> Hi Hal,
>
> The effect is basically a slowdown in the case of non-responding or lost
> packets.
> With a 1 sec timeout, up to 4 sec per lost transaction is added to the
> SM bringup time.
>
> In many clusters I have seen, 100 msec was enough, but I guess you
> have actually seen such failures.
>
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, December 20, 2005 3:38 PM
> > To: Yael Kalka; Eitan Zahavi
> > Cc: openib-general at openib.org
> > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
> >
> > OpenSM: Extend default transaction timeout from 100 msec to 1 second.
> >
> > With the advent of long distance IB and software SMAs, 100 msec is
> > no longer adequate as a default transaction timeout. Increase this to 1
> > second so that the default is sufficient in most common cases.
> >
> > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> >
> > Index: include/opensm/osm_base.h
> > ===================================================================
> > --- include/opensm/osm_base.h (revision 4549)
> > +++ include/opensm/osm_base.h (working copy)
> > @@ -246,7 +246,7 @@ BEGIN_C_DECLS
> > *
> > * SYNOPSIS
> > */
> > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100
> > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000
> > /***********/
> >
> > /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT
> > Index: opensm/main.c
> > ===================================================================
> > --- opensm/main.c (revision 4549)
> > +++ opensm/main.c (working copy)
> > @@ -153,7 +153,7 @@ show_usage(void)
> > " used for transaction timeouts.\n"
> > " Specifying -t 0 disables timeouts.\n"
> > " Without -t, OpenSM defaults to a timeout value of\n"
> > - " 100 milliseconds.\n\n" );
> > + " 1 second (1000 milliseconds).\n\n" );
> > printf( "-maxsmps <number>\n"
> > " This option specifies the number of VL15 SMP MADs\n"
> > " allowed on the wire at any one time.\n"