[openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second

Eitan Zahavi eitan at mellanox.co.il
Tue Dec 20 13:27:07 PST 2005


Hi Hal,

The effect is basically a slowdown when an SMA does not respond or
packets are lost.
With a 1 sec timeout, up to 4 sec per lost transaction are added to the
SM bring-up time.
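
The 4 sec figure is consistent with each transaction being attempted up
to four times before the SM gives up. That attempt count is an assumption
inferred from the numbers above, not something taken from the OpenSM
sources, and the helper name below is hypothetical. A minimal sketch of
the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of OpenSM.
 * Returns the worst-case time (in milliseconds) spent waiting on one
 * transaction that never gets a response, assuming the request is sent
 * `attempts` times and each attempt waits the full timeout.
 */
static uint64_t worst_case_delay_ms(uint32_t timeout_ms, uint32_t attempts)
{
    assert(attempts >= 1); /* a transaction is sent at least once */
    return (uint64_t)timeout_ms * attempts;
}
```

Under that assumption, worst_case_delay_ms(100, 4) gives 400 msec for the
old default and worst_case_delay_ms(1000, 4) gives 4000 msec for the new
one, so each lost transaction costs ten times more bring-up time.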

In many clusters I have seen, 100 msec was enough, but I guess you
have actually seen such failures.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, December 20, 2005 3:38 PM
> To: Yael Kalka; Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
> 
> OpenSM: Extend default transaction timeout from 100 msec to 1 second.
> 
> With the advent of long distance IB and software SMAs, 100 msec is no
> longer adequate as a default transaction timeout. Increase this to 1
> second so that the default is sufficient in most common cases.
> 
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> 
> Index: include/opensm/osm_base.h
> ===================================================================
> --- include/opensm/osm_base.h	(revision 4549)
> +++ include/opensm/osm_base.h	(working copy)
> @@ -246,7 +246,7 @@ BEGIN_C_DECLS
>  *
>  * SYNOPSIS
>  */
> -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100
> +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000
>  /***********/
> 
>  /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT
> Index: opensm/main.c
> ===================================================================
> --- opensm/main.c	(revision 4549)
> +++ opensm/main.c	(working copy)
> @@ -153,7 +153,7 @@ show_usage(void)
>            "          used for transaction timeouts.\n"
>            "          Specifying -t 0 disables timeouts.\n"
>            "          Without -t, OpenSM defaults to a timeout value of\n"
> -          "          100 milliseconds.\n\n" );
> +          "          1 second (1000 milliseconds).\n\n" );
>    printf( "-maxsmps <number>\n"
>            "          This option specifies the number of VL15 SMP MADs\n"
>            "          allowed on the wire at any one time.\n"



