[openib-general] VAPI_RETRY_EXC_ERR

Libor Michalek libor at topspin.com
Tue Nov 9 09:55:13 PST 2004


On Tue, Nov 09, 2004 at 04:19:17PM +0530, Sreenivasulu Pulichintala wrote:

> -----Original Message-----
> From: Sreenivasulu Pulichintala 
> Sent: Tuesday, November 09, 2004 3:56 PM
> To: openib-general at openib.org
> Subject: [openib-general] VAPI_RETRY_EXC_ERR
> 
> Hi,
> 
> I use the MPICH 1.2.5 and MVAPICH 0.9.2 stack, and when I run some of my
> Fortran applications, the application sometimes crashes with the
> following error:
>
> Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor code=81

  Of the possible issues that Tziporet lists, the most likely problem
with MVAPICH 0.9.2 is that the local ack timeout is too small for either
large or blocking clusters. It is currently set to 10 (DEFAULT_ACK_TIMEOUT),
which translates to about 4.2 milliseconds (4.096 usec * 2^10; see IBTA
spec section 9.9.2). I would try a value such as 15 or 20.
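
  To make the conversion concrete, here is a minimal sketch of how the
timeout exponent maps to wall-clock time, assuming the usual encoding of
4.096 usec * 2^value in a 5-bit field. The helper name ack_timeout_ns and
the ACK_TIMEOUT_BASE_NS define are hypothetical and are not part of
MVAPICH or VAPI; treating 0 as "timeout disabled" is my reading of the
encoding, so check it against the spec before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base unit of the local ack timeout: 4.096 usec, in nanoseconds. */
#define ACK_TIMEOUT_BASE_NS 4096ULL

/*
 * Hypothetical helper (not an MVAPICH or VAPI function): convert a local
 * ack timeout exponent (assumed 5-bit field, 0..31) to nanoseconds using
 * 4.096 usec * 2^exponent. An exponent of 0 is treated here as "timeout
 * disabled" and reported as 0.
 */
static uint64_t ack_timeout_ns(unsigned int exponent)
{
    assert(exponent <= 31);
    if (exponent == 0)
        return 0;
    return ACK_TIMEOUT_BASE_NS << exponent;
}
```

  With this, ack_timeout_ns(10) is 4194304 ns (about 4.2 ms), while 15
gives roughly 134 ms and 20 roughly 4.3 seconds, so each step of the
exponent doubles the time the sender waits before retrying.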

  Also, the retry counter is set using the define DEFAULT_RETRY_COUNT in
the MVAPICH source. It's currently set to 5.


-Libor


