[openib-general] VAPI_RETRY_EXC_ERR

Tziporet Koren tziporet at mellanox.co.il
Tue Nov 9 06:43:01 PST 2004


Several things can cause this error:
- The retry count is too small. Try setting it to the maximum, 7.
- The timeout may be too small, so the HCA starts retrying too early
and too often. Try enlarging it to 21.
- The PSNs on the two sides may not be synchronized.
- The link failed.
- The QP on the other side was closed or moved to the error state.
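
To put the retry count and timeout suggestions in numbers, here is a rough sketch. It assumes the usual InfiniBand encoding of the QP local ACK timeout field (4.096 us * 2^value, with a 5-bit value) and a 3-bit retry count. The function names are made up for illustration and are not VAPI calls, and the real HCA interval may be longer than the nominal figure.

```python
# Sketch: nominal time before the HCA gives up with a retry-exceeded error.
# Assumption: local ACK timeout = 4.096 us * 2**value (value is a 5-bit field,
# 0 meaning "no timeout"), and the retry count is a 3-bit field (0..7).
# These helper names are hypothetical; they are not part of VAPI.

BASE_USEC = 4.096
MAX_RETRY_COUNT = 7


def ack_timeout_seconds(value):
    """Nominal local ACK timeout, in seconds, for an encoded value in 1..31."""
    if not 1 <= value <= 31:
        raise ValueError("timeout value must be in 1..31")
    return BASE_USEC * (2 ** value) / 1e6


def nominal_give_up_seconds(value, retry_count):
    """Rough time until the error: the first attempt plus retry_count retries."""
    if not 0 <= retry_count <= MAX_RETRY_COUNT:
        raise ValueError("retry count must be in 0..7")
    return (retry_count + 1) * ack_timeout_seconds(value)


if __name__ == "__main__":
    for value in (14, 18, 21):
        print("timeout=%d: %.3f s per attempt, %.1f s with 7 retries"
              % (value, ack_timeout_seconds(value),
                 nominal_give_up_seconds(value, MAX_RETRY_COUNT)))
```

With timeout 21 each attempt waits about 8.6 seconds, so with 7 retries the error shows up only after roughly a minute of silence from the peer. That is long enough to ride out a short link hiccup, but it will not hide a QP that is really closed or misconfigured.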
 
If this error occurs right at the beginning of the application, it can
indicate that the QP configuration is wrong.
 
Tziporet

-----Original Message-----
From: Sreenivasulu Pulichintala [mailto:sreenivasulu at topspin.com]
Sent: Tuesday, November 09, 2004 12:49 PM
To: openib-general at openib.org
Subject: RE: [openib-general] VAPI_RETRY_EXC_ERR



The corresponding IB macro is IB_COMP_RETRY_EXC_ERR

 

-----Original Message-----
From: Sreenivasulu Pulichintala 
Sent: Tuesday, November 09, 2004 3:56 PM
To: openib-general at openib.org
Subject: [openib-general] VAPI_RETRY_EXC_ERR

 

Hi,

 

I use the MPICH 1.2.5 and MVAPICH 0.9.2 stack, and when I run some of my
Fortran applications, they sometimes crash with the following
error:

 

===

Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor code=81
mpi_latency: mpid/ch_vapi/viacheck.c:2109: viutil_spinandwaitcq: Assertion
`sc->status == VAPI_SUCCESS' failed.
Timeout alarm signaled
Cleaning up all processes ...done.
Killed by signal 15.
===
 
In which cases can I get this error? Is it because of RESYNC?
 
Any help in this regard is highly appreciated.
 
Thanks
Sree
 

 
