[openib-general] Reuse pd and mr
Tang, Changqing
changquing.tang at hp.com
Mon Sep 18 09:58:56 PDT 2006
>
>'retries exceeded' means that the transport retry count was
>exceeded, so most likely your timeout is set too low.
Is there a commonly recommended value for this timeout? I use 18, which
represents about 1 second.
>
>Without seeing your code, I couldn't begin to say why you
>don't see a send completion. If you are absolutely positive
>that you post a send and you never see a completion for that
>send, then I guess it is a firmware or hardware problem.
It is very hard to reproduce this error with standalone code. I use
HP-MPI, and reproducing it needs 8 ranks on at least 4 nodes with
2 cards per node. Only one of our hundred test codes catches this
error, and it happens in an MPI_Scatterv operation.
--CQ
>
> - R.
>