[openib-general] Reuse pd and mr

Tang, Changqing changquing.tang at hp.com
Mon Sep 18 09:58:56 PDT 2006

>
>'retries exceeded' means that the transport retry count was 
>exceeded, so most likely your timeout is set too low.

Is there a commonly recommended value for this timeout? I use 18, which
corresponds to roughly 1 second (4.096 µs × 2^18 ≈ 1.07 s).

>
>Without seeing your code, I couldn't begin to say why you 
>don't see a send completion.  If you are absolutely positive 
>that you post a send and you never see a completion for that 
>send, then I guess it is a firmware or hardware problem.

It is very hard to reproduce this error with standalone code. I am using
HP-MPI, and triggering it requires 8 ranks on at least 4 nodes with
2 cards per node. Only one of our hundred or so test programs catches
the error, and it happens during an MPI_Scatterv operation.

--CQ


>
> - R.
>
