[openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
Sean Hefty
mshefty at ichips.intel.com
Thu Sep 28 09:09:58 PDT 2006
Or Gerlitz wrote:
> My understanding is that without this patch the side that sends the DREQ
> would resend it a few times, because the first DREPs are lost and no
> DREPs are sent once the id at the peer side has left the timewait state, correct?
This is correct. Note that the number of DREQ retries has since been changed to 15.
> Can you please share what the implications were with Intel MPI running a
> 64-node (128 ranks?) job? Was the issue here just making the ***job
> termination time*** bigger?
Job termination was taking about a minute while waiting for the DREQ to
time out. When running a series of tests, this becomes a fairly large issue.
- Sean
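
For a rough sense of where "about a minute" can come from, here is a minimal
sketch of the worst-case wait when every DREQ goes unanswered. It assumes the
usual InfiniBand timeout encoding (4.096 us * 2^exp) and that the sender waits
one full timeout for the initial DREQ and for each retry. The helper names and
the exponent value of 20 are illustrative assumptions, not taken from the patch
or from ib_cm; only the retry count of 15 comes from the message above.

```c
#include <stdint.h>

/* Hypothetical helper: decode an IB-style timeout exponent into
 * nanoseconds, assuming the encoding 4.096 us * 2^exp. */
static uint64_t ib_timeout_ns(unsigned int exp)
{
    return 4096ULL << exp;
}

/* Hypothetical helper: worst-case time spent waiting for a DREP when
 * the initial DREQ and every retry all time out. */
static uint64_t dreq_worst_case_ns(unsigned int retries, unsigned int exp)
{
    return (uint64_t)(retries + 1) * ib_timeout_ns(exp);
}
```

Under these assumptions, dreq_worst_case_ns(15, 20) comes to roughly 68.7
seconds, which is in the same range as the delay described above. The actual
per-try timeout used in the reported runs is not stated in the thread.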