[openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

Sean Hefty mshefty at ichips.intel.com
Thu Sep 28 09:09:58 PDT 2006


Or Gerlitz wrote:
> My understanding is that without this patch, the side that sends the DREQ 
> would resend it a few times because the first DREPs were lost, and no 
> DREPs would be sent once the id at the peer side left the timewait state. Correct?

This is correct.  Note that the number of DREQ retries has since been changed to 15.

> Can you please share what the implications were when running a 64-node 
> (128 ranks?) job with Intel MPI? Was the issue here just a longer ***job 
> termination time***?

Job termination was taking about a minute while waiting for the DREQ to 
time out.  When running a series of tests, this becomes a fairly large issue.

- Sean



