[openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
Arlin Davis
ardavis at ichips.intel.com
Mon Sep 25 11:37:53 PDT 2006
Arlin Davis wrote:
>Sean Hefty wrote:
>
>>Currently a DREP is only sent in response to a DREQ if a connection
>>has been found matching the DREQ, and it is in the proper state. Once
>>a DREP is sent, the local connection moves into timewait. Duplicate
>>DREQs received while in this state result in re-sending the DREP.
>>
>>However, it's likely that the local connection will enter and exit
>>timewait before the remote side times out a lost DREP and resends a DREQ.
>>The resent DREQ then matches no connection and gets no DREP.
>>There are a couple of possible solutions to this. One is to increase how
>>long a connection remains in timewait, by multiplying its wait time by
>>max_cm_retries. This can greatly increase the time spent in timewait
>>before a QP can be re-used, even when no CM messages are lost.
>>
>>An alternative is to send a DREP in response to a DREQ, even if a local
>>connection is not found, which is what this patch does.
>>
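To make the two behaviours concrete, here is a toy model of the DREQ handling described above. It is only a sketch and not the ib_cm code: the names ToyCM, handle_dreq, timewait_exit and resent_dreq_answered are hypothetical, and timers and real messages are reduced to plain function calls.

```python
from enum import Enum, auto


class State(Enum):
    ESTABLISHED = auto()
    TIMEWAIT = auto()


class ToyCM:
    """Toy model of DREQ handling (hypothetical, not the real ib_cm code)."""

    def __init__(self, reply_to_unmatched):
        # False models the current behaviour, True models the patch.
        self.reply_to_unmatched = reply_to_unmatched
        self.conns = {}  # local comm ID -> State

    def connect(self, comm_id):
        self.conns[comm_id] = State.ESTABLISHED

    def timewait_exit(self, comm_id):
        # Timewait expired: the connection is forgotten entirely.
        self.conns.pop(comm_id, None)

    def handle_dreq(self, comm_id):
        """Return True if a DREP is sent in response to this DREQ."""
        state = self.conns.get(comm_id)
        if state is State.ESTABLISHED:
            # Matching connection in the proper state: reply, enter timewait.
            self.conns[comm_id] = State.TIMEWAIT
            return True
        if state is State.TIMEWAIT:
            # Duplicate DREQ while still in timewait: resend the DREP.
            return True
        # No matching connection: reply only if the patched behaviour is on.
        return self.reply_to_unmatched


def resent_dreq_answered(reply_to_unmatched):
    """Replay the lost-DREP scenario; report whether the retry gets a DREP."""
    cm = ToyCM(reply_to_unmatched)
    cm.connect(1)
    cm.handle_dreq(1)    # first DREQ: a DREP is sent, assume it is lost
    cm.timewait_exit(1)  # local side leaves timewait before the retry
    return cm.handle_dreq(1)
```

In this model resent_dreq_answered(False) leaves the retried DREQ unanswered, while resent_dreq_answered(True) answers it without lengthening timewait.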
>
>Would it be possible to get this fix into rc7? I am consistently seeing
>this problem with Intel MPI on a 64-node cluster.
>
>-arlin
>
>
Aviram? Is there an rc7 and could this get in?