[openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

Aviram Gutman aviram at mellanox.co.il
Tue Sep 26 08:39:37 PDT 2006

-----Original Message-----
From: Arlin Davis [mailto:ardavis at ichips.intel.com] 
Sent: Monday, September 25, 2006 9:38 PM
To: Arlin Davis
Cc: Sean Hefty; openib-general at openib.org; Aviram Gutman
Subject: Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

Arlin Davis wrote:

>Sean Hefty wrote:
>
>  
>
>>Currently a DREP is only sent in response to a DREQ if a connection
>>has been found matching the DREQ, and it is in the proper state.  Once
>>a DREP is sent, the local connection moves into timewait.  Duplicate
>>DREQs received while in this state result in re-sending the DREP.
>>
>>However, it's likely that the local connection will enter and exit
>>timewait before the remote side times out a lost DREP and resends a
>>DREQ.  There are a couple possible solutions to this.  One is to
>>increase how long a connection remains in timewait, by multiplying its
>>wait time by max_cm_retries.  This can greatly increase the timewait
>>state before a QP can be re-used when CM messages are not lost.
>>
>>An alternative is to send a DREP in response to a DREQ, even if a 
>>local connection is not found, which is what this patch does.
>
>Would it be possible to get this fix in  rc7? I am consistently seeing 
>this problem with Intel MPI on a 64 node cluster.
>
>-arlin
>  
>
> Aviram? Is there an rc7 and could this get in?


Yes, Michael Tsirkin will add it.
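
For readers following the thread, the DREQ handling described in the quoted
patch description can be sketched roughly as below. This is only a simplified
model, not the ib_cm code: the struct, the enum values, the find_conn() and
handle_dreq() helpers, and the reply_to_unmatched flag are all hypothetical
names made up for illustration, and "proper state" is assumed here to mean an
established connection.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified connection model; not the ib_cm structures. */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT };

struct conn {
    uint32_t local_id;
    uint32_t remote_id;
    enum conn_state state;
};

/* What the receiver decides to do with an incoming DREQ. */
enum dreq_action {
    DREQ_SEND_DREP,           /* matched, proper state: reply, enter timewait */
    DREQ_RESEND_DREP,         /* matched, already in timewait: duplicate DREQ */
    DREQ_SEND_UNMATCHED_DREP, /* no match: reply anyway (proposed behaviour) */
    DREQ_DROP                 /* no match: stay silent (current behaviour) */
};

/* Linear lookup by ID pair; returns NULL when no connection matches. */
static struct conn *find_conn(struct conn *table, size_t n,
                              uint32_t local_id, uint32_t remote_id)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].local_id == local_id && table[i].remote_id == remote_id)
            return &table[i];
    return NULL;
}

/*
 * reply_to_unmatched == 0 models the behaviour described as current,
 * reply_to_unmatched != 0 models the behaviour the patch proposes.
 */
static enum dreq_action handle_dreq(struct conn *table, size_t n,
                                    uint32_t local_id, uint32_t remote_id,
                                    int reply_to_unmatched)
{
    struct conn *c = find_conn(table, n, local_id, remote_id);

    if (!c)
        return reply_to_unmatched ? DREQ_SEND_UNMATCHED_DREP : DREQ_DROP;

    if (c->state == CONN_ESTABLISHED) {
        c->state = CONN_TIMEWAIT;   /* first DREQ: reply and start timewait */
        return DREQ_SEND_DREP;
    }

    return DREQ_RESEND_DREP;        /* duplicate DREQ while in timewait */
}
```

In this model, a connection that has already left timewait is simply absent
from the table, so a retried DREQ lands in the unmatched branch. With the flag
clear, the remote side gets no reply; with it set, the remote side still gets
its DREP without the local timewait period having to be lengthened.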



