[ofa-general] IPOIB/CM increase retry counts

Or Gerlitz ogerlitz at voltaire.com
Tue Feb 12 00:08:08 PST 2008


Pradeep Satyanarayana wrote:
> I have seen sporadic errors while running the HCAs in connected mode.
> These errors appear to be related to the speeds of the different HCAs.
> Increasing the retry counts solves the problem.

Hi Pradeep,

I see that tonight you sent this patch (posted to the mailing list in 
Dec 2007 and never discussed) for inclusion in OFED 1.3.

I think more details on the problem are needed here. From the above 
three lines it seems to be more of a workaround than a solution. What is 
the problem here?

> I looked at the RFC with regard to the warnings about retries. The warning 
> is there to make sure that the IB timeouts do not interfere with TCP timeouts.
> The TCP timeouts are so much larger than the IB timeouts (even with 
> non-zero values) that we are nowhere close to interfering with TCP
> timeouts.
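
For a rough sense of the scale being compared, here is a minimal sketch 
(not part of the patch) of the nominal IB transport timeout arithmetic. 
The IB spec encodes the local ACK timeout as an exponent, 
4.096 usec * 2^exp, and a request is tried once plus retry_count times. 
The function names and the exponent value of 14 used in the note below are 
assumptions for illustration only; the value actually used on a given 
connection comes from the path and the CM exchange, and real HCAs may 
wait longer than the nominal figure.

```c
#include <stdint.h>

/*
 * Nominal IB local ACK timeout in nanoseconds.
 * The field is a 5-bit exponent: timeout = 4.096 usec * 2^exp.
 * exp == 0 means "no timeout", reported here as 0.
 * (Hypothetical helper, not a kernel API.)
 */
static uint64_t ib_ack_timeout_ns(unsigned int exp)
{
	if (exp == 0 || exp > 31)
		return 0;
	return 4096ULL << exp;	/* 4.096 usec == 4096 ns */
}

/*
 * Rough nominal bound before the QP reports a transport retry error:
 * one initial attempt plus retry_count retries, each waiting one
 * ACK timeout. RNR NAK retries are timed separately and not modelled.
 */
static uint64_t ib_worst_case_ns(unsigned int exp, unsigned int retry_count)
{
	return ib_ack_timeout_ns(exp) * (retry_count + 1);
}
```

With an assumed exponent of 14, ib_ack_timeout_ns(14) is about 67 msec, 
so retry_count = 3 gives roughly 268 msec nominal before an error 
completion, versus about 67 msec with retry_count = 0.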

IP provides an "unreliable datagram service" to upper layers, hence I don't 
really see a point in implementing it over a reliable HW transport. This 
was discussed on the list, and suggestions on how to move to IPoIB/CM 
over UC transport were made, but there is no implementation yet.

Having said all that, I don't think we want to have --any RNR retries--. As 
for retries, I am open to hearing what others think.

Or.

> 
> Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
> ---
> 
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-12-21 16:06:49.000000000 -0500
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-12-21 16:07:28.000000000 -0500
> @@ -990,8 +990,8 @@ static int ipoib_cm_send_req(struct net_
>  	req.responder_resources		= 4;
>  	req.remote_cm_response_timeout	= 20;
>  	req.local_cm_response_timeout	= 20;
> -	req.retry_count			= 0; /* RFC draft warns against retries */
> -	req.rnr_retry_count		= 0; /* RFC draft warns against retries */
> +	req.retry_count			= 3;
> +	req.rnr_retry_count		= 3;
>  	req.max_cm_retries		= 15;
>  	req.srq				= ipoib_cm_has_srq(dev);
>  	return ib_send_cm_req(id, &req);



