[ofa-general] [PATCH][RFC] IPOIB/CM increase retry counts

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Fri Dec 21 13:25:54 PST 2007


I have seen sporadic errors while running the HCAs in connected mode.
These errors appear to be related to the speeds of the different HCAs.
Increasing the retry counts solves the problem.

I looked at the RFC draft's warnings about retries. The warnings
are there to make sure that the IB timeouts do not interfere with TCP
timeouts. The TCP timeouts are so much larger than the IB timeouts
(even with non-zero values) that we are nowhere close to interfering
with them.
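
To put a rough number on the comparison, here is a minimal sketch (not part
of the patch) that computes a nominal worst-case IB retry delay. It assumes
the usual InfiniBand timeout encoding of 4.096 us * 2^n and assumes each
attempt waits one full timeout. The helper names and the exponent used in
the note below are hypothetical and are not taken from ipoib_cm.c.

```c
#include <stdint.h>

/*
 * Nominal IB timeout for an encoded exponent n: 4.096 us * 2^n,
 * returned in nanoseconds (4.096 us == 4096 ns).
 */
static uint64_t ib_timeout_ns(unsigned int exp)
{
	return 4096ULL << exp;
}

/*
 * Nominal worst case before the QP reports an error: the initial
 * attempt plus retry_count retries, each waiting one timeout.
 * This is an illustrative simplification, not the driver's logic.
 */
static uint64_t ib_worst_case_ns(unsigned int exp, unsigned int retry_count)
{
	return (uint64_t)(retry_count + 1) * ib_timeout_ns(exp);
}
```

For example, with an assumed exponent of 14 and retry_count = 3,
ib_worst_case_ns(14, 3) gives 268435456 ns, roughly 268 ms. That figure can
then be compared against the TCP retransmission timeout in use.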

Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
---

--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-12-21 16:06:49.000000000 -0500
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-12-21 16:07:28.000000000 -0500
@@ -990,8 +990,8 @@ static int ipoib_cm_send_req(struct net_
 	req.responder_resources		= 4;
 	req.remote_cm_response_timeout	= 20;
 	req.local_cm_response_timeout	= 20;
-	req.retry_count			= 0; /* RFC draft warns against retries */
-	req.rnr_retry_count		= 0; /* RFC draft warns against retries */
+	req.retry_count			= 3;
+	req.rnr_retry_count		= 3;
 	req.max_cm_retries		= 15;
 	req.srq				= ipoib_cm_has_srq(dev);
 	return ib_send_cm_req(id, &req);



