[ofa-general] [PATCH 2/2] RDMA/nes: fix nes_nic_cm_xmit() error handling

Faisal Latif faisal.latif at intel.com
Mon Apr 6 12:28:52 PDT 2009


We see crashes or hangs when running network cable pull tests during
RDMA traffic.

In schedule_nes_timer(), we return an error if nes_nic_cm_xmit() fails.
Change this to return success, because the skb has already been queued
for the timer routines to process later. In the send_syn() case, the
error return made us indicate connect failure twice: once from
nes_connect() and again when the rexmit retries expire.

The other issue is skb->users. We increment it before calling
nes_nic_cm_xmit(), which calls dev_queue_xmit(). On failure we decrement
skb->users while also putting the skb on the rexmit path. However,
skb->users has already been decremented even when dev_queue_xmit() fails,
so the count is dropped twice. Remove the decrement of skb->users on
failure from both schedule_nes_timer() and nes_cm_timer_tick().

Also remove the extra check in nes_cm_timer_tick() that breaks out of the
loop on rexmit failure. The break causes problems because the remaining
nodes have had their cm_node->ref_count incremented and are never
processed.

Signed-off-by: Faisal Latif <faisal.latif at intel.com>
---
 drivers/infiniband/hw/nes/nes_cm.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)
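
Note for reviewers: the double decrement of skb->users can be reproduced
in a small user-space model. This is only a sketch, not driver code.
struct model_skb, model_xmit(), send_old(), send_new() and the MODEL_TX_*
values are hypothetical names, and model_xmit() simply assumes, as stated
in the description, that the transmit call drops one reference whether or
not the send succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes standing in for NETDEV_TX_OK and a failure. */
enum { MODEL_TX_OK = 0, MODEL_TX_FAIL = 1 };

/* Hypothetical stand-in for struct sk_buff: only the refcount is modelled. */
struct model_skb {
	int users;
};

/* Assumed transmit behaviour: one reference is consumed on success AND on failure. */
static int model_xmit(struct model_skb *skb, bool link_up)
{
	skb->users--;
	return link_up ? MODEL_TX_OK : MODEL_TX_FAIL;
}

/* Behaviour before the patch: a second decrement on the failure path. */
static int send_old(struct model_skb *skb, bool link_up)
{
	int ret;

	skb->users++;			/* reference taken for the transmit call */
	ret = model_xmit(skb, link_up);
	if (ret != MODEL_TX_OK)
		skb->users--;		/* extra decrement removed by the patch */
	return ret;
}

/* Behaviour after the patch: no extra decrement, failure reported as success. */
static int send_new(struct model_skb *skb, bool link_up)
{
	int ret;

	skb->users++;			/* reference taken for the transmit call */
	ret = model_xmit(skb, link_up);
	if (ret != MODEL_TX_OK)
		ret = MODEL_TX_OK;	/* skb stays queued for retransmit */
	return ret;
}
```

Starting from one reference held by the retransmit list, a failed
send_old() leaves users at 0 while the skb is still queued, whereas a
failed send_new() leaves it at 1 and returns MODEL_TX_OK.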

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 572231c..ba07852 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -446,8 +446,8 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
 		if (ret != NETDEV_TX_OK) {
 			nes_debug(NES_DBG_CM, "Error sending packet %p "
 				"(jiffies = %lu)\n", new_send, jiffies);
-			atomic_dec(&new_send->skb->users);
 			new_send->timetosend = jiffies;
+			ret = NETDEV_TX_OK;
 		} else {
 			cm_packets_sent++;
 			if (!send_retrans) {
@@ -631,7 +631,6 @@ static void nes_cm_timer_tick(unsigned long pass)
 				nes_debug(NES_DBG_CM, "rexmit failed for "
 					"node=%p\n", cm_node);
 				cm_packets_bounced++;
-				atomic_dec(&send_entry->skb->users);
 				send_entry->retrycount--;
 				nexttimeout = jiffies + NES_SHORT_TIME;
 				settimer = 1;
@@ -667,11 +666,6 @@ static void nes_cm_timer_tick(unsigned long pass)
 
 		spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags);
 		rem_ref_cm_node(cm_node->cm_core, cm_node);
-		if (ret != NETDEV_TX_OK) {
-			nes_debug(NES_DBG_CM, "rexmit failed for cm_node=%p\n",
-				cm_node);
-			break;
-		}
 	}
 
 	if (settimer) {
-- 
1.5.3.3



