[ewg] [Fwd: Re: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides]

Moni Shoua monis at Voltaire.COM
Tue Nov 18 03:41:14 PST 2008


Hi
The previous version of the patch below has been included in OFED since OFED-1.4_3
( kernel_patches/fixes/ipoib_0480_fix_loss_of_connectivity_after_failover.patch )

Could you please replace it with the new one in the next OFED release?

thanks
-------- Original Message --------
Subject: Re: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides
Date: Tue, 18 Nov 2008 13:34:45 +0200
From: Moni Shoua <monis at Voltaire.COM>
To: Yossi Etigin <yosefe at Voltaire.COM>, Roland Dreier <rdreier at cisco.com>
CC: Olga Shern <olgas at voltaire.com>, general list <general at lists.openfabrics.org>
References: <490B448C.5080306 at Voltaire.COM>

The patch assumes that the path query succeeds, and therefore copies the HA
(hardware address) from the kernel neighbour structure to ipoib_neigh as soon as
the path query is sent. If the path query fails (e.g. the request times out), the
next query won't be triggered, because ipoib_start_xmit() no longer finds that the
HA was updated. As a result, the destination node remains inaccessible for longer.
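
To illustrate the failure mode described above, here is a minimal userspace
sketch in C. It is not driver code: struct model_neigh, ha_updated(),
send_path_query() and retrigger_after_failed_query() are hypothetical names
made up for this example, and the model assumes that the only trigger for a new
lookup is the GID comparison against ha + 4 that ipoib_start_xmit() performs.

```c
#include <assert.h>
#include <string.h>

#define GID_LEN 16   /* size of an InfiniBand GID */
#define HA_LEN  20   /* IPoIB hardware address: 4 bytes followed by the GID */

/* Hypothetical stand-in for the fields of ipoib_neigh used by the check. */
struct model_neigh {
	unsigned char dgid[GID_LEN]; /* GID cached by IPoIB */
	int has_ah;                  /* nonzero once a path query has succeeded */
};

/* Same comparison as in ipoib_start_xmit(): cached GID vs. GID inside HA. */
static int ha_updated(const struct model_neigh *n, const unsigned char *ha)
{
	return memcmp(n->dgid, ha + 4, GID_LEN) != 0;
}

/*
 * Model of sending a path query. With copy_ha set, the HA is copied into
 * the cached GID right away, before the result of the query is known.
 */
static void send_path_query(struct model_neigh *n, const unsigned char *ha,
			    int copy_ha)
{
	if (copy_ha)
		memcpy(n->dgid, ha + 4, GID_LEN);
	/* The query is in flight; has_ah stays 0 until it succeeds. */
}

/*
 * Scenario: the peer fails over (HA changes), a path query is sent and then
 * times out. Returns 1 if the next transmit would still see the HA as
 * updated and so start a new lookup, 0 otherwise.
 */
static int retrigger_after_failed_query(int copy_ha)
{
	unsigned char ha[HA_LEN];
	struct model_neigh n;

	memset(ha, 0, sizeof(ha));
	memset(&n, 0, sizeof(n));
	n.dgid[GID_LEN - 1] = 1;   /* old GID, from before the failover */
	ha[4 + GID_LEN - 1] = 2;   /* new GID, after the failover */

	assert(ha_updated(&n, ha)); /* the failover is visible at first */
	send_path_query(&n, ha, copy_ha);
	/* The query times out here: no AH is created. */

	return ha_updated(&n, ha);
}
```

In this model retrigger_after_failed_query(1) returns 0 (the early copy hides
the change, so nothing restarts the lookup), while
retrigger_after_failed_query(0) returns 1 (the stale GID is still detected on
the next transmit).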

The patch below is identical to Yossi's patch but without the copy of HA in neigh_add_path.


diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index fddded7..ec433bf 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -709,26 +709,26 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		neigh = *to_ipoib_neigh(skb->dst->neighbour);
 
-		if (neigh->ah)
-			if (unlikely((memcmp(&neigh->dgid.raw,
-					    skb->dst->neighbour->ha + 4,
-					    sizeof(union ib_gid))) ||
-					 (neigh->dev != dev))) {
-				spin_lock_irqsave(&priv->lock, flags);
-				/*
-				 * It's safe to call ipoib_put_ah() inside
-				 * priv->lock here, because we know that
-				 * path->ah will always hold one more reference,
-				 * so ipoib_put_ah() will never do more than
-				 * decrement the ref count.
-				 */
+		if (unlikely((memcmp(&neigh->dgid.raw,
+				skb->dst->neighbour->ha + 4,
+				sizeof(union ib_gid))) ||
+				(neigh->dev != dev))) {
+			spin_lock_irqsave(&priv->lock, flags);
+			/*
+			 * It's safe to call ipoib_put_ah() inside
+			 * priv->lock here, because we know that
+			 * path->ah will always hold one more reference,
+			 * so ipoib_put_ah() will never do more than
+			 * decrement the ref count.
+			 */
+			if (neigh->ah)
 				ipoib_put_ah(neigh->ah);
-				list_del(&neigh->list);
-				ipoib_neigh_free(dev, neigh);
-				spin_unlock_irqrestore(&priv->lock, flags);
-				ipoib_path_lookup(skb, dev);
-				return NETDEV_TX_OK;
-			}
+			list_del(&neigh->list);
+			ipoib_neigh_free(dev, neigh);
+			spin_unlock_irqrestore(&priv->lock, flags);
+			ipoib_path_lookup(skb, dev);
+			return NETDEV_TX_OK;
+		}
 
 		if (ipoib_cm_get(neigh)) {
 			if (ipoib_cm_up(neigh)) {
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



