[openfabrics-ewg] [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release - schedule and features)
Michael S. Tsirkin
mst at mellanox.co.il
Mon Jul 17 07:03:49 PDT 2006
Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: OFED 1.1 release - schedule and features
>
> > So if the link which ib0 maps to is DOWN you move the ib0 IPv4 address
> > to another device whose link is UP (eg ib1) and you somehow have ib1
> > send a gratuitous ARP?
>
> I think there may be a problem in the way IPoIB deals with gratuitous
> ARPs. Because if a neighbour structure is updated by the networking
> core, there's no way for IPoIB to know about that and update the
> associated IB path.
>
> Has anyone actually tried this failover approach?
>
> - R.
OK, we've seen the problem here - and here's a patch to fix it.
Seems to work fine here - I'll let it run for a day just to make sure.
How does it look?
---
The neighbour ha field may get updated without destroying the neighbour.
In this case, the ha field gets out of sync with the address handle stored
in ipoib_neigh->ah, with the result that the ah field would point to an
incorrect path, resulting in all packets being lost.
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 3f89f5e..474aa21 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -212,6 +212,7 @@ struct ipoib_path {
struct ipoib_neigh {
struct ipoib_ah *ah;
+ union ib_gid dgid;
struct sk_buff_head queue;
struct neighbour *neighbour;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 1c6ea1c..cf71d2a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -404,6 +404,8 @@ static void path_rec_completion(int stat
list_for_each_entry(neigh, &path->neigh_list, list) {
kref_get(&path->ah->ref);
neigh->ah = path->ah;
+ memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
+ sizeof(union ib_gid));
while ((skb = __skb_dequeue(&neigh->queue)))
__skb_queue_tail(&skqueue, skb);
@@ -510,6 +512,8 @@ static void neigh_add_path(struct sk_buf
if (path->ah) {
kref_get(&path->ah->ref);
neigh->ah = path->ah;
+ memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
+ sizeof(union ib_gid));
ipoib_send(dev, skb, path->ah,
be32_to_cpup((__be32 *) skb->dst->neighbour->ha));
@@ -633,6 +637,25 @@ static int ipoib_start_xmit(struct sk_bu
neigh = *to_ipoib_neigh(skb->dst->neighbour);
if (likely(neigh->ah)) {
+ if (unlikely(memcmp(&neigh->dgid.raw,
+ skb->dst->neighbour->ha + 4,
+ sizeof(union ib_gid)))) {
+ spin_lock(&priv->lock);
+ /*
+ * It's safe to call ipoib_put_ah() inside
+ * priv->lock here, because we know that
+ * path->ah will always hold one more reference,
+ * so ipoib_put_ah() will never do more than
+ * decrement the ref count.
+ */
+ ipoib_put_ah(neigh->ah);
+ list_del(&neigh->list);
+ ipoib_neigh_free(neigh);
+ spin_unlock(&priv->lock);
+ ipoib_path_lookup(skb, dev);
+ goto out;
+ }
+
ipoib_send(dev, skb, neigh->ah,
be32_to_cpup((__be32 *) skb->dst->neighbour->ha));
goto out;
--
MST
More information about the ewg
mailing list